CharacterEncoderImplementations__ISO10646_to_UTF8.st
author Jan Vrany <jan.vrany@labware.com>
Wed, 22 Mar 2023 13:57:18 +0000
branch jv
changeset 25445 1623217d2268
parent 25406 eba3da836698
permissions -rw-r--r--
Cherry-picked OrderedCollection.st from 0b286fd51da7: * d4c86d7c0bfc: #TUNING by stefan, Stefan Vogel <sv@exept.de> * 692b6497a669: #DOCUMENTATION by stefan, Stefan Vogel <sv@exept.de> * d47bb2912953: #DOCUMENTATION by stefan, Stefan Vogel <sv@exept.de> * abb4316c6bff: #FEATURE by cg, Claus Gittinger <cg@exept.de> * 3a8fce0e8d11: #TUNING by stefan, Stefan Vogel <sv@exept.de> * 03d29bf8c5bb: #REFACTORING by stefan, Stefan Vogel <sv@exept.de> * cccc6c4abcfc: #REFACTORING by stefan, Stefan Vogel <sv@exept.de> * 35d957c7a840: #FEATURE by cg, Claus Gittinger <cg@exept.de> * 6b11890f5f2c: #OTHER by cg, Claus Gittinger <cg@exept.de> * abb6108fb06b: #FEATURE by cg, Claus Gittinger <cg@exept.de> * 2c4768bb2e89: #FEATURE by cg, Claus Gittinger <cg@exept.de> * 4029e964d0f1: #FEATURE by cg, Claus Gittinger <cg@exept.de> * ddcab3a9c2df: #OTHER by cg, Claus Gittinger <cg@exept.de> * 2213eb56e0c7: #REFACTORING by exept, Claus Gittinger <cg@exept.de> * 09ca874a6160: #REFACTORING by exept, Claus Gittinger <cg@exept.de> * 30b332af1f33: #BUGFIX by stefan, Stefan Vogel <sv@exept.de> * 779764ba117b: #REFACTORING by cg, Claus Gittinger <cg@exept.de> * b3d232a613c9: #BUGFIX by stefan, Stefan Vogel <sv@exept.de> * c417f7edaec1: #BUGFIX by stefan, Stefan Vogel <sv@exept.de> * 904b6538f379: #FEATURE by exept, Claus Gittinger <cg@exept.de> * c5887f03e01f: #REFACTORING by stefan, Stefan Vogel <sv@exept.de> * 8912d03aff48: #BUGFIX by exept, Claus Gittinger <cg@exept.de> * de5cd1dab4c3: #DOCUMENTATION by exept, Claus Gittinger <cg@exept.de> * 9bbd26603378: #OTHER by exept, Claus Gittinger <cg@exept.de> * c2c9dc110f42: #FEATURE by stefan, Stefan Vogel <sv@exept.de> * 81d123c6703d: #DOCUMENTATION by stefan, Stefan Vogel <sv@exept.de> * 8aadbb21458a: #BUGFIX by stefan, Stefan Vogel <sv@exept.de> * f210dbb8b2f6: #TUNING by stefan, Stefan Vogel <sv@exept.de> * c2c774fc53c0: #FEATURE by exept, Claus Gittinger <cg@exept.de> * b6f462670875: #DOCUMENTATION by exept, Claus Gittinger <cg@exept.de> * 
27ae4021d5d6: #FEATURE by stefan, Stefan Vogel <sv@exept.de> * 10d9e9d85594: #TUNING by exept, Claus Gittinger <cg@exept.de> * 2653d855dcc7: #DOCUMENTATION by exept, Claus Gittinger <cg@exept.de> * 6ea1698a1a34: #FEATURE by stefan, Stefan Vogel <sv@exept.de> * 28762315e664: #OTHER by exept, Claus Gittinger <cg@exept.de> * 7142ea786f3e: #TUNING by stefan, Stefan Vogel <sv@exept.de> * 7875acb42b53: #BUGFIX by stefan, Stefan Vogel <sv@exept.de> * 163a0eebc97e: #BUGFIX by Maren, matilk
8148
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
     1
"
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
     2
 COPYRIGHT (c) 2004 by eXept Software AG
23107
40173e082cbc Copyright updates
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 21387
diff changeset
     3
 COPYRIGHT (c) 2015 Jan Vrany
25406
eba3da836698 Fix 16 years old bug in `EncodedStream >> #next` for UTF8 to Unicode decoder
Jan Vrany <jan.vrany@labware.com>
parents: 23107
diff changeset
     4
 COPYRIGHT (c) 2021 LabWare
8411
44509c4f92f0 *** empty log message ***
ca
parents: 8406
diff changeset
     5
	      All Rights Reserved
8148
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
     6
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
     7
 This software is furnished under a license and may be used
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
     8
 only in accordance with the terms of that license and with the
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
     9
 inclusion of the above copyright notice.   This software may not
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    10
 be provided or otherwise made available to, or used by, any
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    11
 other person.  No title to or ownership of the software is
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    12
 hereby transferred.
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    13
"
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    14
"{ Package: 'stx:libbasic' }"
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    15
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    16
"{ NameSpace: CharacterEncoderImplementations }"
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    17
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    18
TwoByteEncoder subclass:#ISO10646_to_UTF8
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    19
	instanceVariableNames:''
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    20
	classVariableNames:''
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    21
	poolDictionaries:''
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    22
	category:'Collections-Text-Encodings'
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    23
!
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    24
18604
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    25
ISO10646_to_UTF8 class instanceVariableNames:'theOneAndOnlyInstance'
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    26
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    27
"
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    28
 No other class instance variables are inherited by this class.
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    29
"
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    30
!
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    31
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    32
!ISO10646_to_UTF8 class methodsFor:'documentation'!
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    33
8148
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    34
copyright
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    35
"
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    36
 COPYRIGHT (c) 2004 by eXept Software AG
23107
40173e082cbc Copyright updates
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 21387
diff changeset
    37
 COPYRIGHT (c) 2015 Jan Vrany
25406
eba3da836698 Fix 16 years old bug in `EncodedStream >> #next` for UTF8 to Unicode decoder
Jan Vrany <jan.vrany@labware.com>
parents: 23107
diff changeset
    38
 COPYRIGHT (c) 2021 LabWare
8411
44509c4f92f0 *** empty log message ***
ca
parents: 8406
diff changeset
    39
	      All Rights Reserved
8148
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    40
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    41
 This software is furnished under a license and may be used
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    42
 only in accordance with the terms of that license and with the
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    43
 inclusion of the above copyright notice.   This software may not
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    44
 be provided or otherwise made available to, or used by, any
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    45
 other person.  No title to or ownership of the software is
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    46
 hereby transferred.
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    47
"
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    48
!
dbf64e3142d9 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    49
21298
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    50
documentation
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    51
"
21299
819caa3926a2 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 21298
diff changeset
    52
    I encode characters into UTF8 and decode them from UTF8.
819caa3926a2 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 21298
diff changeset
    53
    
21298
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    54
    Note the naming (it is often confused):
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    55
        Unicode is the set of number-to-glyph assignments
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    56
    whereas:
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    57
        UTF8 is a concrete way of transmitting Unicode codePoints (numbers).
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    58
    UTF16 is another concrete encoding, for example.    
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    59
        
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    60
    ST/X NEVER uses UTF8 internally - all characters are full 24bit characters.
21301
f33ff66e5fff #OTHER by cg
Claus Gittinger <cg@exept.de>
parents: 21299
diff changeset
    61
    Only when data is exchanged are these converted into UTF8 (or other) byte sequences.
21298
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    62
"
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    63
!
cb1ce1924d13 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 19838
diff changeset
    64
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    65
examples
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    66
"
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    67
  Encoding (unicode to utf8):
8411
44509c4f92f0 *** empty log message ***
ca
parents: 8406
diff changeset
    68
     ISO10646_to_UTF8 encodeString:'hello'.
8297
e7a05a86f280 removed iso8859-chars (for hpux)
Claus Gittinger <cg@exept.de>
parents: 8221
diff changeset
    69
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    70
8297
e7a05a86f280 removed iso8859-chars (for hpux)
Claus Gittinger <cg@exept.de>
parents: 8221
diff changeset
    71
  Decoding (utf8 to unicode):
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    72
     |t|
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    73
21387
HG Automerge
parents: 19863 21301
diff changeset
    74
     t := ISO10646_to_UTF8 encodeString:'Hello'.
8411
44509c4f92f0 *** empty log message ***
ca
parents: 8406
diff changeset
    75
     ISO10646_to_UTF8 decodeString:t.
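
  Round trip with a non-ASCII character (a sketch; the euro sign 16r20AC
  is only an illustrative choice. By the ranges in #characterSize: the
  encoded form should be 3 bytes long, and the last line should answer true):
     |s t|

     s := (Character value:16r20AC) asString.
     t := ISO10646_to_UTF8 encodeString:s.
     t size.
     (ISO10646_to_UTF8 decodeString:t) = s.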
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    76
"
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    77
! !
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    78
18604
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    79
!ISO10646_to_UTF8 class methodsFor:'instance creation'!
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    80
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    81
flushSingleton
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    82
    "flushes the cached singleton"
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    83
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    84
    theOneAndOnlyInstance := nil
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    85
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    86
    "
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    87
     self flushSingleton
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    88
    "
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    89
!
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    90
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    91
new
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    92
    "returns a singleton"
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    93
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    94
    theOneAndOnlyInstance isNil ifTrue:[
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    95
        theOneAndOnlyInstance := self basicNew initialize.
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    96
    ].
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    97
    ^ theOneAndOnlyInstance.
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    98
!
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
    99
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   100
theOneAndOnlyInstance
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   101
    "returns a singleton"
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   102
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   103
    theOneAndOnlyInstance isNil ifTrue:[
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   104
        theOneAndOnlyInstance := self basicNew initialize.
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   105
    ].
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   106
    ^ theOneAndOnlyInstance.
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   107
! !
54caf7b64994 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 18601
diff changeset
   108
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   109
!ISO10646_to_UTF8 methodsFor:'encoding & decoding'!
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   110
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   111
decode:aCode
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   112
    self shouldNotImplement "/ no single byte conversion possible
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   113
!
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   114
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   115
decodeString:aStringOrByteCollection
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   116
    "given a string in UTF8 encoding,
17489
22f6151b5135 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Claus Gittinger <cg@exept.de>
parents: 14172
diff changeset
   117
     return a new string containing the same characters, in Unicode encoding.
17623
6fe31bc70e49 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 17489
diff changeset
   118
     Returns either a normal String, a Unicode16String or a Unicode32String instance.
17489
22f6151b5135 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Claus Gittinger <cg@exept.de>
parents: 14172
diff changeset
   119
     This is only useful when reading from external sources or communicating with
22f6151b5135 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Claus Gittinger <cg@exept.de>
parents: 14172
diff changeset
   120
     other systems 
22f6151b5135 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Claus Gittinger <cg@exept.de>
parents: 14172
diff changeset
   121
     (ST/X never uses utf8 internally, but always uses strings of fully decoded unicode characters).
19838
a6ca726d596c #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 18625
diff changeset
   122
     This only handles characters of up to 30 bits."
14172
8c2cf2a68116 changed:
Stefan Vogel <sv@exept.de>
parents: 11996
diff changeset
   123
19838
a6ca726d596c #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 18625
diff changeset
   124
    ^ CharacterArray decodeFromUTF8:aStringOrByteCollection.
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   125
!
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   126
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   127
encode:aCode
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   128
    self shouldNotImplement "/ no single byte conversion possible
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   129
!
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   130
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   131
encodeString:aUnicodeString
17489
22f6151b5135 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Claus Gittinger <cg@exept.de>
parents: 14172
diff changeset
   132
    "return the UTF-8 representation of a Unicode string.
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   133
     The resulting string is only useful for storing in an external file,
19838
a6ca726d596c #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 18625
diff changeset
   134
     not for use inside ST/X."
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   135
19838
a6ca726d596c #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 18625
diff changeset
   136
    ^ aUnicodeString utf8Encoded.
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   137
! !
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   138
18807
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   139
!ISO10646_to_UTF8 methodsFor:'queries'!
11974
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   140
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   141
bytesToReadFor:firstByte 
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   142
    |bytesToRead|
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   143
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   144
    bytesToRead := 1.
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   145
    (firstByte isBitSet:8) ifFalse:[^1].
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   146
    7 downTo:3
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   147
        do:[:idx | 
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   148
            (firstByte isBitSet:idx) ifTrue:[
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   149
                bytesToRead := bytesToRead + 1
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   150
            ] ifFalse:[
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   151
                ^bytesToRead                
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   152
            ]
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   153
        ].
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   154
    ^bytesToRead
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   155
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   156
    "Created: / 14-06-2005 / 17:17:24 / janfrog"
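
    "
     A usage sketch; the byte values are illustrative lead bytes.
     Going by the bit tests above (bit 8 being the high bit of the byte),
     these should answer 1, 2, 3 and 4 respectively:

     ISO10646_to_UTF8 new bytesToReadFor:16r41.
     ISO10646_to_UTF8 new bytesToReadFor:16rC3.
     ISO10646_to_UTF8 new bytesToReadFor:16rE2.
     ISO10646_to_UTF8 new bytesToReadFor:16rF0.
    "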
18807
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   157
!
8163
a867b07aa226 name query
ca
parents: 8148
diff changeset
   158
11974
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   159
characterSize:charOrcodePoint
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   160
    "return the number of bytes required to encode charOrcodePoint in UTF8"
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   161
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   162
    "Taken from RFC 3629"
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   163
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   164
    (charOrcodePoint asInteger between:16r00000000 and:16r0000007F) ifTrue:[^1].
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   165
    (charOrcodePoint asInteger between:16r00000080 and:16r000007FF) ifTrue:[^2].
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   166
    (charOrcodePoint asInteger between:16r00000800 and:16r0000FFFF) ifTrue:[^3].
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   167
    (charOrcodePoint asInteger between:16r00010000 and:16r0010FFFF) ifTrue:[^4].
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   168
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   169
    ^self error:'Invalid codePoint'
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   170
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   171
    "Created: / 15-06-2005 / 15:16:22 / janfrog"
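
    "
     A usage sketch; the arguments are illustrative.
     Going by the RFC 3629 ranges above, these should answer
     1, 2, 3 and 4 respectively:

     ISO10646_to_UTF8 new characterSize:$a.
     ISO10646_to_UTF8 new characterSize:16rE9.
     ISO10646_to_UTF8 new characterSize:16r20AC.
     ISO10646_to_UTF8 new characterSize:16r1F600.
    "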
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   172
!
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   173
8163
a867b07aa226 name query
ca
parents: 8148
diff changeset
   174
nameOfEncoding
14172
8c2cf2a68116 changed:
Stefan Vogel <sv@exept.de>
parents: 11996
diff changeset
   175
    ^ #utf8
8163
a867b07aa226 name query
ca
parents: 8148
diff changeset
   176
! !
a867b07aa226 name query
ca
parents: 8148
diff changeset
   177
11974
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   178
!ISO10646_to_UTF8 methodsFor:'stream support'!
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   179
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   180
readNext:charactersToRead charactersFrom:stream
11996
fm
parents: 11974
diff changeset
   181
11974
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   182
    | s |
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   183
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   184
    s := (String new:charactersToRead) writeStream.
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   185
    charactersToRead timesRepeat:[
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   186
        | c |
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   187
        c := stream peek.
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   188
        s nextPutAll:(stream next:(self bytesToReadFor:c codePoint))
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   189
    ].
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   190
    ^ self decodeString:s contents
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   191
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   192
    "Created: / 16-06-2005 / 11:45:14 / masca"
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   193
!
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   194
18807
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   195
readNextCharacterFrom:stream
11974
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   196
18807
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   197
    | c bytesYetToRead s |
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   198
    c := stream peek.
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   199
    bytesYetToRead := self bytesToReadFor:c codePoint.
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   200
    bytesYetToRead == 1 ifTrue:[ 
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   201
        stream next.
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   202
        ^ c.
11974
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   203
    ].
18807
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   204
    s := (String new:1 + bytesYetToRead) writeStream.
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   205
    s nextPutAll:(stream next: bytesYetToRead).
25406
eba3da836698 Fix 16 years old bug in `EncodedStream >> #next` for UTF8 to Unicode decoder
Jan Vrany <jan.vrany@labware.com>
parents: 23107
diff changeset
   206
eba3da836698 Fix 16 years old bug in `EncodedStream >> #next` for UTF8 to Unicode decoder
Jan Vrany <jan.vrany@labware.com>
parents: 23107
diff changeset
   207
    s := self decodeString:s contents.
eba3da836698 Fix 16 years old bug in `EncodedStream >> #next` for UTF8 to Unicode decoder
Jan Vrany <jan.vrany@labware.com>
parents: 23107
diff changeset
   208
    self assert: s size == 1.
eba3da836698 Fix 16 years old bug in `EncodedStream >> #next` for UTF8 to Unicode decoder
Jan Vrany <jan.vrany@labware.com>
parents: 23107
diff changeset
   209
    ^ s first
11974
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   210
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   211
    "Created: / 14-06-2005 / 17:03:59 / janfrog"
18807
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   212
    "Modified: / 03-10-2015 / 08:49:09 / Jan Vrany <jan.vrany@fit.cvut.cz>"
25406
eba3da836698 Fix 16 years old bug in `EncodedStream >> #next` for UTF8 to Unicode decoder
Jan Vrany <jan.vrany@labware.com>
parents: 23107
diff changeset
   213
    "Modified: / 29-01-2021 / 09:21:26 / Jan Vrany <jan.vrany@labware.com>"
11974
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   214
! !
bbbf98b676b0 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 10670
diff changeset
   215
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   216
!ISO10646_to_UTF8 class methodsFor:'documentation'!
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   217
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   218
version
18601
00dc53dfe54d class: CharacterEncoderImplementations::ISO10646_to_UTF8
Stefan Vogel <sv@exept.de>
parents: 17623
diff changeset
   219
    ^ '$Header$'
21299
819caa3926a2 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 21298
diff changeset
   220
!
819caa3926a2 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 21298
diff changeset
   221
819caa3926a2 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 21298
diff changeset
   222
version_CVS
819caa3926a2 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 21298
diff changeset
   223
    ^ '$Header$'
21387
HG Automerge
parents: 19863 21301
diff changeset
   224
!
HG Automerge
parents: 19863 21301
diff changeset
   225
18807
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   226
version_HG
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   227
d79ce9fb5198 Fixed EncodedStream>>next for UTF8 to Unicode decoder.
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 18630
diff changeset
   228
    ^ '$Changeset: <not expanded> $'
8081
b468050174a9 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   229
! !
17489
22f6151b5135 class: CharacterEncoderImplementations::ISO10646_to_UTF8
Claus Gittinger <cg@exept.de>
parents: 14172
diff changeset
   230