CharacterEncoderImplementations__ISO10646_to_UTF8.st
author Claus Gittinger <cg@exept.de>
Wed, 13 Feb 2019 22:10:53 +0100
changeset 23735 77363fc65861
parent 22474 f42c97c037ed
child 25271 3b763ce09c7e
permissions -rw-r--r--
#FEATURE by cg class: OrderedDictionary changed: #copyValuesFrom:to: class: OrderedDictionary class changed: #version_CVS
"{ Encoding: utf8 }"

"
 COPYRIGHT (c) 2004 by eXept Software AG
	      All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
"{ Package: 'stx:libbasic' }"

"{ NameSpace: CharacterEncoderImplementations }"

VariableBytesEncoder subclass:#ISO10646_to_UTF8
	instanceVariableNames:''
	classVariableNames:''
	poolDictionaries:''
	category:'Collections-Text-Encodings'
!

ISO10646_to_UTF8 class instanceVariableNames:'theOneAndOnlyInstance'

"
 No other class instance variables are inherited by this class.
"
!

!ISO10646_to_UTF8 class methodsFor:'documentation'!

copyright
"
 COPYRIGHT (c) 2004 by eXept Software AG
	      All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
!

documentation
"
    I can encode unicode characters into utf-8 and
    decode utf-8 byte sequences into unicode.

    Notice the naming (many are confused):
        Unicode is the set of number-to-character assignments (codePoints)
    whereas:
        UTF8 is a concrete way of transmitting Unicode codePoints (numbers).
        UTF16 is another concrete encoding, for example.

    ST/X NEVER uses UTF8 internally - all characters are full 24bit characters.
    Only when exchanging data are these converted into UTF8 (or other) byte sequences.
"
!

examples
"
  Encoding (unicode to utf8):
     ISO10646_to_UTF8 encodeString:'hello'.


  Decoding (utf8 to unicode):
     |t|

     t := ISO10646_to_UTF8 encodeString:'Helloœ'.
     ISO10646_to_UTF8 decodeString:t.
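
  Size queries (a hedged sketch, not part of the original examples;
  it assumes an instance obtained via #new and relies only on
  #characterSize: as defined in this class. By the UTF-8 definition,
  $a needs 1 byte and the euro sign U+20AC needs 3 bytes):
     ISO10646_to_UTF8 new characterSize:$a.
     ISO10646_to_UTF8 new characterSize:16r20AC.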
"
! !

!ISO10646_to_UTF8 class methodsFor:'instance creation'!

flushSingleton
    "flushes the cached singleton"

    theOneAndOnlyInstance := nil

    "
     self flushSingleton
    "
!

new
    "returns a singleton"

    theOneAndOnlyInstance isNil ifTrue:[
        theOneAndOnlyInstance := self basicNew initialize.
    ].
    ^ theOneAndOnlyInstance.
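
    "
     Usage sketch (an illustration, not part of the original source):
     both expressions are expected to evaluate to true, because #new
     and #theOneAndOnlyInstance answer the same cached instance.

     self new == self new.
     self new == self theOneAndOnlyInstance.
    "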
!

theOneAndOnlyInstance
    "returns a singleton"

    theOneAndOnlyInstance isNil ifTrue:[
        theOneAndOnlyInstance := self basicNew initialize.
    ].
    ^ theOneAndOnlyInstance.
! !

!ISO10646_to_UTF8 class methodsFor:'queries'!

bytesToReadFor:firstByte
    "answer the number of bytes of the utf-8 sequence introduced by firstByte"

    (firstByte bitAnd:2r10000000) == 0 ifTrue:[^ 1].
    (firstByte bitAnd:2r11100000) == 2r11000000 ifTrue:[^ 2].
    (firstByte bitAnd:2r11110000) == 2r11100000 ifTrue:[^ 3].
    (firstByte bitAnd:2r11111000) == 2r11110000 ifTrue:[^ 4].
    (firstByte bitAnd:2r11111100) == 2r11111000 ifTrue:[^ 5].
    (firstByte bitAnd:2r11111110) == 2r11111100 ifTrue:[^ 6].

    InvalidEncodingError raiseWith:firstByte errorString:' - unsupported utf8 encoding (too large, only 31bit supported)'
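
    "
     Usage sketch (an illustration, not part of the original source).
     In UTF-8, the lead byte alone determines the sequence length:
       16r41 (2r01000001) is a single ASCII byte,
       16rC3 (2r11000011) leads a 2-byte sequence,
       16rE2 (2r11100010) leads a 3-byte sequence,
       16rF0 (2r11110000) leads a 4-byte sequence.

     self bytesToReadFor:16r41.
     self bytesToReadFor:16rC3.
     self bytesToReadFor:16rE2.
     self bytesToReadFor:16rF0.
    "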

    "Created: / 14-06-2005 / 17:17:24 / janfrog"
    "Modified: / 10-01-2018 / 22:59:20 / stefan"
! !

!ISO10646_to_UTF8 methodsFor:'encoding & decoding'!

decodeString:aStringOrByteCollection
    "given a string in UTF8 encoding,
     return a new string containing the same characters, in Unicode encoding.
     Returns either a normal String, a Unicode16String or a Unicode32String instance.
     This is only useful when reading from external sources or communicating with
     other systems
     (ST/X never uses utf8 internally, but always uses strings of fully decoded unicode characters).
     This only handles up to 30bit characters."

    ^ CharacterArray decodeFromUTF8:aStringOrByteCollection.
!

encodeString:aUnicodeString
    "return the UTF-8 representation of a Unicode string.
     The resulting string is only useful to be stored on some external file,
     not for being used inside ST/X."

    ^ aUnicodeString utf8Encoded.
! !

!ISO10646_to_UTF8 methodsFor:'queries'!

characterSize:charOrCodePoint
    "return the number of bytes required to encode charOrCodePoint"

    ^ charOrCodePoint asCharacter utf8BytesPerCharacter.

    "Created: / 15-06-2005 / 15:16:22 / janfrog"
    "Modified: / 03-01-2018 / 23:05:59 / stefan"
!

nameOfEncoding
    ^ #utf8
! !

!ISO10646_to_UTF8 methodsFor:'stream support'!

encodeCharacter:aUnicodeCharacter on:aStream
    "given a character in unicode, encode it onto aStream."

    aStream nextPutUtf8:aUnicodeCharacter.

    "Created: / 16-02-2017 / 16:20:57 / stefan"
!

encodeString:aUnicodeString on:aStream
    "given a string in unicode, encode it onto aStream."

    aStream nextPutAllUtf8:aUnicodeString.
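
    "
     Usage sketch (an illustration, not part of the original source).
     It assumes that the target stream understands #nextPutAllUtf8:,
     as this method requires; the WriteStream on a String is an assumption.

     |s|

     s := WriteStream on:(String new).
     ISO10646_to_UTF8 new encodeString:'Helloœ' on:s.
     s contents.
    "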

    "Created: / 16-02-2017 / 16:27:31 / stefan"
!

readNext:charactersToReadArg charactersFrom:aStream
    "decode the next charactersToReadArg characters on aStream from utf-8 to unicode"

    |s c cp hasUtf8 charactersToRead "{ Class:SmallInteger }"|

    charactersToRead := charactersToReadArg.
    hasUtf8 := false.
    "stream may contain either text or bytes"
    s := (aStream contentsSpecies new:charactersToRead) writeStream.
    charactersToRead timesRepeat:[
        c := aStream next.
        s nextPut:c.
        cp := c codePoint.
        (cp bitTest:16r80) ifTrue:[
            hasUtf8 := true.
            s nextPutAll:(aStream next:(self class bytesToReadFor:cp)-1).
        ].
    ].
    hasUtf8 ifTrue:[
        ^ self decodeString:s contents.
    ].
    ^ s contents asString
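
    "
     Usage sketch (an illustration, not part of the original source).
     The six characters of the sample string are encoded first and then
     read back from a read stream over the encoded result.

     |t|

     t := ISO10646_to_UTF8 encodeString:'Helloœ'.
     ISO10646_to_UTF8 new readNext:6 charactersFrom:t readStream.
    "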

    "Created: / 16-06-2005 / 11:45:14 / masca"
    "Modified (comment): / 17-01-2018 / 13:24:42 / stefan"
!

readNextCharacterFrom:aStream
    "decode the next character or byte on aStream from utf-8 to unicode"

    ^ Character utf8DecodeFrom:aStream.

    "Created: / 14-06-2005 / 17:03:59 / janfrog"
    "Modified: / 10-01-2018 / 17:35:40 / stefan"
    "Modified (comment): / 17-01-2018 / 13:24:08 / stefan"
! !

!ISO10646_to_UTF8 class methodsFor:'documentation'!

version
    ^ '$Header$'
!

version_CVS
    ^ '$Header$'
! !