CharacterEncoderImplementations__ISO10646_to_UTF16BE.st
author Claus Gittinger <cg@exept.de>
Tue, 09 Jul 2019 20:55:17 +0200
changeset 24417 03b083548da2
parent 24214 7853c5e56c17
child 25270 abd76d94ad4f
permissions -rw-r--r--
#REFACTORING by exept class: Smalltalk class changed: #recursiveInstallAutoloadedClassesFrom:rememberIn:maxLevels:noAutoload:packageTop:showSplashInLevels: Transcript showCR:(... bindWith:...) -> Transcript showCR:... with:...
"
 COPYRIGHT (c) 2005 by eXept Software AG
              All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
"{ Package: 'stx:libbasic' }"

"{ NameSpace: CharacterEncoderImplementations }"

VariableBytesEncoder subclass:#ISO10646_to_UTF16BE
	instanceVariableNames:''
	classVariableNames:''
	poolDictionaries:''
	category:'Collections-Text-Encodings'
!

!ISO10646_to_UTF16BE class methodsFor:'documentation'!

copyright
"
 COPYRIGHT (c) 2005 by eXept Software AG
              All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
!

documentation
"
    encodes/decodes UTF16 BigEndian (big-end-first)

    Notice the naming (many are confused):
        Unicode is the set of number-to-glyph assignments
    whereas:
        UTF8, UTF16 etc. are concrete ways of transmitting Unicode codePoints (numbers).

    ST/X NEVER uses UTF8 or UTF16 internally - all characters are full 24bit characters.
    Only when exchanging data are these converted into UTF8 (or other) byte sequences.
"
!

examples
"
  Encoding (unicode to utf16BE):
     ISO10646_to_UTF16BE encodeString:'hello'.


  Decoding (utf16BE to unicode):
     |t|

     t := ISO10646_to_UTF16BE encodeString:'ÄÖÜß'.
     ISO10646_to_UTF16BE decodeString:t.

  Decoding (utf16LE-Bytes to unicode):
     ISO10646_to_UTF16LE decodeString:#[ 16r40 0 16r41 0 16r42 0 16r43 0 16r44 0 ].
     ISO10646_to_UTF16BE decodeString:#[ 16r40 0 16r41 0 16r42 0 16r43 0 16r44 0 ] copy swapBytes.
"
! !

!ISO10646_to_UTF16BE methodsFor:'encoding & decoding'!

decodeString:aStringOrByteCollection
    "given a byteArray (2 bytes per code unit) or unsignedShortArray in UTF16 encoding,
     return a new string containing the same characters, in 8, 16bit (or more) encoding.
     Returns either a normal String, a TwoByte- or a FourByte-String instance.
     Only useful when reading from external sources.
     Surrogate pairs are combined into a single character, so code points
     up to 16r10FFFF (the limit of UTF16) can be decoded."

    |s newString bitsPerElementIn nextIn
     codeIn codeIn1 codeIn2 estimatedSize out|

    aStringOrByteCollection isByteArray ifTrue:[
        bitsPerElementIn := 8.
    ] ifFalse:[
        aStringOrByteCollection isString ifTrue:[
            bitsPerElementIn := aStringOrByteCollection bitsPerCharacter.
        ] ifFalse:[
            "can be a ShortArray"
            bitsPerElementIn := 16.
        ].
    ].

    s := aStringOrByteCollection readStream.
    bitsPerElementIn == 8 ifTrue:[
        s size odd ifTrue:[
            InvalidEncodingError raiseWith:aStringOrByteCollection errorString:' - size is not a multiple of 2 bytes'.
        ].
        nextIn := [self nextTwoByteValueFrom:s].
    ] ifFalse:[
        nextIn := [s next].
    ].

    estimatedSize := s size * bitsPerElementIn // 16.
    out := CharacterWriteStream on:(String new:estimatedSize).
    [s atEnd] whileFalse:[
        codeIn := nextIn value.
        codeIn > 16rFF ifTrue:[
            (codeIn between:16rD800 and:16rDBFF) ifTrue:[
                "high surrogate: combine it with the low surrogate that follows"
                codeIn1 := codeIn.
                codeIn2 := nextIn value.
                codeIn := ((codeIn1 - 16rD800) bitShift:10)
                          +
                          (codeIn2 - 16rDC00)
                          + 16r00010000.
            ].
        ].
        out nextPut:(Character value:codeIn).
    ].
    newString := out contents.

"/    nBitsRequired := 8.
"/    sz := 0.
"/    [s atEnd] whileFalse:[
"/        codeIn := nextIn value.
"/        sz := sz + 1.
"/
"/        codeIn <= 16rFF ifTrue:[
"/        ] ifFalse:[
"/            nBitsRequired := nBitsRequired max:16.
"/            (codeIn between:16rD800 and:16rDBFF) ifTrue:[
"/                nBitsRequired := 32.
"/                codeIn2 := nextIn value.
"/            ].
"/        ]
"/    ].
"/
"/    nBitsRequired == 8 ifTrue:[
"/        newString := String uninitializedNew:sz
"/    ] ifFalse:[
"/        nBitsRequired <= 16 ifTrue:[
"/            newString := Unicode16String new:sz
"/        ] ifFalse:[
"/            newString := Unicode32String new:sz
"/        ]
"/    ].
"/
"/    s := aStringOrByteCollection readStream.
"/    idx := 1.
"/    [s atEnd] whileFalse:[
"/        codeIn := nextIn value.
"/        codeIn <= 16rFF ifTrue:[
"/        ] ifFalse:[
"/            nBitsRequired := nBitsRequired max:16.
"/            (codeIn between:16rD800 and:16rDBFF) ifTrue:[
"/                nBitsRequired := 32.
"/                codeIn1 := codeIn.
"/                codeIn2 := nextIn value.
"/                codeIn := ((codeIn1 - 16rD800) bitShift:10)
"/                          +
"/                          (codeIn2 - 16rDC00)
"/                          + 16r00010000.
"/            ].
"/        ].
"/        newString at:idx put:(Character value:codeIn).
"/        idx := idx + 1.
"/    ].
    ^ newString

    "
     self new decodeString:#[ 16r00 16r42 ]
     self new decodeString:#[ 16r01 16r42 ]
     self new decodeString:#[ 16r00 16r48
                              16r00 16r69
                              16rD8 16r00
                              16rDC 16r00
                              16r00 16r21
                              16r00 16r21
                            ]

     self new decodeString:#( 16r0048
                              16r0069
                              16rD800
                              16rDC00
                              16r0021
                              16r0021
                            )
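
     Worked example of the surrogate-pair arithmetic used above
     (an illustration, not one of the original examples; U+1F600 is
      transmitted in UTF16 as the two code units 16rD83D 16rDE00):

     ((16rD83D - 16rD800) bitShift:10) + (16rDE00 - 16rDC00) + 16r00010000     16r1F600

     The same pair as big-endian bytes should therefore decode to a
     one-character string holding that code point:

     self new decodeString:#[ 16rD8 16r3D 16rDE 16r00 ]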
    "

    "Modified: / 12-07-2012 / 19:56:12 / cg"
    "Modified: / 28-05-2019 / 14:08:19 / Stefan Vogel"
!

encode:aCode
    ^ aCode
!

encodeString:aUnicodeString
    "return the UTF-16 representation of aUnicodeString.
     The resulting byte array is only useful for storing in an external file,
     not for use inside ST/X."

    |stream size "{ Class:SmallInteger }"|

    size := aUnicodeString size.
    stream := WriteStream on:(ByteArray uninitializedNew:size * 2).

    1 to:size do:[:idx |
        stream nextPutUtf16Bytes:(aUnicodeString at:idx) MSB:true.
    ].

    ^ stream contents

    "
     (self encodeString:'hello')                                         #[0 104 0 101 0 108 0 108 0 111]
     (self encodeString:(Character value:16r40) asString)                #[0 64]
     (self encodeString:(Character value:16rFF) asString)                #[0 255]
     (self encodeString:(Character value:16r100) asString)               #[1 0]
     (self encodeString:(Character value:16r1000) asString)              #[16 0]
     (self encodeString:(Character value:16r2000) asString)              #[32 0]
     (self encodeString:(Character value:16r4000) asString)              #[64 0]
     (self encodeString:(Character value:16r8000) asString)              #[128 0]
     (self encodeString:(Character value:16rD7FF) asString)              #[215 255]
     (self encodeString:(Character value:16rE000) asString)              #[224 0]
     (self encodeString:(Character value:16rFFFF) asString)              #[255 255]
     (self encodeString:(Character value:16r10000) asString)             #[216 0 220 0]
     (self encodeString:(Character value:16r10FFF) asString)             #[216 3 223 255]
     (self encodeString:(Character value:16r1FFFF) asString)             #[216 63 223 255]
     (self encodeString:(Character value:16r10FFFF) asString)            #[219 255 223 255]
    error cases:
     (self encodeString:(Character value:16rD800) asString)
     (self encodeString:(Character value:16rD801) asString)
     (self encodeString:(Character value:16rDFFF) asString)
     (self encodeString:(Character value:16r110000) asString)
    "

    "Modified: / 16-01-2018 / 19:38:30 / stefan"
    "Modified: / 28-05-2019 / 13:49:53 / Stefan Vogel"
! !

!ISO10646_to_UTF16BE methodsFor:'private'!

nextTwoByteValueFrom:aStream
    ^ aStream nextUnsignedInt16MSB:true
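
    "
     A sketch of the expected behaviour, assuming that a read stream on a
     ByteArray understands nextUnsignedInt16MSB: as used above, and that
     MSB:true means the most significant byte comes first:

     self new nextTwoByteValueFrom:#[ 16r01 16r42 ] readStream           16r0142
    "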
45940fc5e0ad #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21475
diff changeset
   245
! !

!ISO10646_to_UTF16BE methodsFor:'queries'!

characterSize:charOrCodePoint
    "return the number of bytes required to encode charOrCodePoint in UTF-16:
     2 for code points up to 16rFFFF, 4 (a surrogate pair) above that"

    ^ charOrCodePoint codePoint <= 16rFFFF ifTrue:[2] ifFalse:[4]
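
    "
     Usage sketch (illustrative, not from the original source). It assumes the
     class can be instantiated with #new and is reachable under the namespace
     suggested by the file name; adjust the prefix if your image differs.

     CharacterEncoderImplementations::ISO10646_to_UTF16BE new characterSize:$a
     CharacterEncoderImplementations::ISO10646_to_UTF16BE new characterSize:(Character value:16r1F600)

     Following the comparison above, the first expression should answer 2
     (code point 16r61) and the second 4 (code point above 16rFFFF).
    "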
    "Created: / 16-01-2018 / 19:21:09 / stefan"
!

nameOfEncoding
    ^ #utf16be
! !

!ISO10646_to_UTF16BE methodsFor:'stream support'!

encodeCharacter:aUnicodeCharacter on:aStream
    "given a character in unicode, encode it onto aStream as UTF-16,
     most significant byte first."

    aStream nextPutUtf16Bytes:aUnicodeCharacter MSB:true.

    "Created: / 16-02-2017 / 16:41:25 / stefan"
!

encodeString:aUnicodeString on:aStream
    "given a string in unicode, encode it onto aStream as UTF-16,
     most significant byte first."

    aStream nextPutAllUtf16Bytes:aUnicodeString MSB:true.

    "Created: / 16-02-2017 / 16:40:32 / stefan"
!

readNextCharacterFrom:aStream
    "read the next character from aStream; a high surrogate (16rD800..16rDBFF)
     is combined with the following two-byte value into a single code point.
     Answers nil if no two-byte value could be read."

    |codeIn codeIn2|

    codeIn := self nextTwoByteValueFrom:aStream.
    codeIn isNil ifTrue:[
        ^ nil.
    ].
    (codeIn between:16rD800 and:16rDBFF) ifTrue:[
        codeIn2 := self nextTwoByteValueFrom:aStream.
        codeIn2 isNil ifTrue:[
            InvalidEncodingError raiseErrorString:' - UTF16 missing followBytes'.
        ].
        codeIn := ((codeIn - 16rD800) bitShift:10)
                  + (codeIn2 - 16rDC00)
                  + 16r00010000.
    ].

    ^ Character codePoint:codeIn.
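
    "
     Worked example of the surrogate arithmetic in this method (an illustration
     added for clarity, not taken from the original source). The two-byte
     values 16rD83D and 16rDE00 combine as:

         ((16rD83D - 16rD800) bitShift:10)    -> 16rF400
       + (16rDE00 - 16rDC00)                  -> 16r0200
       + 16r00010000                          -> 16r10000
                                         sum:    16r1F600
    "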
    "Created: / 16-01-2018 / 22:31:29 / stefan"
    "Modified: / 17-01-2018 / 14:41:31 / stefan"
! !

!ISO10646_to_UTF16BE class methodsFor:'documentation'!

version
    ^ '$Header$'
!

version_CVS
    ^ '$Header$'
! !