CharacterEncoderImplementations__ISO10646_to_UTF8_MAC.st
author Stefan Vogel <sv@exept.de>
Fri, 27 Oct 2017 16:14:37 +0200
branch expecco_2_11_1_branch
changeset 22329 20662662693b
parent 17568 e90410336cc2
child 21478 2e63fbcbfa85
permissions -rw-r--r--
Add 2.11.0 Patch
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>

"{ Encoding: utf8 }"

"
 COPYRIGHT (c) 2015 by eXept Software AG
              All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
"{ Package: 'stx:libbasic' }"

"{ NameSpace: CharacterEncoderImplementations }"

ISO10646_to_UTF8 subclass:#ISO10646_to_UTF8_MAC
	instanceVariableNames:''
	classVariableNames:'AccentMap DecomposeMap ComposeMap'
	poolDictionaries:''
	category:'Collections-Text-Encodings'
!

!ISO10646_to_UTF8_MAC class methodsFor:'documentation'!

copyright
"
 COPYRIGHT (c) 2015 by eXept Software AG
              All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
!

documentation
"
    UTF-8 can encode some diacritical characters (umlauts) in multiple ways:
        - either as a single Unicode code point (e.g. ae -> ä -> &#228; -> C3 A4)
        - or in the so-called 'Normalization Form Canonical Decomposition' (NFD), i.e. as a regular 'a'
          followed by a combining diacritical mark (for example: acute).

    Mac OS X needs the second form for its file names.
    However, OS X does not decompose the ranges U+2000-U+2FFF, U+F900-U+FAFF and U+2F800-U+2FAFF.

    This is a quick-and-dirty hack to support at least the first page (Latin-1) characters.
    Will be enhanced for the 2nd and 3rd Unicode page, when I find time.

    [caveat:]
        only a small subset of multi-composes is supported so far (for example: trema plus acute)

    [author:]
        Claus Gittinger

    [instance variables:]

    [class variables:]
        ComposeMap DecomposeMap

    [see also:]
        http://developer.apple.com/library/mac/#qa/qa2001/qa1173.html

"
! !
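
!ISO10646_to_UTF8_MAC class methodsFor:'documentation'!

examples
"
    The following snippets are sketches only, derived from reading the methods
    of this class; they have not been verified against a running image.
    They assume that initializeDecomposeMap may be evaluated explicitly, and that
    Transcript and CharacterWriteStream behave as elsewhere in Smalltalk/X.

    Decompose the umlaut-a (U+00E4) into its base and combining code points.
    According to the umlaut/trema table entry, the block should receive
    16r61 and 16r0308:
                                                                [exBegin]
        |enc|

        ISO10646_to_UTF8_MAC initializeDecomposeMap.
        enc := ISO10646_to_UTF8_MAC new.
        enc
            decompositionOf:16rE4
            into:[:baseCode :markCode |
                Transcript showCR:(baseCode printString , ' ' , markCode printString)
            ].
                                                                [exEnd]

    Compose an 'a' followed by a combining diaeresis (U+0308) back into
    a single character and show the result:
                                                                [exBegin]
        |enc out|

        ISO10646_to_UTF8_MAC initializeDecomposeMap.
        enc := ISO10646_to_UTF8_MAC new.
        out := CharacterWriteStream on:''.
        enc compositionOf:$a with:(Character value:16r0308) to:out.
        Transcript showCR:out contents.
                                                                [exEnd]
"
! !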

!ISO10646_to_UTF8_MAC class methodsFor:'initialization'!

initializeDecomposeMap
    "set up DecomposeMap, which maps the code point of a diacritical character to its two
     components (base code point and combining code point), and ComposeMap, which maps
     a combining code point to a string of base/composed character pairs"

    DecomposeMap := Dictionary new.
    ComposeMap := Dictionary new.

    #(
        "/ attention: the following strings contain non-latin characters
        "/ if you don't see them, change your font setting for a better font

        (16r0300 "grave"        'AÀaàEÈeèIÌiìoòOÒUÙuùNǸnǹWẀwẁYỲyỳÜǛüǜ')
        (16r0301 "acute"        'AÁaáEÉeéIÍiíOÓoóUÚuúyýYÝCĆcćNŃnńRŔrŕSŚsśZŹzźGǴgǵÆǼæǽØǾøǿMḾmḿKḰkḱPṔpṕWẂwẃÜǗüǘ')
        (16r0302 "circumflex"   'AÂaâEÊeêIÎiîOÔoôUÛuûCĈcĉGĜgĝHĤhĥJĴjĵSŜsŝWŴwŵYŶyŷZẐzẑ')
        (16r0303 "tilde"        'AÃaãNÑnñOÕoõUŨuũYỸyỹEẼeẽVṼvṽ')
        (16r0304 "macron"       'AĀaāEĒeēIĪiīOŌoōUŪuūGḠgḡÜǕüǖ')
        (16r0306 "breve"        'AĂaăEĔeĕGĞgğIĬiĭOŎoŏUŬuŭ')
        (16r0307 "dot above"    'AȦaȧOȮoȯCĊcċEĖeėGĠgġZŻzżBḂbḃDḊdḋFḞfḟHḢhḣMṀmṁNṄnṅPṖpṗRṘrṙSṠsṡTṪtṫWẆwẇXẊxẋYẎyẏ')
        (16r0308 "umlaut/trema" 'AÄaäEËeëOÖoöUÜuüIÏiïyÿYŸHḦhḧXẌxẍtẗÙǛùǜŪǕūǖÚǗúǘǓǙǔǚ')
        (16r030A "ring"         'AÅaåUŮuůwẘyẙ')
        (16r030B "dbl acute"    'OŐoőUŰuű')
        (16r030C "caron/hacek"  'CČcčDĎEĚeěNŇnňRŘrřSŠsšZŽzžAǍaǎIǏiǐOǑoǒUǓuǔGǦgǧKǨkǩÜǙüǚ')
        (16r030F "dbl grave"    'AȀaȁEȄeȅIȈiȉOȌoȍRȐrȑUȔuȕ')
        (16r0311 "inv. breve"   'AȂaȃEȆeȇIȊiȋOȎoȏRȒrȓUȖuȗ')
        (16r0317 "acute below"  'KĶkķLĻlļNŅnņRŖrŗSȘsșTȚtț')
        (16r0327 "cedilla"      'CÇcçTŢtţEȨeȩDḐdḑHḨhḩ')
        (16r0328 "ogonek"       'AĄaąEĘeęIĮiįOǪoǫUŲuų')
    ) do:[:eachPair |
        |composeCode mapping|

        composeCode := eachPair first.
        mapping := eachPair second.
        mapping pairWiseDo:[:baseChar :composedChar |
            "/ setup, so that we find
            "/    DecomposeMap at:"$à codePoint" 16rE0 put:#( "$a codePoint" 16r61 "grave codePoint" 16r0300).
            DecomposeMap
                at:composedChar codePoint
                put:(Array with:baseChar codePoint with:composeCode)
        ].

        ComposeMap at:composeCode put:mapping.
    ].
! !

!ISO10646_to_UTF8_MAC methodsFor:'encoding & decoding'!

compositionOf: baseChar with: diacriticalChar to: outStream
    "compose two characters into one and write the result to outStream;
     e.g. a + umlaut-diacritic-mark -> ä.
     If no composition is known for the pair, both characters are written unchanged."

    |cp map i|

    cp := diacriticalChar codePoint.
    (cp between:16r300 and:16r328) ifTrue:[
        map := ComposeMap at:cp ifAbsent:nil.
        map notNil ifTrue:[
            "/ compose
            i := map indexOf: baseChar.
            i ~~ 0 ifTrue:[
                outStream nextPut: (map at:i+1).
                ^ self.
            ].
        ].
    ].

    "/ leave as is
    outStream nextPut: baseChar.
    outStream nextPut: diacriticalChar.
!

decodeString:aStringOrByteCollection
    "return a Unicode string from the passed in UTF-8-MAC encoded string.
     This is UTF-8 with compose-characters decomposed
     (i.e. as separate codes, not as single combined characters).

     For now, here is a limited version, which should work
     at least for most European countries...
    "

    |s buff previous|

    s := super decodeString:aStringOrByteCollection.
    (s contains:[:char | char codePoint between:16r0300 and:16r0328]) ifFalse:[^ s].

    ComposeMap isNil ifTrue:[
        self class initializeDecomposeMap
    ].

    buff := CharacterWriteStream on:''.
    previous := nil.
    s do:[:each |
        (each codePoint between:16r0300 and:16r0328) ifTrue:[
            previous isNil ifTrue:[
                buff isEmpty ifTrue:[
                    "/ wrong - combiner not allowed here.
                    buff nextPut:each.
                ] ifFalse:[
                    "/ ouch - a multi-compose
                    previous := buff last.
                    buff skip:-1.
                    self compositionOf:previous with:each to:buff.
                ].
            ] ifFalse:[
                self compositionOf:previous with:each to:buff.
            ].
            previous := nil.
        ] ifFalse:[
            previous notNil ifTrue:[
                buff nextPut:previous.
            ].
            previous := each.
        ].
    ].
    previous notNil ifTrue:[
        buff nextPut:previous.
    ].
    ^ buff contents.

    "
     (ISO10646_to_UTF8 new encodeString:'aäoöuü') asByteArray
        -> #[97 195 164 111 195 182 117 195 188]

     (ISO10646_to_UTF8 new decodeString:
            (ISO10646_to_UTF8 new encodeString:'aäoöuü') asByteArray)

     (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray
        -> #[97 97 204 136 111 111 204 136 117 117 204 136]

     (ISO10646_to_UTF8_MAC new decodeString:
            (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray)
    "
!

decompositionOf: codePointIn into:outBlockWithTwoArgs
    "if required, decompose a diacritical character into a base character and a
     combining diacritical mark; e.g. ä -> a + umlaut-diacritic-mark.
     Pass both code points as args to the given block.
     For characters without a known decomposition, the block is not evaluated.
     Return true, if a decomposition was done."

    |entry|

    codePointIn < 16rC0 ifTrue:[ ^ false ].

    entry := DecomposeMap at:codePointIn ifAbsent:nil.
    entry isNil ifTrue:[ ^ false ].

    outBlockWithTwoArgs value:(entry at:1) value:(entry at:2).
    ^ true
!

encodeString:aUnicodeString
    "return the UTF-8-MAC representation of aUnicodeString.
     This is UTF-8 with composed characters decomposed (i.e. as separate codes, not as
     single combined characters).

     For now, here is a limited version, which should work
     at least for most European countries...
    "

    |gen s decomp codePoint composeCodePoint|

    DecomposeMap isNil ifTrue:[
        self class initializeDecomposeMap
    ].

    gen :=
        [:codePointArg |
            |codePoint "{Class: SmallInteger }" b1 b2 b3 b4 b5 v "{Class: SmallInteger }"|

            codePoint := codePointArg.
            codePoint <= 16r7F ifTrue:[
                "/ up to 7 bits: single byte
                s nextPut:(Character value:codePoint).
            ] ifFalse:[
                "/ each continuation byte carries 6 payload bits, tagged 10xxxxxx
                b1 := Character value:((codePoint bitAnd:16r3F) bitOr:2r10000000).
                v := codePoint bitShift:-6.
                v <= 16r1F ifTrue:[
                    "/ up to 11 bits: 2-byte sequence
                    s nextPut:(Character value:(v bitOr:2r11000000)).
                    s nextPut:b1.
                ] ifFalse:[
                    b2 := Character value:((v bitAnd:16r3F) bitOr:2r10000000).
                    v := v bitShift:-6.
                    v <= 16r0F ifTrue:[
                        "/ up to 16 bits: 3-byte sequence
                        s nextPut:(Character value:(v bitOr:2r11100000)).
                        s nextPut:b2; nextPut:b1.
                    ] ifFalse:[
                        b3 := Character value:((v bitAnd:16r3F) bitOr:2r10000000).
                        v := v bitShift:-6.
                        v <= 16r07 ifTrue:[
                            "/ up to 21 bits: 4-byte sequence
                            s nextPut:(Character value:(v bitOr:2r11110000)).
                            s nextPut:b3; nextPut:b2; nextPut:b1.
                        ] ifFalse:[
                            b4 := Character value:((v bitAnd:16r3F) bitOr:2r10000000).
                            v := v bitShift:-6.
                            v <= 16r03 ifTrue:[
                                "/ up to 26 bits: 5-byte sequence
                                s nextPut:(Character value:(v bitOr:2r11111000)).
                                s nextPut:b4; nextPut:b3; nextPut:b2; nextPut:b1.
                            ] ifFalse:[
                                b5 := Character value:((v bitAnd:16r3F) bitOr:2r10000000).
                                v := v bitShift:-6.
                                v <= 16r01 ifTrue:[
                                    "/ up to 31 bits: 6-byte sequence
                                    s nextPut:(Character value:(v bitOr:2r11111100)).
                                    s nextPut:b5; nextPut:b4; nextPut:b3; nextPut:b2; nextPut:b1.
                                ] ifFalse:[
                                    "/ cannot happen - we only support up to 30 bit characters
                                    self error:'ascii value > 31bit in utf8Encode'.
                                ]
                            ].
                        ].
                    ].
                ].
            ].
        ].

    decomp :=
        [:baseCodePointArg :composeCodePointArg |
            codePoint := baseCodePointArg. composeCodePoint := composeCodePointArg
        ].

    s := WriteStream on:(String uninitializedNew:aUnicodeString size).
    aUnicodeString do:[:eachCharacter |
        |needExtra|

        codePoint := eachCharacter codePoint.
        needExtra := self decompositionOf: codePoint into:decomp.
        gen value:codePoint.
        needExtra ifTrue:[
            gen value:composeCodePoint
        ].
    ].

    ^ s contents

    "
     (self encodeString:'hello') asByteArray                             #[104 101 108 108 111]
     (self encodeString:(Character value:16r40) asString) asByteArray    #[64]
     (self encodeString:(Character value:16r7F) asString) asByteArray    #[127]
     (self encodeString:(Character value:16r80) asString) asByteArray    #[194 128]
     (self encodeString:(Character value:16rFF) asString) asByteArray    #[195 191]
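
     additional sketch (not among the original examples): a code point above 16r7FF
     should take the 3-byte branch; this assumes U+20AC (euro sign) has no
     DecomposeMap entry, so no combining mark is appended
     (self encodeString:(Character value:16r20AC) asString) asByteArray  #[226 130 172]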

     (ISO10646_to_UTF8     new encodeString:'aäoöuü') asByteArray
        -> #[97 195 164 111 195 182 117 195 188]
     (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray
        -> #[97 97 204 136 111 111 204 136 117 117 204 136]

     ISO10646_to_UTF8_MAC new decodeString:
         (ISO10646_to_UTF8_MAC new encodeString:'Packages aus VSE für Smalltalk_X') asByteArray
    "
! !

!ISO10646_to_UTF8_MAC methodsFor:'queries'!

nameOfEncoding
    ^ #'utf8-mac'
! !

!ISO10646_to_UTF8_MAC class methodsFor:'documentation'!

version
    ^ '$Header: /cvs/stx/stx/libbasic/CharacterEncoderImplementations__ISO10646_to_UTF8_MAC.st,v 1.8 2015-02-27 18:53:22 cg Exp $'
!

version_CVS
    ^ '$Header: /cvs/stx/stx/libbasic/CharacterEncoderImplementations__ISO10646_to_UTF8_MAC.st,v 1.8 2015-02-27 18:53:22 cg Exp $'
! !