CharacterEncoderImplementations__ISO10646_to_UTF8_MAC.st
author Claus Gittinger <cg@exept.de>
Fri, 17 Feb 2017 10:25:31 +0100
changeset 21480 20b4ddb4ba7a
parent 21478 2e63fbcbfa85
child 21593 bdcb1244b97d
permissions -rw-r--r--
#FEATURE by cg class: Object added: #isURL
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     1
"{ Encoding: utf8 }"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     2
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     3
"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     4
 COPYRIGHT (c) 2015 by eXept Software AG
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     5
              All Rights Reserved
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     6
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     7
 This software is furnished under a license and may be used
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     8
 only in accordance with the terms of that license and with the
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     9
 inclusion of the above copyright notice.   This software may not
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    10
 be provided or otherwise made available to, or used by, any
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    11
 other person.  No title to or ownership of the software is
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    12
 hereby transferred.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    13
"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    14
"{ Package: 'stx:libbasic' }"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    15
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    16
"{ NameSpace: CharacterEncoderImplementations }"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    17
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    18
ISO10646_to_UTF8 subclass:#ISO10646_to_UTF8_MAC
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    19
	instanceVariableNames:''
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
    20
	classVariableNames:'AccentMap DecomposeMap ComposeMap'
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    21
	poolDictionaries:''
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    22
	category:'Collections-Text-Encodings'
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    23
!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    24
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    25
!ISO10646_to_UTF8_MAC class methodsFor:'documentation'!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    26
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    27
copyright
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    28
"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    29
 COPYRIGHT (c) 2015 by eXept Software AG
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    30
              All Rights Reserved
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    31
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    32
 This software is furnished under a license and may be used
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    33
 only in accordance with the terms of that license and with the
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    34
 inclusion of the above copyright notice.   This software may not
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    35
 be provided or otherwise made available to, or used by, any
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    36
 other person.  No title to or ownership of the software is
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    37
 hereby transferred.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    38
"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    39
!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    40
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    41
documentation
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    42
"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    43
    UTF-8 can encode some diacritical characters (umlauts) in multiple ways:
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    44
        - either with a single unicode (e.g. ae -> ä -> &#228; -> C3 A4)
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    45
        - or in the so-called 'Normalization Form Canonical Decomposition' (NFD), i.e. as a regular 'a' followed by a
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    46
          combining diacritical mark (for example: acute).
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    47
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
    48
    MAC OSX needs the second form for its file names.
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    49
    However, OSX does not decompose the ranges U+2000-U+2FFF, U+F900-U+FAFF and U+2F800-U+2FAFF.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    50
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    51
    This is a quick-and-dirty hack, which supports at least the first page (latin1) characters.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    52
    Will be enhanced for the 2nd and 3rd unicode page, when I find time.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    53
17568
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
    54
    [caveat:]
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
    55
        only a small subset of multi-composes is supported so far (for example: trema plus acute)
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
    56
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    57
    [author:]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    58
        Claus Gittinger
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    59
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    60
    [instance variables:]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    61
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    62
    [class variables:]
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
    63
        ComposeMap DecomposeMap
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    64
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    65
    [see also:]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    66
        http://developer.apple.com/library/mac/#qa/qa2001/qa1173.html
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    67
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    68
"
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    69
! !
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    70
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    71
!ISO10646_to_UTF8_MAC class methodsFor:'initialization'!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    72
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    73
initializeDecomposeMap
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
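    74
    "set up DecomposeMap, which maps a diacritical character to its two components
     (base character and combining mark), and ComposeMap, which maps a combining mark
     to its string of base/composed character pairs"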
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    75
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
    76
    DecomposeMap := Dictionary new.
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
    77
    ComposeMap := Dictionary new.
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    78
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
    79
    #(
17566
a990c12c71c0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17565
diff changeset
    80
        "/ attention: the following strings contain non-latin characters
a990c12c71c0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17565
diff changeset
    81
        "/ if you don't see them, change your font setting for a better font
a990c12c71c0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17565
diff changeset
    82
17568
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
    83
        (16r0300 "grave"        'AÀaàEÈeèIÌiìoòOÒUÙuùNǸnǹWẀwẁYỲyỳÜǛüǜ')
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
    84
        (16r0301 "acute"        'AÁaáEÉeéIÍiíOÓoóUÚuúyýYÝCĆcćNŃnńRŔrŕSŚsśZŹzźGǴgǵÆǼæǽØǾøǿMḾmḿKḰkḱPṔpṕWẂwẃÜǗüǘ')
17567
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    85
        (16r0302 "circumflex"   'AÂaâEÊeêIÎiîOÔoôUÛuûCĈcĉGĜgĝHĤhĥJĴjĵSŜsŝWŴwŵYŶyŷZẐzẑ')
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    86
        (16r0303 "tilde"        'AÃaãNÑnñOÕoõUŨuũYỸyỹEẼeẽVṼvṽ')
17568
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
    87
        (16r0304 "macron"       'AĀaāEĒeēIĪiīOŌoōUŪuūGḠgḡÜǕüǖ' ) 
17567
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    88
        (16r0306 "breve"        'AĂaăEĔeĕGĞgğIĬiĭOŎoŏUŬuŭ')
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    89
        (16r0307 "dot above"    'AȦaȧOȮoȯCĊcċEĖeėGĠgġZŻzżBḂbḃDḊdḋFḞfḟHḢhḣMṀmṁNṄnṅPṖpṗRṘrṙSṠsṡTṪtṫWẆwẇXẊxẋYẎyẏ' )
17568
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
    90
        (16r0308 "umlaut/trema" 'AÄaäEËeëOÖoöUÜuüIÏiïyÿYŸHḦhḧXẌxẍtẗÙǛùǜŪǕūǖÚǗúǘǓǙǔǚ')
17567
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    91
        (16r030A "ring"         'AÅaåUŮuůwẘyẙ')
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    92
        (16r030B "dbl acute"    'OŐoőUŰuű')
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    93
        (16r030C "caron"        'CČcčDĎEĚeěNŇnňRŘrřSŠsšZŽzžAǍaǎIǏiǐOǑoǒUǓuǔGǦgǧKǨkǩÜǙüǚ')
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    94
        (16r030F "dbl grave"    'AȀaȁEȄeȅIȈiȉOȌoȍRȐrȑUȔuȕ')
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    95
        (16r0311 "inv. breve"   'AȂaȃEȆeȇIȊiȋOȎoȏRȒrȓUȖuȗ')
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    96
        (16r0317 "acute. below" 'KĶkķLĻlļNŅnņRŖrŗSȘsșTȚtț')
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    97
        (16r0327 "cedille"      'CÇc窺TŢtţEȨeȩDḐdḑHḨhḩ')       
2d57395ef7e0 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17566
diff changeset
    98
        (16r0328 "ogonek"       'AĄaąEĘeęIĮiįOǪoǫUŲuų')
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
    99
    ) do:[:eachPair |
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   100
        |composeCode mapping|
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   101
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   102
        composeCode := eachPair first.
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   103
        mapping := eachPair second.
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   104
        mapping pairWiseDo:[:baseChar :composedChar |
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   105
            "/ setup, so that we find
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   106
            "/    DecomposeMap at:"$à codePoint" 16rE0 put:#( "$a codePoint" 16r61 "grave codePoint" 16r0300).
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   107
            DecomposeMap 
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   108
                at:composedChar codePoint 
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   109
                put:(Array with:baseChar codePoint with:composeCode)
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   110
        ].
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   111
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   112
        ComposeMap at:composeCode put:mapping.
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   113
    ].
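
    "
     Usage sketch (an illustration added here, not part of the original checkin).
     It assumes the expressions are evaluated in the context of this class,
     so that the class variables are visible:

        self initializeDecomposeMap.
        DecomposeMap at:16rE4.
            -> should answer the code points of $a (16r61) and of the combining trema (16r0308)
        (ComposeMap at:16r0308) first.
            -> should answer $A, the first base character of the umlaut/trema row
    "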
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   114
! !
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   115
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   116
!ISO10646_to_UTF8_MAC methodsFor:'encoding & decoding'!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   117
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   118
compositionOf: baseChar with: diacriticalChar  to: outStream
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   119
    "compose a base character and a combining diacritical mark into one character, e.g.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   120
     a + umlaut-diacritic-mark -> ä."
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   121
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   122
    |cp map i|
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   123
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   124
    cp := diacriticalChar codePoint.
17568
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   125
    (cp between:16r300 and:16r328) ifTrue:[
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   126
        map := ComposeMap at:cp ifAbsent:nil.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   127
        map notNil ifTrue:[
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   128
            "/ compose
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   129
            i := map indexOf: baseChar.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   130
            i ~~ 0 ifTrue:[
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   131
                outStream nextPut: (map at:i+1).
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   132
                ^ self.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   133
            ].
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   134
        ].
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   135
    ].
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   136
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   137
    "/ leave as is
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   138
    outStream nextPut: baseChar.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   139
    outStream nextPut: diacriticalChar.
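
    "
     Usage sketch (an illustration, not part of the original checkin);
     it assumes that ComposeMap has been set up beforehand:

        |s|

        ISO10646_to_UTF8_MAC initializeDecomposeMap.
        s := CharacterWriteStream on:''.
        ISO10646_to_UTF8_MAC new
            compositionOf:$a with:(Character value:16r0308) to:s.
        s contents.
            -> should answer a one-character string holding a-umlaut (16rE4)
    "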
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   140
!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   141
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   142
decodeString:aStringOrByteCollection
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   143
    "return a Unicode string from the passed in UTF-8-MAC encoded string.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   144
     This is UTF-8 with compose-characters decomposed 
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   145
     (i.e. as separate codes, not as single combined characters).
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   146
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   147
     For now, here is a limited version, which should work
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   148
     at least for most European countries...
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   149
    "
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   150
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   151
    |s buff previous|
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   152
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   153
    s := super decodeString:aStringOrByteCollection.
17568
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   154
    (s contains:[:char | char codePoint between:16r0300 and:16r0328]) ifFalse:[^ s].
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   155
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   156
    ComposeMap isNil ifTrue:[
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   157
        self class initializeDecomposeMap
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   158
    ].
17522
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   159
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   160
    buff := CharacterWriteStream on:''.
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   161
    previous := nil.
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   162
    s do:[:each |
17568
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   163
        (each codePoint between:16r0300 and:16r0328) ifTrue:[
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   164
            previous isNil ifTrue:[
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   165
                buff isEmpty ifTrue:[
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   166
                    "/ wrong - combiner not allowed here.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   167
                    buff nextPut:each.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   168
                ] ifFalse:[
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   169
                    "/ ouch - a multi-compose
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   170
                    previous := buff last.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   171
                    buff skip:-1.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   172
                    self compositionOf:previous with:each to:buff.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   173
                ].
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   174
            ] ifFalse:[
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   175
                self compositionOf:previous with:each to:buff.
e90410336cc2 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17567
diff changeset
   176
            ].
17522
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   177
            previous := nil.
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   178
        ] ifFalse:[
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   179
            previous notNil ifTrue:[
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   180
                buff nextPut:previous.
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   181
            ].
17522
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   182
            previous := each.
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   183
        ].
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   184
    ].
17522
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   185
    previous notNil ifTrue:[
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   186
        buff nextPut:previous.
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   187
    ].
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   188
    ^ buff contents.
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   189
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   190
    "
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   191
     (ISO10646_to_UTF8 new encodeString:'aäoöuü') asByteArray   
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   192
        -> #[97 195 164 111 195 182 117 195 188]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   193
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   194
     (ISO10646_to_UTF8 new decodeString:
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   195
            (ISO10646_to_UTF8 new encodeString:'aäoöuü') asByteArray)    
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   196
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   197
    (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray 
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   198
        -> #[97 97 204 136 111 111 204 136 117 117 204 136]  
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   199
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   200
     (ISO10646_to_UTF8_MAC new decodeString:
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   201
            (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray)    
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   202
    "
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   203
!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   204
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   205
decompositionOf: codePointIn into:outBlockWithTwoArgs
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   206
    "if required, decompose a diacritical character into a base character and a combining diacritical mark;
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   207
     e.g. ä -> a + umlaut-diacritic-mark.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   208
     Pass both as args to the given block.
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   209
     For non-diacritical characters, the block is not evaluated.
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   210
     Return true, if a decomposition was done."
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   211
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   212
    |entry|
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   213
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   214
    codePointIn < 16rC0 ifTrue:[ ^ false ].
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   215
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   216
    entry := DecomposeMap at:codePointIn ifAbsent:nil.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   217
    entry isNil ifTrue:[ ^ false ].
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   218
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   219
    outBlockWithTwoArgs value:(entry at:1) value:(entry at:2).
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   220
    ^ true
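
    "
     Usage sketch; assumes the DecomposeMap has already been initialized
     (e.g. via 'self class initializeDecomposeMap', as done in #encodeCharacter:on:).
     The expected values are inferred from the #encodeString: example for 'ä', not verified here:

     ISO10646_to_UTF8_MAC new
        decompositionOf:16rE4
        into:[:base :mark | Transcript showCR:(base printString , ' ' , mark printString)]
        -> should show 97 776 (a + U+0308) and answer true

     ISO10646_to_UTF8_MAC new
        decompositionOf:16r41
        into:[:base :mark | Transcript showCR:'not reached']
        -> answers false without evaluating the block (code point below 16rC0)
    "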
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   221
!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   222
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   223
encodeCharacter:aUnicodeCharacter on:aStream
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   224
    "write the UTF-8-MAC representation of aUnicodeCharacter to aStream.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   225
     This is UTF-8 with composed characters decomposed (i.e. as separate codes, not as
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   226
     single combined characters).
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   227
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   228
     For now, here is a limited version, which should work
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   229
     at least for most European countries...
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   230
    "
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   231
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   232
    |codePoint composeCodePoint needExtra|
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   233
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   234
    DecomposeMap isNil ifTrue:[
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   235
        self class initializeDecomposeMap
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   236
    ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   237
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   238
    codePoint := aUnicodeCharacter codePoint.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   239
    needExtra := self decompositionOf:codePoint into:[:baseCodePointArg :composeCodePointArg | 
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   240
            codePoint := baseCodePointArg. composeCodePoint := composeCodePointArg
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   241
        ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   242
    aStream nextPutUtf8:codePoint.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   243
    needExtra ifTrue:[
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   244
        aStream nextPutUtf8:composeCodePoint
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   245
    ].
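
    "
     Usage sketch; the stream setup is an assumption, modelled on #encodeString:.
     The expected bytes are taken from the #encodeString: example for 'aäoöuü':

     |s|
     s := WriteStream on:(String new).
     ISO10646_to_UTF8_MAC new encodeCharacter:$ä on:s.
     s contents asByteArray
        -> should be #[97 204 136]
    "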
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   246
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   247
    "Created: / 16-02-2017 / 17:45:18 / stefan"
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   248
!
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   249
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   250
encodeString:aUnicodeString
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   251
    "return the UTF-8-MAC representation of aUnicodeString.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   252
     This is UTF-8 with composed characters decomposed (i.e. as separate codes, not as
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   253
     single combined characters).
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   254
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   255
     For now, here is a limited version, which should work
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   256
     at least for most European countries...
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   257
    "
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   258
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   259
    |s|
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   260
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   261
    s := WriteStream on:(String uninitializedNew:aUnicodeString size).
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   262
    self encodeString:aUnicodeString on:s.
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   263
    ^ s contents
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   264
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   265
    "
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   266
     (self encodeString:'hello') asByteArray                             #[104 101 108 108 111]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   267
     (self encodeString:(Character value:16r40) asString) asByteArray    #[64]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   268
     (self encodeString:(Character value:16r7F) asString) asByteArray    #[127]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   269
     (self encodeString:(Character value:16r80) asString) asByteArray    #[194 128]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   270
     (self encodeString:(Character value:16rFF) asString) asByteArray    #[195 191]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   271
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   272
     (ISO10646_to_UTF8     new encodeString:'aäoöuü') asByteArray   
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   273
        -> #[97 195 164 111 195 182 117 195 188]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   274
     (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray 
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   275
        -> #[97 97 204 136 111 111 204 136 117 117 204 136]  
17522
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   276
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   277
     ISO10646_to_UTF8_MAC new decodeString:
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   278
         (ISO10646_to_UTF8_MAC new encodeString:'Packages aus VSE für Smalltalk_X') asByteArray 
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   279
    "
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   280
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   281
    "Modified (format): / 16-02-2017 / 17:36:14 / stefan"
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   282
!
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   283
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   284
encodeString:aUnicodeString on:aStream
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   285
    "write the UTF-8-MAC representation of aUnicodeString to aStream.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   286
     This is UTF-8 with composed characters decomposed (i.e. as separate codes, not as
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   287
     single combined characters).
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   288
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   289
     For now, here is a limited version, which should work
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   290
     at least for most European countries...
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   291
    "
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   292
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   293
    |sz "{Class: SmallInteger}" decomposeBlock codePoint composeCodePoint needExtra|
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   294
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   295
    decomposeBlock := [:baseCodePointArg :composeCodePointArg | 
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   296
                          codePoint := baseCodePointArg. composeCodePoint := composeCodePointArg
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   297
                      ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   298
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   299
    sz := aUnicodeString size.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   300
    1 to:sz do:[:idx|
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   301
        codePoint := (aUnicodeString at:idx) codePoint.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   302
        needExtra := self decompositionOf:codePoint into:decomposeBlock.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   303
        aStream nextPutUtf8:codePoint.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   304
        needExtra ifTrue:[
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   305
            aStream nextPutUtf8:composeCodePoint
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   306
        ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   307
    ].
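
    "
     Usage sketch; the stream setup is an assumption, modelled on #encodeString:.
     The expected bytes follow the #encodeString: example for 'aäoöuü':

     |s|
     s := WriteStream on:(String new).
     ISO10646_to_UTF8_MAC new encodeString:'aä' on:s.
     s contents asByteArray
        -> should be #[97 97 204 136]
    "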
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   308
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   309
    "Created: / 16-02-2017 / 17:33:04 / stefan"
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   310
! !
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   311
17497
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   312
!ISO10646_to_UTF8_MAC methodsFor:'queries'!
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   313
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   314
nameOfEncoding
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   315
    ^ #'utf8-mac'
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   316
! !
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   317
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   318
!ISO10646_to_UTF8_MAC class methodsFor:'documentation'!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   319
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   320
version
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   321
    ^ '$Header$'
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   322
!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   323
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   324
version_CVS
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   325
    ^ '$Header$'
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   326
! !
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   327