CharacterEncoderImplementations__ISO10646_to_UTF8_MAC.st
author Claus Gittinger <cg@exept.de>
Tue, 09 Jul 2019 20:55:17 +0200
changeset 24417 03b083548da2
parent 22475 71b77246e002
permissions -rw-r--r--
#REFACTORING by exept class: Smalltalk class changed: #recursiveInstallAutoloadedClassesFrom:rememberIn:maxLevels:noAutoload:packageTop:showSplashInLevels: Transcript showCR:(... bindWith:...) -> Transcript showCR:... with:...
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
"{ Encoding: utf8 }"

"
 COPYRIGHT (c) 2015 by eXept Software AG
              All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
"{ Package: 'stx:libbasic' }"

"{ NameSpace: CharacterEncoderImplementations }"

ISO10646_to_UTF8 subclass:#ISO10646_to_UTF8_MAC
	instanceVariableNames:''
	classVariableNames:'DecomposeMap ComposeMap'
	poolDictionaries:''
	category:'Collections-Text-Encodings'
!

!ISO10646_to_UTF8_MAC class methodsFor:'documentation'!

copyright
"
 COPYRIGHT (c) 2015 by eXept Software AG
              All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
!

documentation
"
    Unicode (and therefore UTF-8) can represent some characters with diacritical marks (e.g. umlauts) in multiple ways:
        - either as a single precomposed code point (e.g. ae -> ä -> &#228; -> C3 A4)
        - or in the so-called 'Normalization Form Canonical Decomposition' (NFD), i.e. as a regular 'a' followed by a
          combining diacritical mark (for example: acute).

    Mac OS X needs the second form for its file names.
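
    Worked example of the two forms (a sketch; the byte values follow from the UTF-8 encoding rules,
    the layout of this note is not part of the original documentation):
        - precomposed: U+00E4 (ä)                                -> C3 A4
        - decomposed:  U+0061 (a) + U+0308 (combining diaeresis) -> 61 CC 88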
    OS X does not, however, decompose the ranges U+2000-U+2FFF, U+F900-U+FAFF and U+2F800-U+2FAFF.

    This is a quick-and-dirty hack to at least support the first page (Latin-1) characters.
    It will be enhanced for the 2nd and 3rd Unicode pages when I find time.

    [caveat:]
        only a small subset of multi-composes is supported so far (for example: trema plus acute)

    [author:]
        Claus Gittinger

    [instance variables:]

    [class variables:]
        ComposeMap DecomposeMap

    [see also:]
        http://developer.apple.com/library/mac/#qa/qa2001/qa1173.html

"
! !

!ISO10646_to_UTF8_MAC class methodsFor:'initialization'!

initializeDecomposeMap
    "set up DecomposeMap, which maps a composed (diacritical) character to its two components
     (base character and combining mark), and ComposeMap, which maps a combining mark
     to its string of base/composed character pairs"

    DecomposeMap := Dictionary new.
    ComposeMap := Dictionary new.

    #(
        "/ attention: the following strings contain non-latin characters
        "/ if you don't see them, change your font setting for a better font

        (16r0300 "grave"        'AÀaàEÈeèIÌiìoòOÒUÙuùNǸnǹWẀwẁYỲyỳÜǛüǜ')
        (16r0301 "acute"        'AÁaáEÉeéIÍiíOÓoóUÚuúyýYÝCĆcćNŃnńRŔrŕSŚsśZŹzźGǴgǵÆǼæǽØǾøǿMḾmḿKḰkḱPṔpṕWẂwẃÜǗüǘ')
        (16r0302 "circumflex"   'AÂaâEÊeêIÎiîOÔoôUÛuûCĈcĉGĜgĝHĤhĥJĴjĵSŜsŝWŴwŵYŶyŷZẐzẑ')
        (16r0303 "tilde"        'AÃaãNÑnñOÕoõUŨuũYỸyỹEẼeẽVṼvṽ')
        (16r0304 "macron"       'AĀaāEĒeēIĪiīOŌoōUŪuūGḠgḡÜǕüǖ')
        (16r0306 "breve"        'AĂaăEĔeĕGĞgğIĬiĭOŎoŏUŬuŭ')
        (16r0307 "dot above"    'AȦaȧOȮoȯCĊcċEĖeėGĠgġZŻzżBḂbḃDḊdḋFḞfḟHḢhḣMṀmṁNṄnṅPṖpṗRṘrṙSṠsṡTṪtṫWẆwẇXẊxẋYẎyẏ')
        (16r0308 "umlaut/trema" 'AÄaäEËeëOÖoöUÜuüIÏiïyÿYŸHḦhḧXẌxẍtẗÙǛùǜŪǕūǖÚǗúǘǓǙǔǚ')
        (16r030A "ring"         'AÅaåUŮuůwẘyẙ')
        (16r030B "dbl acute"    'OŐoőUŰuű')
        (16r030C "caron"        'CČcčDĎEĚeěNŇnňRŘrřSŠsšZŽzžAǍaǎIǏiǐOǑoǒUǓuǔGǦgǧKǨkǩÜǙüǚ')
        (16r030F "dbl grave"    'AȀaȁEȄeȅIȈiȉOȌoȍRȐrȑUȔuȕ')
        (16r0311 "inv. breve"   'AȂaȃEȆeȇIȊiȋOȎoȏRȒrȓUȖuȗ')
        (16r0317 "acute below"  'KĶkķLĻlļNŅnņRŖrŗSȘsșTȚtț')
        (16r0327 "cedilla"      'CÇcçSŞsşTŢtţEȨeȩDḐdḑHḨhḩ')
        (16r0328 "ogonek"       'AĄaąEĘeęIĮiįOǪoǫUŲuų')
    ) do:[:eachPair |
        |composeCode mapping|

        composeCode := eachPair first.
        mapping := eachPair second.
        mapping pairWiseDo:[:baseChar :composedChar |
            "/ setup, so that we find
            "/    DecomposeMap at:"$à codePoint" 16rE0 put:#( "$a codePoint" 16r61 "grave codePoint" 16r0300).
            DecomposeMap 
                at:composedChar codePoint 
                put:(Array with:baseChar codePoint with:composeCode)
        ].

        ComposeMap at:composeCode put:mapping.
    ].
! !

!ISO10646_to_UTF8_MAC methodsFor:'encoding & decoding'!

compositionOf: baseChar with: diacriticalChar  to: outStream
    "compose a base character and a combining diacritical mark into a single character
     and write it to outStream; e.g. a + umlaut-diacritic-mark -> ä.
     If no composition is known, both characters are written unchanged."

    |cp map i|

    cp := diacriticalChar codePoint.
    (cp between:16r300 and:16r328) ifTrue:[
        map := ComposeMap at:cp ifAbsent:nil.
        map notNil ifTrue:[
            "/ compose
            i := map indexOf: baseChar.
            i ~~ 0 ifTrue:[
                outStream nextPut: (map at:i+1).
                ^ self.
            ].
        ].
    ].

    "/ leave as is
    outStream nextPut: baseChar.
    outStream nextPut: diacriticalChar.
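
    "
     a minimal usage sketch (assumption: evaluated in a workspace with this class loaded;
     the maps are initialized explicitly here, because this method does not do so itself):

     |s|
     ISO10646_to_UTF8_MAC initializeDecomposeMap.
     s := CharacterWriteStream on:''.
     ISO10646_to_UTF8_MAC new
         compositionOf:$a with:(Character value:16r0308) to:s.
     s contents
    "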
!

decodeString:aStringOrByteCollection
    "return a Unicode string from the passed in UTF-8-MAC encoded string.
     This is UTF-8 with composed characters stored in decomposed form
     (i.e. as separate codes, not as single combined characters).

     For now, here is a limited version, which should work
     at least for most European languages...
    "

    |s buff previous|

    s := super decodeString:aStringOrByteCollection.
    (s contains:[:char | char codePoint between:16r0300 and:16r0328]) ifFalse:[^ s].

    ComposeMap isNil ifTrue:[
        self class initializeDecomposeMap
    ].

    buff := CharacterWriteStream on:''.
    previous := nil.
    s do:[:each |
        (each codePoint between:16r0300 and:16r0328) ifTrue:[
            previous isNil ifTrue:[
                buff isEmpty ifTrue:[
                    "/ wrong - combiner not allowed here.
                    buff nextPut:each.
                ] ifFalse:[
                    "/ ouch - a multi-compose
                    previous := buff last.
                    buff skip:-1.
                    self compositionOf:previous with:each to:buff.
                ].
            ] ifFalse:[
                self compositionOf:previous with:each to:buff.
            ].
            previous := nil.
        ] ifFalse:[
            previous notNil ifTrue:[
                buff nextPut:previous.
            ].
            previous := each.
        ].
    ].
    previous notNil ifTrue:[
        buff nextPut:previous.
    ].
    ^ buff contents.

    "
     (ISO10646_to_UTF8 new encodeString:'aäoöuü') asByteArray
        -> #[97 195 164 111 195 182 117 195 188]

     (ISO10646_to_UTF8 new decodeString:
            (ISO10646_to_UTF8 new encodeString:'aäoöuü') asByteArray)

     (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray
        -> #[97 97 204 136 111 111 204 136 117 117 204 136]

     (ISO10646_to_UTF8_MAC new decodeString:
            (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray)
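
     a further sketch (an assumption, not taken from the original examples):
     decoding an already decomposed byte sequence, here 'a' followed by the
     combining diaeresis U+0308 (UTF-8 bytes 204 136), should take the
     compositionOf:with:to: path and is expected to recompose the pair

     (ISO10646_to_UTF8_MAC new decodeString:#[97 204 136])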
    "
!

decompositionOf: codePointIn into:outBlockWithTwoArgs
    "if required, decompose a diacritical character into a base character and a combining diacritical mark;
     e.g. ä -> a + umlaut-diacritic-mark.
     Pass both (as code points) as args to the given block.
     For non-diacritical characters, the block is not invoked.
     Return true, if a decomposition was done; false otherwise."

    |entry|

    codePointIn < 16rC0 ifTrue:[ ^ false ].

    DecomposeMap isNil ifTrue:[
        self class initializeDecomposeMap
    ].
    entry := DecomposeMap at:codePointIn ifAbsent:nil.
    entry isNil ifTrue:[ ^ false ].

    outBlockWithTwoArgs value:(entry at:1) value:(entry at:2).
    ^ true
21593
bdcb1244b97d #BUGFIX by cg
Claus Gittinger <cg@exept.de>
parents: 21478
diff changeset
   224
bdcb1244b97d #BUGFIX by cg
Claus Gittinger <cg@exept.de>
parents: 21478
diff changeset
   225
    "Modified: / 28-02-2017 / 12:43:03 / cg"
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   226
!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   227
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   228
encodeCharacter:aUnicodeCharacter on:aStream
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   229
    "write the UTF-8-MAC representation of aUnicodeCharacter to aStream.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   230
     This is UTF-8 with composed characters decomposed (i.e. as separate codes, not as
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   231
     single combined characters).
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   232
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   233
     For now, here is a limited version, which should work
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   234
     at least for most European countries...
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   235
    "
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   236
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   237
    |codePoint composeCodePoint needExtra|
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   238
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   239
    DecomposeMap isNil ifTrue:[
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   240
        self class initializeDecomposeMap
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   241
    ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   242
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   243
    codePoint := aUnicodeCharacter codePoint.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   244
    needExtra := self decompositionOf:codePoint into:[:baseCodePointArg :composeCodePointArg | 
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   245
            codePoint := baseCodePointArg. composeCodePoint := composeCodePointArg
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   246
        ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   247
    aStream nextPutUtf8:codePoint.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   248
    needExtra ifTrue:[
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   249
        aStream nextPutUtf8:composeCodePoint
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   250
    ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   251
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   252
    "Created: / 16-02-2017 / 17:45:18 / stefan"
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   253
!
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   254
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   255
encodeString:aUnicodeString
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   256
    "return the UTF-8-MAC representation of aUnicodeString.
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   257
     This is UTF-8 with composed characters decomposed (i.e. as separate codes, not as
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   258
     single combined characters).
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   259
17564
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   260
     For now, here is a limited version, which should work
67ae75f28757 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17522
diff changeset
   261
     at least for most European countries...
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   262
    "
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   263
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   264
    |s|
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   265
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   266
    s := WriteStream on:(String uninitializedNew:aUnicodeString size).
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   267
    self encodeString:aUnicodeString on:s.
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   268
    ^ s contents
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   269
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   270
    "
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   271
     (self encodeString:'hello') asByteArray                             #[104 101 108 108 111]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   272
     (self encodeString:(Character value:16r40) asString) asByteArray    #[64]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   273
     (self encodeString:(Character value:16r7F) asString) asByteArray    #[127]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   274
     (self encodeString:(Character value:16r80) asString) asByteArray    #[194 128]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   275
     (self encodeString:(Character value:16rFF) asString) asByteArray    #[195 191]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   276
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   277
     (ISO10646_to_UTF8     new encodeString:'aäoöuü') asByteArray
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   278
        -> #[97 195 164 111 195 182 117 195 188]
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   279
     (ISO10646_to_UTF8_MAC new encodeString:'aäoöuü') asByteArray
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   280
        -> #[97 97 204 136 111 111 204 136 117 117 204 136]  
17522
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   281
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   282
     ISO10646_to_UTF8_MAC new decodeString:
eea77b0b2c82 class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17497
diff changeset
   283
         (ISO10646_to_UTF8_MAC new encodeString:'Packages aus VSE für Smalltalk_X') asByteArray
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   284
    "
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   285
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   286
    "Modified (format): / 16-02-2017 / 17:36:14 / stefan"
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   287
!
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   288
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   289
encodeString:aUnicodeString on:aStream
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   290
    "write the UTF-8-MAC representation of aUnicodeString to aStream.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   291
     This is UTF-8 with composed characters decomposed (i.e. as separate codes, not as
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   292
     single combined characters).
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   293
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   294
     For now, here is a limited version, which should work
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   295
     at least for most European countries...
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   296
    "
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   297
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   298
    |sz "{Class: SmallInteger}" decomposeBlock codePoint composeCodePoint needExtra|
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   299
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   300
    decomposeBlock := [:baseCodePointArg :composeCodePointArg | 
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   301
                          codePoint := baseCodePointArg. composeCodePoint := composeCodePointArg
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   302
                      ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   303
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   304
    sz := aUnicodeString size.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   305
    1 to:sz do:[:idx|
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   306
        codePoint := (aUnicodeString at:idx) codePoint.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   307
        needExtra := self decompositionOf:codePoint into:decomposeBlock.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   308
        aStream nextPutUtf8:codePoint.
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   309
        needExtra ifTrue:[
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   310
            aStream nextPutUtf8:composeCodePoint
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   311
        ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   312
    ].
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   313
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   314
    "Created: / 16-02-2017 / 17:33:04 / stefan"
22414
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   315
!
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   316
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   317
readNextCharacterFrom:aStream 
    "read the next UTF-8 encoded character from aStream and return it;
     return nil at the end of the stream.
     Signals a DecodingError if the bytes read do not decode to exactly one character."
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   318
    |firstByte bytesToRead str|
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   319
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   320
    firstByte := aStream peek. 
22475
71b77246e002 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 22414
diff changeset
   321
    firstByte isNil ifTrue:[
71b77246e002 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 22414
diff changeset
   322
        ^ nil
71b77246e002 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 22414
diff changeset
   323
    ].
22414
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   324
    firstByte := firstByte codePoint.
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   325
    bytesToRead := self class bytesToReadFor:firstByte.
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   326
    str := self decodeString:(aStream next:bytesToRead).
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   327
    str size ~~ 1 ifTrue:[
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   328
        DecodingError raiseRequestErrorString:' - bad UTF8_MAC encoding'.
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   329
    ].
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   330
    ^ str first
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   331
8a15c1e6c4a8 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 21593
diff changeset
   332
    "Created: / 10-01-2018 / 22:35:23 / stefan"
22475
71b77246e002 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 22414
diff changeset
   333
    "Modified: / 16-01-2018 / 16:53:59 / stefan"
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   334
! !
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   335
17497
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   336
!ISO10646_to_UTF8_MAC methodsFor:'queries'!
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   337
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   338
nameOfEncoding
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   339
    ^ #'utf8-mac'
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   340
! !
36ab19b73c1f class: CharacterEncoderImplementations::ISO10646_to_UTF8_MAC
Claus Gittinger <cg@exept.de>
parents: 17490
diff changeset
   341
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   342
!ISO10646_to_UTF8_MAC class methodsFor:'documentation'!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   343
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   344
version
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   345
    ^ '$Header$'
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   346
!
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   347
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   348
version_CVS
21478
2e63fbcbfa85 #TUNING by stefan
Stefan Vogel <sv@exept.de>
parents: 17568
diff changeset
   349
    ^ '$Header$'
17490
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   350
! !
dd28d3bda290 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   351