CharacterEncoder.st
author Claus Gittinger <cg@exept.de>
Wed, 05 Feb 2014 18:19:50 +0100
changeset 15966 72f3e3a9ba29
parent 15609 36dd250b19f4
child 16054 171c7f8b4547
permissions -rw-r--r--
class: CharacterEncoder changed: #initialize merged in jv's changes
8048
293c8178c6eb utf8 errors
Claus Gittinger <cg@exept.de>
parents: 8033
diff changeset
     1
"
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
     2
 COPYRIGHT (c) 2004 by eXept Software AG
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
     3
              All Rights Reserved
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
     4
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
     5
 This software is furnished under a license and may be used
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
     6
 only in accordance with the terms of that license and with the
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
     7
 inclusion of the above copyright notice.   This software may not
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
     8
 be provided or otherwise made available to, or used by, any
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
     9
 other person.  No title to or ownership of the software is
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    10
 hereby transferred.
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    11
"
8114
05274a80fcc4 separated implementation into dynamically (lazy) loaded classes
Claus Gittinger <cg@exept.de>
parents: 8105
diff changeset
    12
"{ Package: 'stx:libbasic' }"
05274a80fcc4 separated implementation into dynamically (lazy) loaded classes
Claus Gittinger <cg@exept.de>
parents: 8105
diff changeset
    13
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
    14
Object subclass:#CharacterEncoder
14523
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    15
	instanceVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    16
	classVariableNames:'EncoderClassesByName EncodersByName CachedEncoders LastEncoder
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    17
		AccessLock NullEncoderInstance Jis7KanjiEscapeSequence
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    18
		Jis7RomanEscapeSequence JisISO2022EscapeSequence
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    19
		Jis7KanjiOldEscapeSequence'
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    20
	poolDictionaries:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    21
	category:'Collections-Text-Encodings'
7969
1c252e9cf79c *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7967
diff changeset
    22
!
1c252e9cf79c *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7967
diff changeset
    23
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
    24
CharacterEncoder subclass:#CompoundEncoder
14523
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    25
	instanceVariableNames:'decoder encoder'
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    26
	classVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    27
	poolDictionaries:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    28
	privateIn:CharacterEncoder
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
    29
!
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
    30
7932
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
    31
CharacterEncoder subclass:#DefaultEncoder
14523
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    32
	instanceVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    33
	classVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    34
	poolDictionaries:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    35
	privateIn:CharacterEncoder
7932
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
    36
!
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
    37
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
    38
CharacterEncoder subclass:#InverseEncoder
14523
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    39
	instanceVariableNames:'decoder'
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    40
	classVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    41
	poolDictionaries:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    42
	privateIn:CharacterEncoder
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
    43
!
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
    44
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    45
CharacterEncoder subclass:#NullEncoder
14523
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    46
	instanceVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    47
	classVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    48
	poolDictionaries:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    49
	privateIn:CharacterEncoder
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
    50
!
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
    51
7892
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    52
CharacterEncoder subclass:#OtherEncoding
14523
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    53
	instanceVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    54
	classVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    55
	poolDictionaries:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    56
	privateIn:CharacterEncoder
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
    57
!
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
    58
7919
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
    59
CharacterEncoder subclass:#TwoStepEncoder
14523
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    60
	instanceVariableNames:'encoder1 encoder2'
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    61
	classVariableNames:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    62
	poolDictionaries:''
91746a24d5ad characterSize: query was missing
Claus Gittinger <cg@exept.de>
parents: 14209
diff changeset
    63
	privateIn:CharacterEncoder
7919
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
    64
!
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
    65
7893
80df105ac17c checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7892
diff changeset
    66
!CharacterEncoder class methodsFor:'documentation'!
80df105ac17c checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7892
diff changeset
    67
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    68
copyright
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    69
"
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    70
 COPYRIGHT (c) 2004 by eXept Software AG
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
    71
              All Rights Reserved
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    72
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    73
 This software is furnished under a license and may be used
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    74
 only in accordance with the terms of that license and with the
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    75
 inclusion of the above copyright notice.   This software may not
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    76
 be provided or otherwise made available to, or used by, any
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    77
 other person.  No title to or ownership of the software is
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    78
 hereby transferred.
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    79
"
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    80
!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    81
7893
80df105ac17c checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7892
diff changeset
    82
documentation
80df105ac17c checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7892
diff changeset
    83
"
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    84
    unfinished code - please read howToAddMoreCoders.
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    85
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    86
    Character mappings are based on information in character maps found at either:
8226
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    87
        http://std.dkuug.dk/i18n/charmaps
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    88
    or:
8226
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    89
        http://www.unicode.org/Public/MAPPINGS
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    90
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
    91
    No Warranty.
8226
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    92
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    93
    Every ISO 8859 codeset includes ASCII as a proper subset:
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    94
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    95
    ISO 8859-1: Latin 1 - Western European Languages. 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    96
    ISO 8859-2: Latin 2 - Eastern European Languages. 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    97
    ISO 8859-3: Latin 3 - Afrikaans, Catalan, Dutch, English, Esperanto, German, 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    98
                          Italian, Maltese, Spanish and Turkish. 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
    99
    ISO 8859-4: Latin 4 - Danish, English, Estonian, Finnish, German, Greenlandic, Lappish and Latvian. 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
   100
    ISO 8859-5: Latin/Cyrillic - Bulgarian, Byelorussian, English, Macedonian, Russian, Serbo-Croat and Ukrainian.
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
   101
    ISO 8859-6: Latin/Arabic - Arabic. 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
   102
    ISO 8859-7: Latin/Greek - Greek. 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
   103
    ISO 8859-8: Latin/Hebrew - Hebrew. 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
   104
    ISO 8859-9: Latin 5 - Danish, Dutch, English, Finnish, French, German, Irish, Italian, 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
   105
                          Norwegian, Portuguese, Spanish, Swedish and Turkish. 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
   106
    ISO 8859-10: Latin 6 - Danish, English, Estonian, Finnish, German, Greenlandic, Icelandic, 
81d95cffe5be *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8214
diff changeset
   107
                          Sami (Lappish), Latvian, Lithuanian, Norwegian, Faroese and Swedish.
8810
8f509238ef9f +showCharacterSet
Claus Gittinger <cg@exept.de>
parents: 8722
diff changeset
   108
    [author:]
8f509238ef9f +showCharacterSet
Claus Gittinger <cg@exept.de>
parents: 8722
diff changeset
   109
        Claus Gittinger
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   110
"
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   111
!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   112
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   113
examples
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   114
"
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   115
                                                                        [exBegin]                                                     
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   116
    |s1 s2|
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   117
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   118
    s1 := 'hello'.
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   119
    s2 := CharacterEncoder encodeString:s1 from:#'iso8859-1' into:#'unicode'.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   120
    s2       
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   121
                                                                        [exEnd]                                                     
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   122
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   123
                                                                        [exBegin]                                                     
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   124
    |s1 s2|
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   125
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   126
    s1 := 'hello'.
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   127
    s2 := CharacterEncoder encodeString:s1 from:#'iso8859-1' into:#'iso8859-7'.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   128
    s2      
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   129
                                                                        [exEnd]                                                     
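
    A further sketch, not part of the original examples: it fetches an encoder
    with encoderFor: (defined in this class) and round-trips a string through it.
    The instance messages encodeString: and decodeString: are assumed here and
    should be checked against the encoder protocol; if the round trip is lossless,
    the last expression should compare equal.
                                                                        [exBegin]
    |encoder original encoded decoded|

    encoder := CharacterEncoder encoderFor:#'iso8859-1'.
    original := 'hello'.
    encoded := encoder encodeString:original.
    decoded := encoder decodeString:encoded.
    decoded = original
                                                                        [exEnd]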
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   130
"
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   131
!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   132
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   133
howToAddMoreCoders
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   134
"
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   135
    Coders can be hand-written or automagically generated via a mapping table.
7932
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
   136
    Examples of hand-written coders are UTF8_to_ISO10646 or JIS0208_to_JIS7.
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
   137
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   138
    The table-driven encode/decode methods can be generated from a character mapping document
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   139
    as found on the unicode consortium host
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   140
        (for example: 'http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT')
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   141
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   142
    or from the i18n character maps:
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   143
        (for example: 'http://std.dkuug.dk/i18n/charmaps/ISO-8859-1')
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   144
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   145
    In order to add another coder (for example: for EBCDIC or ms-codePage 278),
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   146
    perform the following steps:
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   147
        - create a private subclass of CharacterEncoder named (for example) CP267.
8114
05274a80fcc4 separated implementation into dynamically (lazy) loaded classes
Claus Gittinger <cg@exept.de>
parents: 8105
diff changeset
   148
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   149
        - create a public subclass of CharacterEncoderImplementations::CharacterEncoderImplementation named (for example) CharacterEncoderImplementations::CP267.
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   150
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   151
        - define the mappingURL1_relativeName (if the table is found on 'www.unicode.org')
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   152
          or the mappingURL2_relativeName (if it is found on 'std.dkuug.dk') method, which
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   153
          should return the name of the table file, relative to the top directory there
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   154
          (which is '.../Public/MAPPINGS' on 'www.unicode.org' and '.../i18n/charmaps' on 'std.dkuug.dk').
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   155
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   156
          In this example, the table from 'std.dkuug.dk' is used, and named 'EBCDIC-CP-FI' there.
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   157
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   158
        - generate code by evaluating:
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   159
            CharacterEncoder::CP267 generateCode
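
          A minimal sketch of the mappingURL2_relativeName method from the steps above,
          for the 'EBCDIC-CP-FI' table on 'std.dkuug.dk'. Whether the method belongs on
          the class side or the instance side is not stated here, so treat the placement
          as an assumption:

            mappingURL2_relativeName
                ^ 'EBCDIC-CP-FI'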
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   160
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   161
    That's all!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   162
7909
a045c719fca2 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7904
diff changeset
   163
a045c719fca2 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7904
diff changeset
   164
    The existing code was generated by:
a045c719fca2 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7904
diff changeset
   165
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   166
        CharacterEncoder::SingleByteEncoder subclassesDo:[:cls | Transcript showCR:cls name. cls flushCode; generateCode ]
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   167
        CharacterEncoder::SingleByteEncoder subclassesDo:[:cls | cls allSubclassesDo:[:sub | Transcript showCR:sub name. sub flushCode; generateSubclassCode]]
7909
a045c719fca2 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7904
diff changeset
   168
a045c719fca2 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7904
diff changeset
   169
    or individually:
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   170
        CharacterEncoder::ASCII flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   171
        CharacterEncoder::ISO8859_1 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   172
        CharacterEncoder::ISO8859_2 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   173
        CharacterEncoder::ISO8859_3 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   174
        CharacterEncoder::ISO8859_4 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   175
        CharacterEncoder::ISO8859_5 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   176
        CharacterEncoder::ISO8859_6 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   177
        CharacterEncoder::ISO8859_7 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   178
        CharacterEncoder::ISO8859_8 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   179
        CharacterEncoder::ISO8859_9 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   180
        CharacterEncoder::ISO8859_10 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   181
        CharacterEncoder::ISO8859_11 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   182
        CharacterEncoder::ISO8859_13 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   183
        CharacterEncoder::ISO8859_14 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   184
        CharacterEncoder::ISO8859_15 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   185
        CharacterEncoder::ISO8859_16 flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   186
        CharacterEncoder::KOI8_R flushCode; generateCode.
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   187
        CharacterEncoder::GSM0338 flushCode; generateCode.
7909
a045c719fca2 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7904
diff changeset
   188
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   189
        CharacterEncoder::KOI8_U flushCode; generateSubclassCode.
7912
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
   190
9143
28eeea2f0112 comments
Claus Gittinger <cg@exept.de>
parents: 9064
diff changeset
   191
        CharacterEncoder::JIS0208 flushCode; generateCode.
13072
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   192
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   193
    Please check if your encoder tables are complete; for example, with:
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   194
        0 to:255 do:[:ebc |
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   195
            |asc ebc2|
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   196
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   197
            asc := CharacterEncoderImplementations::EBCDIC new decode:ebc.
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   198
            asc notNil ifTrue:[
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   199
               ebc2 := CharacterEncoderImplementations::EBCDIC new encode:asc.
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   200
               self assert:(ebc2 = ebc)
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   201
            ].
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   202
        ].
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   203
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   204
        0 to:255 do:[:asc |
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   205
            |ebc asc2|
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   206
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   207
            ebc := CharacterEncoderImplementations::EBCDIC new encode:asc.
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   208
            ebc notNil ifTrue:[
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   209
               asc2 := CharacterEncoderImplementations::EBCDIC new decode:ebc.
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   210
               self assert:(asc2 = asc)
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   211
            ].
e189e07c16aa changed: #howToAddMoreCoders
Claus Gittinger <cg@exept.de>
parents: 13063
diff changeset
   212
        ].
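
    The decode-then-encode check above can also be wrapped in a block, so that the
    same test runs against any single-byte coder. This is only a sketch: it uses
    nothing beyond the encode: and decode: messages shown above, and the names
    checkRoundTrip, coder and code are illustrative.

        |checkRoundTrip|

        checkRoundTrip := [:coder |
            0 to:255 do:[:code |
                |decoded|

                decoded := coder decode:code.
                decoded notNil ifTrue:[
                    self assert:((coder encode:decoded) = code)
                ].
            ].
        ].
        checkRoundTrip value:(CharacterEncoderImplementations::EBCDIC new).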
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   213
"
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   214
! !
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   215
7971
357e53496acc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7969
diff changeset
   216
!CharacterEncoder class methodsFor:'instance creation'!
357e53496acc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7969
diff changeset
   217
357e53496acc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7969
diff changeset
   218
encoderFor:encodingNameSymbol
    "given the name of an encoding, return an encoder-instance which can map these from/into unicode."

    ^ self
        encoderFor:encodingNameSymbol
        ifAbsent:[
            "/ proceed to ignore this error in the future.

            (EncodersByName at:#unicode) at:encodingNameSymbol put:NullEncoderInstance.
            (EncoderClassesByName at:#unicode) at:encodingNameSymbol put:NullEncoder.

            "/ self error:'no encoder for ' , encodingNameSymbol mayProceed:true.
            ('CharacterEncoder [warning]: no encoder for ' , encodingNameSymbol) infoPrintCR.

            NullEncoderInstance
        ]

    "
     CharacterEncoder encoderFor:#'blabla2'
     CharacterEncoder encoderFor:#'latin1'
     self encoderFor:#'arabic'
     self encoderFor:#'ms-arabic'
     self encoderFor:#'cp1250'
     self encoderFor:#'cp1251'
     self encoderFor:#'cp1252'
     self encoderFor:#'cp1253'
     self encoderFor:#'iso8859-5'
     self encoderFor:#'koi8-r'
     self encoderFor:#'koi8-u'
     self encoderFor:#'jis0208'
     self encoderFor:#'jis7'
     self encoderFor:#'utf8'
     (self encoderFor:#'utf16le') encodeString:'hello'
     (self encoderFor:#'utf16le') encode:5
     (self encoderFor:#'utf16be') encodeString:'hello'
     (self encoderFor:#'utf16be') encode:5
     (self encoderFor:#'utf32le') encodeString:'hello'
     (self encoderFor:#'utf32be') encodeString:'hello'
     self encoderFor:#'sgml'
     self encoderFor:#'java'
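
     usage sketch
     (assumption: #'unknown-encoding' is not a registered encoding name in this image):
     the ifAbsent: block above registers the null encoder under the requested name,
     so only the first of these two lookups should reach the warning.
     self encoderFor:#'unknown-encoding'
     self encoderFor:#'unknown-encoding'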
    "

    "Modified: / 12-07-2012 / 19:35:43 / cg"
!

encoderFor:encodingNameSymbolArg ifAbsent:exceptionValue
    "given the name of an encoding, return an encoder-instance which can map these from/into unicode.
     If no encoder is found, return the value of exceptionValue."

    |encodingNameSymbol enc clsName cls lcName name unicodeEncoders unicodeEncoderClasses|

    encodingNameSymbol := encodingNameSymbolArg.
    encodingNameSymbol isNil ifTrue:[ ^ NullEncoderInstance].

    encodingNameSymbol == #'iso10646-1' ifTrue:[ encodingNameSymbol := #unicode].

    lcName := encodingNameSymbol asLowercase.
    name := lcName asSymbolIfInterned.
    name isNil ifTrue:[name := lcName].

    name includesMatchCharacters ifTrue:[
        AccessLock critical:[
            unicodeEncoders := EncodersByName at:#unicode ifAbsent:nil.
        ].
        unicodeEncoders notNil ifTrue:[
            unicodeEncoders keysAndValuesDo:[:eachEncodingAlias :eachEncoderInstance |
                (name matches:eachEncodingAlias) ifTrue:[
                    ^ eachEncoderInstance.
                ].
            ].
        ].

        AccessLock critical:[
            unicodeEncoderClasses := EncoderClassesByName at:#unicode ifAbsent:nil.
        ].
        unicodeEncoderClasses notNil ifTrue:[
            unicodeEncoderClasses keysAndValuesDo:[:eachEncodingAlias :eachEncoderClassOrName |
                (name matches:eachEncodingAlias) ifTrue:[
                    eachEncoderClassOrName isBehavior ifTrue:[
                        cls := eachEncoderClassOrName
                    ] ifFalse:[
                        cls := CharacterEncoderImplementations at:eachEncoderClassOrName.
                    ].
                    cls notNil ifTrue:[
                        ^ cls new.
                    ]
                ].
            ].
        ].
        ^ exceptionValue value
    ].

    AccessLock critical:[
        unicodeEncoders := EncodersByName at:#unicode ifAbsent:nil.
        unicodeEncoders isNil ifTrue:[
            EncodersByName at:#unicode put:(unicodeEncoders := Dictionary new).
        ].
        enc := unicodeEncoders at:name ifAbsent:nil.
    ].
    enc isNil ifTrue:[
        AccessLock critical:[
            unicodeEncoderClasses := EncoderClassesByName at:#unicode ifAbsent:nil.
            unicodeEncoderClasses isNil ifTrue:[
                EncoderClassesByName at:#unicode put:(unicodeEncoderClasses := Dictionary new).
            ].
            clsName := unicodeEncoderClasses at:name ifAbsent:nil.
        ].
        clsName notNil ifTrue:[
            clsName isBehavior ifTrue:[
                cls := clsName
            ] ifFalse:[
                cls := CharacterEncoderImplementations at:clsName.
            ].
            cls notNil ifTrue:[
                enc := cls new.
                AccessLock critical:[
                    unicodeEncoders at:name put:enc.
                ]
            ].
        ].
    ].

    enc notNil ifTrue:[
        ^ enc
    ].

    "/ no direct encoder from unicode->name
    "/ search for unicode->any and: any->name
    AccessLock critical:[
        unicodeEncoderClasses := EncoderClassesByName at:#unicode ifAbsent:nil.
    ].
    unicodeEncoderClasses keysAndValuesDo:[:eachEncodingAlias :eachEncoderClass |
        |dict2 enc1 enc2|

        AccessLock critical:[
            dict2 := EncoderClassesByName at:eachEncodingAlias ifAbsent:nil.
        ].
        dict2 notNil ifTrue:[
            clsName := dict2 at:name ifAbsent:nil.
            clsName notNil ifTrue:[
                clsName isBehavior ifTrue:[
                    cls := clsName
                ] ifFalse:[
                    cls := CharacterEncoderImplementations at:clsName.
                ].
                cls notNil ifTrue:[
                    enc2 := cls new.
                    enc1 := self encoderFor:eachEncodingAlias.
                    (enc1 notNil and:[enc2 notNil]) ifTrue:[
                        enc := TwoStepEncoder new encoder1:enc1 encoder2:enc2.
                        AccessLock critical:[
                            unicodeEncoders at:name put:enc.
                        ].
                        ^ enc.
                    ]
                ]
            ]
        ].
    ].

    EncoderClassesByName keysAndValuesDo:[:encoding1 :dict1 |
        dict1 keysAndValuesDo:[:encoding2 :clsName1|
            |clsName2 cls1 cls2 dict2 enc1 enc2|

            encoding2 = encodingNameSymbol ifTrue:[
                AccessLock critical:[
                    dict2 := EncoderClassesByName at:#unicode.
                ].
                clsName2 := dict2 at:encoding1 ifAbsent:nil.
                clsName2 notNil ifTrue:[
                    clsName1 isBehavior ifTrue:[
                        cls1 := clsName1
                    ] ifFalse:[
                        cls1 := CharacterEncoderImplementations at:clsName1.
                    ].
                    clsName2 isBehavior ifTrue:[
                        cls2 := clsName2
                    ] ifFalse:[
                        cls2 := CharacterEncoderImplementations at:clsName2.
                    ].
                    (cls1 notNil and:[cls2 notNil]) ifTrue:[
                        enc1 := cls1 new.
                        enc2 := cls2 new.
                        enc := TwoStepEncoder new encoder1:enc1 encoder2:enc2.
                        ^ enc.
                    ].
                ]
            ]
        ]
    ].

    ^ exceptionValue value

    "
     CharacterEncoder encoderFor:#'latin1'
     self encoderFor:#'arabic'
     self encoderFor:#'ms-arabic'
     self encoderFor:#'iso8859-5'
     self encoderFor:#'koi8-r'
     self encoderFor:#'koi8-u'
     self encoderFor:#'jis0208'
     self encoderFor:#'jis7'
     self encoderFor:#'unicode'
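
     usage sketch
     (assumptions: #'no-such-encoding' is not a registered encoding name in this image,
      and #'utf8' is registered, as the examples for #encoderFor: suggest):
     the ifAbsent: variant leaves the fallback to the caller; on a failed lookup it
     returns the value of the block, without the warning printed by #encoderFor:
     self encoderFor:#'no-such-encoding' ifAbsent:[nil]
     (self encoderFor:#'utf8' ifAbsent:[nil]) notNil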
    "

    "Modified: / 12-07-2012 / 19:45:58 / cg"
!

encoderForUTF8
    "return an encoder-instance which can map unicode into/from utf8"

    ^ self encoderFor:#utf8

    "
     CharacterEncoder encoderFor:#'latin1'
     self encoderFor:#'arabic'
     self encoderFor:#'ms-arabic'
     self encoderFor:#'iso8859-5'
     self encoderFor:#'koi8-r'
     self encoderFor:#'koi8-u'
     self encoderFor:#'jis0208'
     self encoderFor:#'jis7'
     self encoderFor:#'utf8'
     self encoderForUTF8
    "
!

encoderToEncodeFrom:oldEncodingArg into:newEncodingArg
    |oldEncoding newEncoding encoders encoderClasses encoder decoder clsName cls|

    oldEncoding := oldEncodingArg ? #unicode.
    oldEncoding == #'iso10646-1' ifTrue:[ oldEncoding := #unicode].
    newEncoding := newEncodingArg ? #unicode.
    newEncoding == #'iso10646-1' ifTrue:[ newEncoding := #unicode].

    oldEncoding = newEncoding ifTrue:[^ NullEncoderInstance].
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
   453
    (oldEncoding match:newEncoding) ifTrue:[^ NullEncoderInstance].
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
   454
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
   455
    (oldEncoding = #unicode) ifTrue:[
        "/ unicode -> something
        ^ self encoderFor:newEncoding.
    ].

    oldEncoding isSymbol ifFalse:[oldEncoding := oldEncoding asSymbol].
    newEncoding isSymbol ifFalse:[newEncoding := newEncoding asSymbol].

    AccessLock critical:[
        encoders := EncodersByName at:oldEncoding ifAbsent:nil.
        encoders isNil ifTrue:[
            EncodersByName at:oldEncoding put:(encoders := Dictionary new).
        ].
        "/ look up under the normalized name; that is the key used when the encoder is cached below
        encoder := encoders at:newEncoding ifAbsent:nil.
        encoder isNil ifTrue:[
            encoderClasses := EncoderClassesByName at:oldEncoding ifAbsent:nil.
            encoderClasses isNil ifTrue:[
                EncoderClassesByName at:oldEncoding put:(encoderClasses := Dictionary new).
            ].
            clsName := encoderClasses at:newEncoding ifAbsent:nil.
            clsName notNil ifTrue:[
                clsName isBehavior ifTrue:[
                    cls := clsName
                ] ifFalse:[
                    cls := CharacterEncoderImplementations at:clsName.
                ]
            ].
        ].
    ].
    cls notNil ifTrue:[
        encoder := cls new.
    ].

    encoder isNil ifTrue:[
        (newEncoding == #unicode) ifTrue:[
            "/ something -> unicode
            decoder := self encoderFor:oldEncoding.
            encoder := InverseEncoder new decoder:decoder.
        ] ifFalse:[
            "/ do it as: oldEncoding -> unicode -> newEncoding

            "/ something -> unicode
            decoder := self encoderFor:oldEncoding.

            "/ unicode -> something
            encoder := self encoderFor:newEncoding.
            encoder := CompoundEncoder new encoder:encoder decoder:decoder.
        ].
    ].

    AccessLock critical:[
        (EncodersByName at:oldEncoding) at:newEncoding put:encoder
    ].
    ^ encoder

    "
     CharacterEncoder initialize
     CharacterEncoder encoderToEncodeFrom:#'latin1' into:#'jis7'
     CharacterEncoder encoderToEncodeFrom:#'koi8-r' into:#'mac-cyrillic'
     CharacterEncoder encoderToEncodeFrom:#'ms-arabic' into:#'mac-arabic'
     CharacterEncoder encoderToEncodeFrom:#'iso8859-5' into:#'koi8-r'
     CharacterEncoder encoderToEncodeFrom:#'koi8-r' into:#'koi8-u'
    "
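
    "usage sketch (added for illustration; not part of the original source).
     Assumption: the answered encoder responds to #encodeString: - that selector
     is not defined in this method, so check the instance protocol before relying on it.

     |enc|

     enc := CharacterEncoder encoderToEncodeFrom:#'iso8859-5' into:#'koi8-r'.
     enc encodeString:'some iso8859-5 encoded text'.

     If no class is registered for this pair, the code above builds a
     CompoundEncoder (iso8859-5 -> unicode -> koi8-r) and caches it.
     Equal names short-circuit to the shared null encoder instead:

     (CharacterEncoder encoderToEncodeFrom:#'utf8' into:#'utf8')
         == CharacterEncoder nullEncoderInstance
    "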

    "Modified: / 12-07-2012 / 19:45:15 / cg"
! !

!CharacterEncoder class methodsFor:'Compatibility-ST80'!

encoderNamed: encoderName
    "/ q & d hack

    encoderName == #default ifTrue:[
        ^ DefaultEncoder new
    ].
    self halt:'should not be reached'.
    ^ self new
!

platformName
    ^ OperatingSystem platformName

    "Created: 20.6.1997 / 17:34:03 / cg"
    "Modified: 20.6.1997 / 17:38:40 / cg"
! !

!CharacterEncoder class methodsFor:'accessing'!

nullEncoderInstance
    ^ NullEncoderInstance
! !

!CharacterEncoder class methodsFor:'class initialization'!

initialize
    |ud|

    AccessLock := RecursionLock new name:'CharacterEncoder'.
    NullEncoderInstance := NullEncoder new.

    EncodersByName := Dictionary new.
    EncoderClassesByName := Dictionary new.
    CachedEncoders := Dictionary new.

    EncoderClassesByName at:#'unicode' put:(ud := Dictionary new).
    ud at:#'fontspecific' put:NullEncoder.
    ud at:#'adobe-fontspecific' put:NullEncoder.
    ud at:#'ms-oem' put:NullEncoder.
    ud at:#'ms-default' put:NullEncoder.

    "/ className decoded-name array-of-encodingNames
    #(
        (ASCII              unicode     ( ascii 'us-ascii' 'iso-ir-6' 'ibm-367' 'ms-cp367' 'cp367'  'iso646-us' 'ibm-cp367' 'ansi_x3.4-1968' ))

        (BIG5               unicode     ( big5 ))

        (CNS11643           unicode     ( 'cns11643' ))

        (CP437              unicode     ( 'cp437'  'cp-437' 'ibm-437' 'ms-cp437' 'microsoft-cp437' 'ibm-cp437' ))

        (EBCDIC             unicode     ( 'ebcdic' ))

        (GB2313_1980        unicode     ( 'gb2313' 'gb2313-1980' ))

        (HANGUL             unicode     ( 'hangul' ))

        (ISO10646_1         unicode     ( unicode 'iso10646_1' 'iso10646-1' 'iso-10646-1' ))

        (ISO10646_to_UTF8   unicode     ( utf8 'utf-8' ))
        (ISO10646_to_UTF16BE unicode    ( utf16b utf16be 'utf-16b' 'utf-16be' ))
        (ISO10646_to_UTF16LE unicode    ( utf16l utf16le 'utf-16l' 'utf-16le' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   585
8855
289b5bda04bb guessEncoding - return the real encodings name
Claus Gittinger <cg@exept.de>
parents: 8814
diff changeset
   586
        (ISO8859_1          unicode     ( 'iso8859_1' 'iso8859-1' 'iso-8859-1' 'latin-1' 'latin1' 'iso-ir-100' 'ibm-819' 'ms-cp819' 'ibm-cp819' 'iso8859'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   587
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   588
        (ISO8859_2          unicode     ( 'iso8859_2' 'iso8859-2' 'iso-8859-2' 'latin2' 'latin-2' 'iso-ir-101'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   589
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   590
        (ISO8859_3          unicode     ( 'iso8859_3' 'iso8859-3' 'iso-8859-3' 'latin3' 'latin-3' 'iso-ir-109'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   591
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   592
        (ISO8859_4          unicode     ( 'iso8859_4' 'iso8859-4' 'iso-8859-4' 'latin4' 'latin-4' 'iso-ir-110'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   593
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   594
        (ISO8859_5          unicode     ( 'iso8859_5' 'iso8859-5' 'iso-8859-5' 'cyrillic' 'iso-ir-144' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   595
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   596
        (ISO8859_6          unicode     ( 'iso8859_6' 'iso8859-6' 'iso-8859-6' 'arabic' 'asmo-708' 'ecma-114' 'iso-ir-127' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   597
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   598
        (ISO8859_7          unicode     ( 'iso8859_7' 'iso8859-7' 'iso-8859-7' 'greek' 'iso-ir-126' 'ecma-118'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   599
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   600
        (ISO8859_8          unicode     ( 'iso8859_8' 'iso8859-8' 'iso-8859-8' 'hebrew' 'iso-ir-138' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   601
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   602
        (ISO8859_9          unicode     ( 'iso8859_9' 'iso8859-9' 'iso-8859-9' 'latin5' 'latin-5' 'iso-ir-148'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   603
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   604
        (ISO8859_10         unicode     ( 'iso8859_10' 'iso8859-10' 'iso-8859-10' 'latin6' 'latin-6' 'iso-ir-157'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   605
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   606
        (ISO8859_11         unicode     ( 'iso8859_11' 'iso8859-11' 'iso-8859-11' 'thai' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   607
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   608
        (ISO8859_13         unicode     ( 'iso8859_13' 'iso8859-13' 'iso-8859-13' 'latin7' 'latin-7' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   609
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   610
        (ISO8859_14         unicode     ( 'iso8859_14' 'iso8859-14' 'iso-8859-14' 'latin8' 'latin-8' 'latin-celtic' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   611
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   612
        (ISO8859_15         unicode     ( 'iso8859_15' 'iso8859-15' 'iso-8859-15' 'latin9' 'latin-9' 'iso-ir-203'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   613
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   614
        (ISO8859_16         unicode     ( 'iso8859_16' 'iso8859-16' 'iso-8859-16' 'latin10' 'latin-10' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   615
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   616
        (JIS0201            unicode     ( 'jis0201' #'jisx0201.1976-0'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   617
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   618
        (JIS0208            unicode     ( jis0208 'jisx0208' 'jisx0208.1983-0' 'jisx0208.1990-0'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   619
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   620
        (JIS0208_to_JIS7    jis0208     ( jis7 'jis-7' 'x-jis7' 'x-iso2022-jp' 'iso2022-jp'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   621
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   622
        (JIS0208_to_EUC     jis0208     ( euc #'x-euc-jp' ))
8122
29670db31014 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8120
diff changeset
   623
8176
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
   624
        (JIS0208_to_SJIS    jis0208     ( 'sjis' 'shiftjis' 'x-sjis' #'x-shift-jis' #'shift-jis'))
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
   625
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   626
        (JIS0212            unicode     ( 'jis0212' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   627
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   628
        (JOHAB              unicode     ( 'johab' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   629
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   630
        (KOI7               unicode     ( 'koi7' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   631
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   632
        (KOI8_R             unicode     ( #'koi8-r' 'cp878' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   633
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   634
        (KOI8_U             unicode     ( #'koi8-u' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   635
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   636
        (KSC5601            unicode     ( #'ksc5601' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   637
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   638
        (MAC_Arabic         unicode     ( #'mac-arabic' 'macarabic' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   639
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   640
        (MAC_CentralEuropean unicode    ( #'mac-centraleuropean' #'mac-centraleurope' 'maccentraleurope' 'maccentraleuropean' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   641
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   642
        (MAC_Croatian       unicode     ( #'mac-croatian' 'maccroatian'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   643
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   644
        (MAC_Cyrillic       unicode     ( #'mac-cyrillic' 'maccyrillic' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   645
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   646
        (MAC_Dingbats       unicode     ( #'mac-dingbats'  'macdingbats'  'macdingbat'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   647
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   648
        (MAC_Farsi          unicode     ( #'mac-farsi' 'macfarsi' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   649
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   650
        (MAC_Greek          unicode     ( #'mac-greek' #'macgreek' ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   651
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   652
        (MAC_Hebrew         unicode     ( #'mac-hebrew' #'machebrew'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   653
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   654
        (MAC_Iceland        unicode     ( #'mac-iceland' #'maciceland'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   655
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   656
        (MAC_Japanese       unicode     ( #'mac-japanese' #'macjapanese'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   657
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   658
        (MAC_Korean         unicode     ( #'mac-korean' #'mackorean'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   659
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   660
        (MAC_Roman          unicode     ( #'mac-roman' #'macroman'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   661
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   662
        (MAC_Romanian       unicode     ( #'mac-romanian' #'macromanian'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   663
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   664
        (MAC_Symbol         unicode     ( #'mac-symbol' #'macsymbol'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   665
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   666
        (MAC_Thai           unicode     ( #'mac-thai' #'macthai'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   667
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   668
        (MAC_Turkish        unicode     ( #'mac-turkish' #'macturkish'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   669
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   670
        (MS_Ansi            unicode     ( #'ms-ansi' 'ms-cp1252' 'microsoft-cp1252' 'cp1252' 'microsoft-ansi' 'windows-1252' 'windows-latin1'))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   671
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   672
        (MS_Arabic          unicode     ( 'ms-arabic' 'ms-cp1256' 'microsoft-cp1256' 'cp1256'  'microsoft-arabic' 'windows-1256'  ))
8118
efc99c0f68bc *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8114
diff changeset
   673
8151
1f0fc1d4516b *** empty log message ***
ca
parents: 8150
diff changeset
   674
        (MS_Baltic          unicode     ( 'ms-baltic' 'ms-cp1257' 'microsoft-cp1257' 'cp1257' 'microsoft-baltic' 'windows-1257'  ))

        (MS_Cyrillic        unicode     ( 'ms-cyrillic' 'ms-cp1251' 'microsoft-cp1251' 'cp1251' 'microsoft-cyrillic' 'windows-1251'  ))

        (MS_EastEuropean    unicode     ( 'ms-easteuropean' 'ms-ee' 'cp1250' 'ms-cp1250' 'microsoft-cp1250' 'microsoft-easteuropean' 'windows-1250'  ))

        (MS_Greek           unicode     ( 'ms-greek' 'ms-cp1253' 'microsoft-cp1253' 'cp1253' 'microsoft-greek' 'windows-1253' ))

        (MS_Hebrew          unicode     ( 'ms-hebrew' 'ms-cp1255' 'microsoft-cp1255' 'cp1255' 'microsoft-hebrew' 'windows-1255' ))

"/        (MS_Symbol           unicode     ( 'ms-symbol' 'microsoft-symbol'  ))

        (MS_Turkish         unicode     ( 'ms-turkish' 'ms-cp1254' 'microsoft-cp1254' 'cp1254' 'microsoft-turkish' 'windows-1254'  ))

        (NEXT               unicode     ( 'next' 'nextstep'  ))

        (ISO10646_to_SGML       unicode     ( 'sgml' ))
        (ISO10646_to_JavaText   unicode     ( 'java' 'javaText' ))
    ) triplesDo:[:className :decodesTo :encodesTo |
        |dict|

        "/ notice that the encoders are not yet installed as autoloaded.
        "/ Therefore, we remember their names here.
        dict := EncoderClassesByName at:decodesTo ifAbsent:nil.
        dict isNil ifTrue:[
            EncoderClassesByName at:decodesTo put:(dict := Dictionary new).
        ].
        encodesTo do:[:eachEncodingAlias |
            (dict includesKey:eachEncodingAlias) ifTrue:[
                self halt:'conflicting alias'
            ].
            dict at:eachEncodingAlias put:className.
        ].
    ].
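
    "/ A sketch of what the loop above leaves behind. It assumes that the bare word
    "/ unicode in the literal array is read as the symbol #unicode, and that
    "/ EncoderClassesByName is only reachable from within this class:
    "/     (EncoderClassesByName at:#unicode) at:'cp1251'
    "/ should then answer the class name #MS_Cyrillic (the name, not the class itself).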

    OperatingSystem isUNIXlike ifTrue:[
        "/Initialize OS system encoder
        OperatingSystem getCodesetEncoder.
    ].

    "
     self initialize
    "

    "Modified: / 01-04-2011 / 14:30:06 / cg"
    "Modified (format): / 23-01-2013 / 09:56:53 / Jan Vrany <jan.vrany@fit.cvut.cz>"
! !

!CharacterEncoder class methodsFor:'constants'!

jis7KanjiEscapeSequence
    "return the escape sequence used to switch to kanji in jis7 encoded strings.
     This happens to be the same as ISO2022-JP's escape sequence."

    Jis7KanjiEscapeSequence isNil ifTrue:[
        Jis7KanjiEscapeSequence := Character esc asString , '$B'.
    ].
    ^ Jis7KanjiEscapeSequence.

    "Created: 26.2.1996 / 17:38:08 / cg"
    "Modified: 30.6.1997 / 16:03:16 / cg"
!

jis7KanjiOldEscapeSequence
    "return the escape sequence used to switch to kanji in some old jis7 encoded strings."

    Jis7KanjiOldEscapeSequence isNil ifTrue:[
        Jis7KanjiOldEscapeSequence := Character esc asString , '$@'.
    ].
    ^ Jis7KanjiOldEscapeSequence.
!

jis7RomanEscapeSequence
    "return the escape sequence used to switch to roman in jis7 encoded strings"

    Jis7RomanEscapeSequence isNil ifTrue:[
        Jis7RomanEscapeSequence := Character esc asString , '(J'.
    ].
    ^ Jis7RomanEscapeSequence.

    "Created: 26.2.1996 / 17:38:08 / cg"
    "Modified: 30.6.1997 / 16:03:16 / cg"
!

jisISO2022EscapeSequence
    "return the escape sequence used to switch to kanji in iso2022 encoded strings"

    JisISO2022EscapeSequence isNil ifTrue:[
        JisISO2022EscapeSequence := Character esc asString , '&@' , Character esc asString , '$B'.
    ].
    ^ JisISO2022EscapeSequence.
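
    "
     A hedged sketch for inspecting the four cached sequences; asByteArray is
     assumed to be understood by strings in this image. Going by the literals
     in the methods of this category, the sequences are ESC $ B, ESC $ @,
     ESC ( J and ESC & @ ESC $ B, respectively:

     self jis7KanjiEscapeSequence asByteArray
     self jis7KanjiOldEscapeSequence asByteArray
     self jis7RomanEscapeSequence asByteArray
     self jisISO2022EscapeSequence asByteArray
    "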
! !

!CharacterEncoder class methodsFor:'encoding & decoding'!

decode:aCodePoint
    ^ self new decode:aCodePoint
!

decodeString:aString
    ^ self new decodeString:aString
!

decodeString:aString from:oldEncoding
    "given a string in oldEncoding, return the corresponding unicode string"

    ^ self encodeString:aString from:oldEncoding into:#'unicode'
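
    "
     A hedged example: for a single-byte string, iso8859-1 and unicode coincide,
     so going by the shortcut in #encodeString:from:into: the following
     should hand back its argument unchanged, without looking up an encoder:

     CharacterEncoder decodeString:'hello' from:#'iso8859-1'
    "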
!

encode:aCodePoint
    ^ self new encode:aCodePoint

    "
     ISO8859_1 encode:16r00FF
     ISO8859_1 decodeString:'hello'
     ISO8859_1 encodeString:(ISO8859_1 decodeString:'hello')

     ISO8859_5 decodeString:(String
                                with:(Character value:16rE4)
                                with:(Character value:16rE0))
    "
!

encode:codePoint from:oldEncodingArg into:newEncodingArg
    "convert a single code point from oldEncodingArg to newEncodingArg;
     a nil argument and #'iso10646-1' both stand for #'unicode'"

    |oldEncoding newEncoding encoder|

    oldEncoding := oldEncodingArg ? #'unicode'.
    oldEncoding == #'iso10646-1' ifTrue:[ oldEncoding := #'unicode'].
    newEncoding := newEncodingArg ? #'unicode'.
    newEncoding == #'iso10646-1' ifTrue:[ newEncoding := #'unicode'].

    oldEncoding == newEncoding ifTrue:[^ codePoint].

    "/ code points up to 16rFF are the same in iso8859-1 and unicode
    oldEncoding == #'unicode' ifTrue:[
        newEncoding == #'iso8859-1' ifTrue:[
            codePoint <= 16rFF ifTrue:[
                ^ codePoint
            ]
        ]
    ].
    newEncoding == #'unicode' ifTrue:[
        oldEncoding == #'iso8859-1' ifTrue:[
            codePoint <= 16rFF ifTrue:[
                ^ codePoint
            ]
        ]
    ].
    encoder := self encoderToEncodeFrom:oldEncoding into:newEncoding.
    ^ encoder encode:codePoint.
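
    "
     A few hedged examples which only exercise the shortcuts in this method
     (nil defaults, the iso10646-1 alias and the iso8859-1 range check);
     each should return the code point unchanged, without looking up an encoder:

     self encode:16r41 from:nil into:nil
     self encode:16r41 from:#'iso10646-1' into:#'unicode'
     self encode:16rE4 from:#'unicode' into:#'iso8859-1'
     self encode:16rE4 from:#'iso8859-1' into:#'unicode'
    "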
!

encodeString:aUnicodeString
    "given a string in unicode, return a string in my encoding for it"

    ^ self new encodeString:aUnicodeString

    "
     ISO8859_1 encodeString:'hello'
    "
!

encodeString:aString from:oldEncodingArg into:newEncodingArg
    "convert aString from oldEncodingArg to newEncodingArg;
     a nil argument, #'iso10646-1' and #'ms-default' all stand for #'unicode'"

    |oldEncoding newEncoding encoder|

    "/ some hard coded aliases
    oldEncoding := oldEncodingArg ? #'unicode'.
    oldEncoding == #'iso10646-1' ifTrue:[ oldEncoding := #'unicode'].
    oldEncoding == #'ms-default' ifTrue:[ oldEncoding := #'unicode'].

    newEncoding := newEncodingArg ? #'unicode'.
    newEncoding == #'iso10646-1' ifTrue:[ newEncoding := #'unicode'].
    newEncoding == #'ms-default' ifTrue:[ newEncoding := #'unicode'].

    oldEncoding == newEncoding ifTrue:[^ aString].

    "/ for single-byte strings, iso8859-1 and unicode (up to FF) have the same encoding
    oldEncoding == #'unicode' ifTrue:[
        (newEncoding == #'iso8859-1') ifTrue:[
            aString isWideString ifFalse:[
                ^ aString
            ]
        ].
    ].
    newEncoding == #'unicode' ifTrue:[
        (oldEncoding == #'iso8859-1') ifTrue:[
            aString isWideString ifFalse:[
                ^ aString
            ]
        ]
    ].

    encoder := self encoderToEncodeFrom:oldEncoding into:newEncoding.
    ^ encoder encodeString:aString.
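
    "
     A hedged sketch of the alias and shortcut handling in this method; for a
     single-byte string such as 'hello', each of these should return the
     argument itself, without looking up an encoder:

     self encodeString:'hello' from:#'iso10646-1' into:nil
     self encodeString:'hello' from:#'ms-default' into:#'unicode'
     self encodeString:'hello' from:#'unicode' into:#'iso8859-1'
    "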
!

encodeString:aString into:newEncoding
    ^ self encodeString:aString from:#'unicode' into:newEncoding

    "
     self encodeString:'hello' into:#ebcdic

     self encodeString:(self encodeString:'hello' into:#ebcdic) from:#ebcdic into:#ascii
     self encodeString:(self encodeString:'hello' into:#ebcdic) from:#ebcdic into:#unicode
    "
! !

!CharacterEncoder class methodsFor:'private'!

flushCode
    self initialize.

    self isAbstract ifFalse:[
        (self mapFileURL1_relativePathName notNil
        or:[ self mapFileURL2_relativePathName notNil]) ifTrue:[
            self class removeSelector:#mapping.
        ].
    ].

    "
     self flushCode
    "
! !

!CharacterEncoder class methodsFor:'private-mapping setup'!

generateCode
    (CharacterEncoderCodeGenerator new targetClass:self) generateCode.
!

generateSubclassCode
    (CharacterEncoderCodeGenerator new targetClass:self) generateSubclassCode.
!

mapFileURL1_codeColumn
    ^ 1
!

mapFileURL1_relativePathName
    "return the path of my mapping file, relative to the base URL used in #mappingURL1.
     Nil here; to be redefined in concrete subclass(es) which have such a file"

    ^ nil
!

mapFileURL2_relativePathName
    "return the path of my mapping file, relative to the base URL used in #mappingURL2.
     Nil here; to be redefined in concrete subclass(es) which have such a file"

    ^ nil
!

mappingURL1
    "return the URL of my mapping file below www.unicode.org/Public/MAPPINGS,
     or nil if #mapFileURL1_relativePathName is not redefined"

    |rel|

    rel := self mapFileURL1_relativePathName.
    rel isNil ifTrue:[
        ^ nil
    ].
    ^ 'http://www.unicode.org/Public/MAPPINGS/' , rel
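
    "
     A hedged example: CharacterEncoder itself answers nil to both relative
     path name queries, so both of the following should return nil; a subclass
     which redefines #mapFileURL1_relativePathName gets its path appended to
     the base URL above instead:

     CharacterEncoder mappingURL1
     CharacterEncoder mappingURL2
    "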
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   931
!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   932
7892
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   933
mappingURL2
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   934
    "return the URL of the mapping file below std.dkuug.dk, or nil if no relative path name is defined"
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   935
    
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   936
    |rel|
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   937
7912
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
   938
    rel := self mapFileURL2_relativePathName.
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
   939
    rel isNil ifTrue:[
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
   940
        ^ nil
7912
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
   941
    ].
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   942
    ^ 'http://std.dkuug.dk/i18n/charmaps/' , rel
7892
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   943
! !
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   944
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   945
!CharacterEncoder class methodsFor:'queries'!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
   946
7938
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   947
isEncoding:subSetEncodingArg subSetOf:superSetEncodingArg
7994
42b5face56fb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7986
diff changeset
   948
    "return true, if superSetEncoding includes all characters of subSetEncoding.
42b5face56fb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7986
diff changeset
   949
     (this means: characters are included - not that they have the same encoding)"
7938
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   950
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   951
    |subSetEncoding superSetEncoding|
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   952
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   953
    subSetEncodingArg = superSetEncodingArg ifTrue:[^ true].
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   954
    subSetEncoding := subSetEncodingArg asLowercase.
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   955
    superSetEncoding := superSetEncodingArg asLowercase.
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   956
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   957
    (subSetEncoding match:superSetEncoding) ifTrue:[^ true].
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   958
8214
406c7fc10e12 assume ms-ansi is same as unicode
Claus Gittinger <cg@exept.de>
parents: 8211
diff changeset
   959
    (('iso10646*' match:superSetEncoding) 
406c7fc10e12 assume ms-ansi is same as unicode
Claus Gittinger <cg@exept.de>
parents: 8211
diff changeset
   960
    or:[superSetEncoding = 'unicode'
406c7fc10e12 assume ms-ansi is same as unicode
Claus Gittinger <cg@exept.de>
parents: 8211
diff changeset
   961
    or:[superSetEncoding = 'ms-ansi']]) ifTrue:[
406c7fc10e12 assume ms-ansi is same as unicode
Claus Gittinger <cg@exept.de>
parents: 8211
diff changeset
   962
        "/ assume that any character is in unicode
406c7fc10e12 assume ms-ansi is same as unicode
Claus Gittinger <cg@exept.de>
parents: 8211
diff changeset
   963
        ^ true.
7938
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   964
    ].
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   965
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   966
    "/ if the subSet is iso8859-*, that means ascii (i.e. the lower 7 bits of iso8859 only).
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   967
    ((subSetEncoding = 'iso8859*') or:[subSetEncoding = 'iso8859-*']) ifTrue:[
8168
8f8da8bb046d *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8156
diff changeset
   968
        ('ascii*' match:superSetEncoding) ifTrue:[^ true].
8f8da8bb046d *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8156
diff changeset
   969
        ('ms-ansi*' match:superSetEncoding) ifTrue:[^ true].
7938
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   970
    ].
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   971
    (subSetEncoding = 'ascii') ifTrue:[
8168
8f8da8bb046d *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8156
diff changeset
   972
        ('iso8859*' match:superSetEncoding) ifTrue:[^ true].
8f8da8bb046d *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8156
diff changeset
   973
        ('ms-ansi*' match:superSetEncoding) ifTrue:[^ true].
7938
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   974
    ].
a53aae4a05bb *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7932
diff changeset
   975
7923
e8286ccdf20c *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7922
diff changeset
   976
    "/ TODO: check the charSets mappingTables...
e8286ccdf20c *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7922
diff changeset
   977
    "/ self halt.
e8286ccdf20c *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7922
diff changeset
   978
    ^ false.
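
    "
     Usage sketch; the answers noted are what the checks above should yield
     (traced by hand, not run in an image):
        CharacterEncoder isEncoding:'ascii' subSetOf:'iso8859-1'       - true  (ascii branch)
        CharacterEncoder isEncoding:'iso8859-1' subSetOf:'unicode'     - true  (unicode is assumed to include everything)
        CharacterEncoder isEncoding:'iso8859-1' subSetOf:'koi8-r'      - false (no rule applies)
    "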
e8286ccdf20c *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7922
diff changeset
   979
!
e8286ccdf20c *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7922
diff changeset
   980
7919
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   981
nameOfDecodedCode
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   982
    "Most coders decode from their code into unicode / encode from unicode into their code.
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   983
     There are a few exceptions to this, though - these must redefine this."
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   984
    
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   985
    ^ #'unicode'
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   986
!
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   987
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   988
nameOfEncoding
7974
9905043988ee *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7973
diff changeset
   989
    ^ (self nameWithoutPrefix asLowercase copyReplaceAll:$_ with:$-) asSymbol
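
    "
     Sketch, assuming a concrete encoder class whose name without
     namespace prefix is ISO8859_1 (an assumption for illustration):
        'ISO8859_1' asLowercase copyReplaceAll:$_ with:$-
     gives 'iso8859-1', so such a class would answer #'iso8859-1' here.
    "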
7919
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   990
!
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
   991
7959
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
   992
supportedExternalEncodings
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
   993
    "return an array of arrays containing the names of those
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
   994
     encodings which are supported for external resources (i.e. files).
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
   995
     The first element of each entry contains the internally used symbolic name,
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
   996
     the second contains a user-readable string (description).
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
   997
     More than one external name may be mapped onto the same symbolic name."
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
   998
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
   999
    ^ #( 
8176
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1000
         ('utf8'        'Unicode as 8Bit characters'    )  
8904
d358f0a17f07 utf16 support
Claus Gittinger <cg@exept.de>
parents: 8856
diff changeset
  1001
         ('utf16BE'     'Unicode as 16Bit big-endian'    )  
d358f0a17f07 utf16 support
Claus Gittinger <cg@exept.de>
parents: 8856
diff changeset
  1002
         ('utf16LE'     'Unicode as 16Bit little-endian' )  
8176
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1003
"/         ('utf7'        'Unicode as 7Bit characters'    ) 
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1004
"/       nil
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1005
         ('ascii'       'Common 7bit subset of iso8859' )
14188
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1006
         ('iso8859-1'   'Western'                       )
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1007
         ('iso8859-2'   'Central European'              )
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1008
         ('iso8859-3'   'South European'                )
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1009
         ('iso8859-4'   'Baltic'                        )
8176
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1010
         ('iso8859-5'   'Cyrillic'                      )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1011
         ('iso8859-6'   'Arabic'                        )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1012
         ('iso8859-7'   'Greek'                         )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1013
         ('iso8859-8'   'Hebrew'                        )
14188
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1014
         ('iso8859-15'  'Western with Euro'             )
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1015
         ('iso8859-16'  'South European with Euro'      )
8176
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1016
"/       nil
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1017
         ('koi7'        'Cyrillic (Old)'                )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1018
         ('koi8-r'      'Cyrillic'                      )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1019
         ('koi8-u'      'Cyrillic (Ukraine)'            )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1020
"/       nil
14188
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1021
         ('cp437'       'Windows US / codepage 437'       )
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1022
         ('cp850'       'Windows Latin1 / codepage 850'   )
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1023
         ('cp1250'      'Windows Latin2 / codepage 1250'  )
9ff8607b11a4 #userFriendlyName
Stefan Vogel <sv@exept.de>
parents: 14174
diff changeset
  1024
         ('cp1251'      'Windows Cyrillic / codepage 1251')
8176
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1025
"/         ('mac'         'macintosh 8 bit'               )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1026
         ('next'        'NeXT 8 bit'                    )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1027
"/         ('hp'          'hpux 8 bit'                    )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1028
"/       nil
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1029
         ('euc'         'EUC - extended unix code (japanese)'     )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1030
         ('jis7'        'JIS7 - jis 7bit escape codes (japanese)' )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1031
         ('iso-2022-jp' 'Same as jis 7bit'                        )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1032
         ('sjis'        'SJIS - shift jis 8bit codes (japanese)'  )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1033
"/       nil
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1034
         ('gb'          'GB - mainland china'                   )
66d1004f1806 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8168
diff changeset
  1035
         ('big5'        'BIG5 - taiwan'                         )
7959
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
  1036
"/         ('ksc'         'korean'                        )
8186
ae97115c26f5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8176
diff changeset
  1037
         ('sgml'        'SGML (XML/HTML) character escapes'     )
10111
7485e9da838c +javaText encoder
Claus Gittinger <cg@exept.de>
parents: 9143
diff changeset
  1038
         ('java'        'JavaText (\uXXXX) character escapes'   )
7959
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
  1039
       )
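
    "
     Usage sketch (the output format is chosen for illustration only):
        CharacterEncoder supportedExternalEncodings do:[:eachPair |
            Transcript showCR:(eachPair first , ' - ' , eachPair last)
        ]
    "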
10111
7485e9da838c +javaText encoder
Claus Gittinger <cg@exept.de>
parents: 9143
diff changeset
  1040
7485e9da838c +javaText encoder
Claus Gittinger <cg@exept.de>
parents: 9143
diff changeset
  1041
    "Modified: / 23-10-2006 / 13:27:48 / cg"
7959
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
  1042
!
0276f0a46dd1 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7956
diff changeset
  1043
7947
16b2306f9bc9 utf8 - full 30 bit range
Claus Gittinger <cg@exept.de>
parents: 7942
diff changeset
  1044
userFriendlyNameOfEncoding
7972
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1045
    ^ self nameOfEncoding asUppercaseFirst
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1046
! !
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1047
7912
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
  1048
!CharacterEncoder class methodsFor:'testing'!
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
  1049
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
  1050
isAbstract
11228
10b5c088d8f2 comment
Claus Gittinger <cg@exept.de>
parents: 10672
diff changeset
  1051
    "Return true, if this class is an abstract class.
10b5c088d8f2 comment
Claus Gittinger <cg@exept.de>
parents: 10672
diff changeset
  1052
     True is returned for CharacterEncoder here; false for subclasses.
10b5c088d8f2 comment
Claus Gittinger <cg@exept.de>
parents: 10672
diff changeset
  1053
     Abstract subclasses must redefine again."
10b5c088d8f2 comment
Claus Gittinger <cg@exept.de>
parents: 10672
diff changeset
  1054
7912
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
  1055
    ^ self == CharacterEncoder
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
  1056
! !
fbbb59645576 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7909
diff changeset
  1057
8711
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1058
!CharacterEncoder class methodsFor:'utilities'!
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1059
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1060
guessEncodingOfBuffer:buffer
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1061
    "look for a string of the form
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1062
            encoding #name
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1063
     or:
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1064
            encoding: name
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1065
     within the given buffer 
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1066
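     (which is usually the first few bytes of a textFile).
     A leading byte order mark (BOM), the keyword charset and
     JIS escape sequences are recognized as well."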
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1067
14169
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1068
    |lcBuffer quote peek|
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1069
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1070
    buffer size < 4 ifTrue:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1071
        "not enough bytes to determine the contents"
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1072
        ^ nil.
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1073
    ].
8711
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1074
14169
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1075
    "check the Byte Order Mark (BOM)"
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1076
    peek := (buffer at:1) codePoint.
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1077
    peek < 16rFE ifTrue:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1078
        (peek = 16rEF
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1079
            and:[(buffer at:2) codePoint = 16rBB 
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1080
            and:[(buffer at:3) codePoint = 16rBF]]) ifTrue:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1081
            ^ #utf8
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1082
        ].
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1083
        (peek = 0 
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1084
            and:[(buffer at:2) codePoint = 0 
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1085
            and:[(buffer at:3) codePoint = 16rFE 
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1086
            and:[(buffer at:4) codePoint = 16rFF]]]) ifTrue:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1087
            ^ #utf32be
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1088
        ].
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1089
    ] ifFalse:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1090
        peek = 16rFF ifTrue:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1091
            (buffer at:2) codePoint = 16rFE ifTrue:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1092
                "little endian"
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1093
                ((buffer at:3) codePoint = 0 and:[(buffer at:4) codePoint = 0]) ifTrue:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1094
                    ^ #utf32le.   
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1095
                ].
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1096
                ^ #utf16le
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1097
            ].
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1098
        ] ifFalse:["peek = 16rFE"
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1099
            (buffer at:2) codePoint = 16rFF ifTrue:[
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1100
                "big endian"
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1101
                ^ #utf16be
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1102
            ].
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1103
        ]
10672
b6230a13035b #guessEncodingOfBuffer - do NOT handle encoding=utf8
Stefan Vogel <sv@exept.de>
parents: 10111
diff changeset
  1104
    ].
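
    "BOM check above, traced by hand for sample buffers (a sketch; assumes
     that ByteArray>>asString answers a string with these byte values):
        CharacterEncoder guessEncodingOfBuffer:#[16rEF 16rBB 16rBF 16r41] asString   - #utf8
        CharacterEncoder guessEncodingOfBuffer:#[16rFF 16rFE 16r41 16r00] asString   - #utf16le
        CharacterEncoder guessEncodingOfBuffer:#[16rFE 16rFF 16r00 16r41] asString   - #utf16be"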
8711
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1105
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1106
    lcBuffer := buffer asLowercase.
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1107
14169
eab487f07a2b comment/format in: #encoderFor:
Stefan Vogel <sv@exept.de>
parents: 14094
diff changeset
  1108
    "now look for an inline encoding markup"
10672
b6230a13035b #guessEncodingOfBuffer - do NOT handle encoding=utf8
Stefan Vogel <sv@exept.de>
parents: 10111
diff changeset
  1109
    #(charset encoding) do:[:keyWord |
8855
289b5bda04bb guessEncoding - return the real encodings name
Claus Gittinger <cg@exept.de>
parents: 8814
diff changeset
  1110
        |encoderOrNil idx s w enc|
289b5bda04bb guessEncoding - return the real encodings name
Claus Gittinger <cg@exept.de>
parents: 8814
diff changeset
  1111
8711
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1112
        (idx := lcBuffer findString:keyWord) ~~ 0 ifTrue:[
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1113
            s := ReadStream on:buffer.
15609
36dd250b19f4 class: CharacterEncoder
Stefan Vogel <sv@exept.de>
parents: 14916
diff changeset
  1114
            s position:idx-1.
8711
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1115
            s skip:keyWord size.
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1116
            s skipSeparators. 
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1117
10672
b6230a13035b #guessEncodingOfBuffer - do NOT handle encoding=utf8
Stefan Vogel <sv@exept.de>
parents: 10111
diff changeset
  1118
            "do not include '=' here, otherwise
b6230a13035b #guessEncodingOfBuffer - do NOT handle encoding=utf8
Stefan Vogel <sv@exept.de>
parents: 10111
diff changeset
  1119
             files containing xml code (<?xml charset='utf8'>) will be parsed as UTF-8"
b6230a13035b #guessEncodingOfBuffer - do NOT handle encoding=utf8
Stefan Vogel <sv@exept.de>
parents: 10111
diff changeset
  1120
11300
2e90a91ff766 nameOf two-step encoder
Claus Gittinger <cg@exept.de>
parents: 11295
diff changeset
  1121
            [':#=' includes:s peek] whileTrue:[
8711
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1122
                s next.
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1123
                s skipSeparators. 
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1124
            ].
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1125
            s skipSeparators.
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1126
            ('"''' includes:s peek) ifTrue:[
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1127
                quote := s next.
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1128
                w := s upTo:quote.
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1129
            ] ifFalse:[
11300
2e90a91ff766 nameOf two-step encoder
Claus Gittinger <cg@exept.de>
parents: 11295
diff changeset
  1130
                w := s upToMatching:[:ch | ch isSeparator or:[ch == $" or:[ch == $' or:[ch == $> ]]]].
8711
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1131
            ].
c5f28b4c719d guessEncoding now implemented in CharacterEncoder
Claus Gittinger <cg@exept.de>
parents: 8388
diff changeset
  1132
            w notNil ifTrue:[
11300
2e90a91ff766 nameOf two-step encoder
Claus Gittinger <cg@exept.de>
parents: 11295
diff changeset
  1133
                enc := w withoutQuotes.
2e90a91ff766 nameOf two-step encoder
Claus Gittinger <cg@exept.de>
parents: 11295
diff changeset
  1134
                (enc startsWith:'x-') ifTrue:[
2e90a91ff766 nameOf two-step encoder
Claus Gittinger <cg@exept.de>
parents: 11295
diff changeset
  1135
                    enc := enc copyFrom:3.
2e90a91ff766 nameOf two-step encoder
Claus Gittinger <cg@exept.de>
parents: 11295
diff changeset
  1136
                ].
10672
b6230a13035b #guessEncodingOfBuffer - do NOT handle encoding=utf8
Stefan Vogel <sv@exept.de>
parents: 10111
diff changeset
  1137
                encoderOrNil := self encoderFor:enc ifAbsent:nil.
                encoderOrNil notNil ifTrue:[
                    ^ encoderOrNil nameOfEncoding
                ].
"/                enc size >=3 ifTrue:[
"/                    Transcript showCR:'Unknown encoding: ' , (withoutQuotes value:w).
"/                ]
            ].
        ].
    ].

    "/ look for JIS7 / EUC encoding
    (buffer findString:self jisISO2022EscapeSequence) ~~ 0 ifTrue:[
        ^ #'iso2022-jp'
    ].
    (buffer findString:self jis7KanjiEscapeSequence) ~~ 0 ifTrue:[
        ^ #jis7
    ].
    (buffer findString:self jis7KanjiOldEscapeSequence) ~~ 0 ifTrue:[
        ^ #jis7
    ].

    "/ TODO:

"/    "/ look for EUC
"/    idx := aString findFirst:[:char | |ascii|
"/                                        ((ascii := char asciiValue) >= 16rA1)
"/                                        and:[ascii <= 16rFE]].
"/    idx ~~ 0 ifTrue:[
"/        ascii := (aString at:(idx + 1)) asciiValue.
"/        (ascii >= 16rA1 and:[ascii <= 16rFE]) ifTrue:[
"/            ^ #euc
"/        ]
"/    ].
    "/ look for SJIS ...

    ^ nil
!

guessEncodingOfFile:aFilename
    "look for a string of the form
        encoding #name
     or:
        encoding: name
     within the first few bytes of the given text file.
     If that is not found, use heuristics (in CharacterArray) to guess.
     Return nil, if the file cannot be opened."

    |s buffer n "{Class: SmallInteger }"|

    s := aFilename asFilename readStreamOrNil.
    s isNil ifTrue:[^ nil].

    buffer := String new:64.
    n := s nextBytes:buffer size into:buffer.
    s close.

    ^ self guessEncodingOfBuffer:buffer.

    "
     self guessEncodingOfFile:'../../libview/resources/de.rs' asFilename
     self guessEncodingOfFile:'../../libview/resources/ru.rs' asFilename
     self guessEncodingOfFile:'../../libview/resources/th.rs' asFilename
    "

    "Modified: / 31-05-2011 / 15:45:19 / cg"
!

guessEncodingOfStream:aStream
    "look for a string of the form
            encoding #name
     or:
            encoding: name
     in the first few bytes of aStream.
     The position of aStream is restored afterwards."

    |oldPosition buffer n|

    buffer := String new:64.

    oldPosition := aStream position.
    n := aStream nextBytes:buffer size into:buffer.
    aStream position:oldPosition.

    ^ self guessEncodingOfBuffer:buffer
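
    "
     usage sketch (an assumption, modeled on the file names used in the
     guessEncodingOfFile: examples; readStream on a Filename is assumed
     to answer a positionable stream which understands nextBytes:into:):

     |s enc|

     s := '../../libview/resources/de.rs' asFilename readStream.
     enc := self guessEncodingOfStream:s.
     s close.
     enc
    "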

    "Modified: / 31-05-2011 / 15:45:23 / cg"
!

showCharacterSet
    "open a CharacterSetView which shows the characters of this encoding"

    |font|

    font := View defaultFont.
"/    font := (Font family:'courier' face:'medium' style:'roman' size:12 encoding:'iso10646-1').

    CharacterSetView
        openOn:font
        label:'Characters of ',self nameWithoutPrefix
        clickLabel:nil
        asInputFor:nil
        encoder:self

    "
     CharacterEncoderImplementations::MS_Ansi showCharacterSet
    "
! !

!CharacterEncoder methodsFor:'encoding & decoding'!

decode:anEncoding
    "given an integer in my encoding, return a unicode codePoint for it"

    self subclassResponsibility
!

decodeString:anEncodedString
    "given a string in my encoding, return a unicode-string for it"

    |newString myCode uniCodePoint bits|

    newString := String new:(anEncodedString size).
    bits := newString bitsPerCharacter.

    1 to:anEncodedString size do:[:idx |
        myCode := (anEncodedString at:idx) codePoint.
        uniCodePoint := self decode:myCode.
        "/ widen the result string, if the decoded codePoint does not fit
        uniCodePoint > 16rFF ifTrue:[
            uniCodePoint > 16rFFFF ifTrue:[
                bits < 32 ifTrue:[
                    newString := Unicode32String fromString:newString.
                    bits := 32.
                ]
            ] ifFalse:[
                bits < 16 ifTrue:[
                    newString := Unicode16String fromString:newString.
                    bits := 16.
                ]
            ]
        ].
        newString at:idx put:(Character value:uniCodePoint).
    ].
    ^ newString

    "
     ISO8859_1 decodeString:'hello'
    "
!

encode:aCodePoint
    "given a codePoint in unicode, return a byte in my encoding for it"

    self subclassResponsibility
!

encodeString:aUnicodeString
    "given a string in unicode, return a string in my encoding for it"

    |newString myCode uniCodePoint bits|

    newString := String new:(aUnicodeString size).
    bits := newString bitsPerCharacter.

    1 to:aUnicodeString size do:[:idx |
        uniCodePoint := (aUnicodeString at:idx) codePoint.
        myCode := self encode:uniCodePoint.
        myCode > 16rFF ifTrue:[
            myCode > 16rFFFF ifTrue:[
                bits < 32 ifTrue:[
                    newString := Unicode32String fromString:newString.
                    bits := 32.
                ]
            ] ifFalse:[
                bits < 16 ifTrue:[
                    newString := Unicode16String fromString:newString.
                    bits := 16.
                ]
            ]
        ].
        newString at:idx put:(Character value:myCode).
    ].
    ^ newString
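
    "
     usage sketch (an assumption: it mirrors the example given for
     decodeString:, and presumes that ISO8859_1 is reachable by that
     name from the evaluation context):

     ISO8859_1 encodeString:'hello'
    "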
! !

!CharacterEncoder methodsFor:'error handling'!

decodingError
    "report an error that there is no unicode-codePoint for a given codePoint
     in this encoding (which is unlikely), or that the encoding is undefined
     for that value (for example, holes in the ISO8859-3 encoding)"

    |badCodePoint sender|

    sender := thisContext sender.
    ((sender selector == #encode:) or:[sender selector == #decode:]) ifFalse:[
        badCodePoint := sender methodHome argAt:1
    ].
    ^ (DecodingError new)
        defaultValue:(self defaultDecoderValue);
        parameter:badCodePoint;
        messageText:'invalid code';
        suspendedContext:sender;
        raiseRequest.
!

defaultDecoderValue
    "placed into a decoded string, in case there is no unicode codePoint
     for a given encoded codePoint (typically 16rFFFF)."

    ^ 16rFFFF
!

defaultEncoderValue
    "placed into an encoded string, in case there is no codePoint
     for a given unicode codePoint (typically $?)."

    ^ $? codePoint
!

encodingError
    "report an error that some unicode-codePoint cannot be represented by this encoder"

    |badCodePoint sender|

    sender := thisContext sender.
    ((sender selector == #encode:) or:[sender selector == #decode:]) ifFalse:[
        badCodePoint := sender methodHome argAt:1
    ].
    ^ (EncodingError new)
        defaultValue:(self defaultEncoderValue);
        parameter:badCodePoint;
        messageText:'unrepresentable code (some character cannot be represented)';
        suspendedContext:sender;
        raiseRequest
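
    "
     handling sketch (assumptions: ISO8859_1 is reachable as in the
     decodeString: example, aUnicodeString is a placeholder for the
     caller's string, and the exception answers to proceedWith:, so that
     the raiseRequest above returns the given replacement code):

     [
        ISO8859_1 encodeString:aUnicodeString
     ] on:EncodingError do:[:ex |
        ex proceedWith:($? codePoint)
     ]
    "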

    "Modified: / 12-07-2012 / 20:36:37 / cg"
! !

!CharacterEncoder methodsFor:'printing'!

printOn:aStream
    aStream
        nextPutAll:(self nameOfDecodedCode);
        nextPutAll:'->';
        nextPutAll:(self nameOfEncoding)
! !

!CharacterEncoder methodsFor:'private'!

newString:size
    self subclassResponsibility
! !

!CharacterEncoder methodsFor:'queries'!

characterSize:codePoint
    "return the number of bytes required to encode codePoint"

    ^ self subclassResponsibility

    "Created: / 15-06-2005 / 15:11:04 / janfrog"
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1398
!
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1399
7917
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1400
isNullEncoder
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1401
    ^ false
7972
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1402
!
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1403
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1404
nameOfDecodedCode
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1405
    "Most coders decode from their code into unicode / encode from unicode into their code.
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1406
     There are a few exceptions to this, though - those must redefine this method."
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1407
    
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1408
    ^ self class nameOfDecodedCode
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1409
!
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1410
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1411
nameOfEncoding
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1412
    ^ self class nameOfEncoding
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1413
!
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1414
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1415
userFriendlyNameOfEncoding
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1416
    ^ self class userFriendlyNameOfEncoding
7917
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1417
! !
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1418
11975
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1419
!CharacterEncoder methodsFor:'stream support'!
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1420
12608
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1421
readNext:charactersToRead charactersFrom:stream 
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1422
    ^ self decodeString:(stream next:charactersToRead)
11975
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1423
!
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1424
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1425
readNextCharacterFrom:aStream
12608
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1426
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1427
    | c |
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1428
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1429
    c := aStream next.
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1430
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1431
    ^ c isNil 
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1432
        ifTrue: [nil]
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1433
        ifFalse: [(self decode:c asInteger) asCharacter]
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1434
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1435
    "Created: / 14-06-2005 / 17:03:21 / janfrog"
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1436
    "Modified: / 15-06-2005 / 15:27:49 / janfrog"
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1437
    "Modified: / 20-06-2005 / 13:13:52 / masca"
12435
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1438
!
11975
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1439
12435
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1440
readNextInputCharacterFrom:aStream
12608
Claus Gittinger <cg@exept.de>
parents: 12435
diff changeset
  1441
    ^ aStream next
11975
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1442
! !
7b37b4dbd66f *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 11374
diff changeset
  1443
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1444
!CharacterEncoder::CompoundEncoder class methodsFor:'documentation'!
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1445
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1446
documentation
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1447
"
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1448
    A compoundEncoder uses two real encoders;
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1449
    to encode:
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1450
        string -> decoder(decode) -> encoder(encode) -> result
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1451
    to decode:
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1452
        string -> encoder(decode) -> decoder(encode) -> result
7956
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1453
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1454
    |e|
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1455
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1456
    e := CompoundEncoder new.
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1457
    e encoder:ISO8859_5 decoder:KOI8_R.
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1458
    e decode:16rB0.  'CYRILLIC CAPITAL LETTER A; 16rB0 in 8859-5; 16rE1 in KOI8-R'.
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1459
    e encode:16rE1.  
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1460
"
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1461
! !
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1462
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1463
!CharacterEncoder::CompoundEncoder methodsFor:'accessing'!
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1464
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1465
encoder:encoderArg decoder:decoderArg  
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1466
    "set instance variables (automatically generated)"
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1467
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1468
    decoder := decoderArg.
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1469
    encoder := encoderArg.
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1470
! !
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1471
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1472
!CharacterEncoder::CompoundEncoder methodsFor:'encoding & decoding'!
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1473
7956
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1474
decode:aCode
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1475
    ^ decoder encode:(encoder decode:aCode)
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1476
!
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1477
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1478
decodeString:aString
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1479
    ^ decoder encodeString:(encoder decodeString:aString)
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1480
!
c43ee9e00bab *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7948
diff changeset
  1481
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1482
encode:aCode
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1483
    ^ encoder encode:(decoder decode:aCode)
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1484
!
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1485
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1486
encodeString:aString
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1487
    ^ encoder encodeString:(decoder decodeString:aString)
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1488
! !
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1489
7972
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1490
!CharacterEncoder::CompoundEncoder methodsFor:'printing'!
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1491
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1492
printOn:aStream
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1493
    aStream 
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1494
        nextPutAll:(decoder nameOfEncoding);
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1495
        nextPutAll:'->'.
7972
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1496
"/        nextPutAll:(decoder nameOfDecodedCode);
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1497
"/        nextPutAll:'->';
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1498
"/        nextPutAll:(encoder nameOfEncoding)
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1499
    encoder printOn:aStream
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1500
! !
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1501
7932
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
  1502
!CharacterEncoder::DefaultEncoder class methodsFor:'documentation'!
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
  1503
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
  1504
documentation
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
  1505
"
7972
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1506
    This is only a dummy, kept for ST80 compatibility
7932
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
  1507
"
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
  1508
! !
ee233bf44df5 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7924
diff changeset
  1509
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1510
!CharacterEncoder::InverseEncoder class methodsFor:'documentation'!
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1511
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1512
documentation
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1513
"
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1514
    An inverseEncoder inverts the decoder it wraps - i.e. encode is really a decode
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1515
    and decode is really an encode.
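
    A minimal usage sketch, assuming (as in the CompoundEncoder example)
    that ISO8859_5 resolves to a usable encoder where this is evaluated:

    |e|

    e := InverseEncoder new.
    e decoder:ISO8859_5.
    e encode:16rB0.    'forwards to (ISO8859_5 decode:16rB0); CYRILLIC CAPITAL LETTER A is 16rB0 in 8859-5 and 16r0410 in unicode'.
    e decode:16r0410.  'forwards to (ISO8859_5 encode:16r0410)'.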
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1516
"
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1517
! !
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1518
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1519
!CharacterEncoder::InverseEncoder methodsFor:'accessing'!
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1520
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1521
decoder:something
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1522
    decoder := something.
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1523
! !
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1524
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1525
!CharacterEncoder::InverseEncoder methodsFor:'encoding & decoding'!
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1526
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1527
decode:aCode
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1528
    ^ decoder encode:aCode
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1529
!
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1530
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1531
decodeString:aString
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1532
    ^ decoder encodeString:aString
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1533
!
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1534
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1535
encode:aCode
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1536
    ^ decoder decode:aCode
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1537
!
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1538
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1539
encodeString:aString
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1540
    ^ decoder decodeString:aString
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1541
! !
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1542
7972
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1543
!CharacterEncoder::InverseEncoder methodsFor:'printing'!
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1544
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1545
printOn:aStream
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1546
    aStream 
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1547
        nextPutAll:(decoder nameOfEncoding);
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1548
        nextPutAll:'->';
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1549
        nextPutAll:(decoder nameOfDecodedCode)
7972
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1550
! !
91aa73f89491 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7971
diff changeset
  1551
12435
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1552
!CharacterEncoder::InverseEncoder methodsFor:'queries'!
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1553
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1554
characterSize:charOrcodePoint
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1555
    ^ decoder characterSize:charOrcodePoint
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1556
! !
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1557
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1558
!CharacterEncoder::InverseEncoder methodsFor:'stream support'!
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1559
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1560
readNextInputCharacterFrom:aStream
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1561
    ^ decoder readNextInputCharacterFrom:aStream
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1562
! !
539c24148e90 added: #readNextInputCharacterFrom:
Claus Gittinger <cg@exept.de>
parents: 11975
diff changeset
  1563
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1564
!CharacterEncoder::NullEncoder class methodsFor:'documentation'!
7914
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1565
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1566
documentation
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1567
"
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1568
    A NullEncoder does nothing: encode and decode return their argument unchanged.
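
    A minimal sketch of that pass-through behaviour. As in the CompoundEncoder
    example, the class name is written unqualified; this is an assumption, and
    the CharacterEncoder:: prefix may be needed in other scopes:

    |e|

    e := NullEncoder new.
    e encode:16r41.        'answers 16r41 unchanged'.
    e decodeString:'abc'.  'answers the argument string itself'.
    e isNullEncoder.       'answers true'.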
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1569
"
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1570
! !
86a3784b40dd *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7913
diff changeset
  1571
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1572
!CharacterEncoder::NullEncoder methodsFor:'encoding & decoding'!
7899
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1573
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1574
decode:aCode
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1575
    ^ aCode
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1576
!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1577
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1578
decodeString:aString
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1579
    ^ aString
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1580
!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1581
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1582
encode:aCode
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1583
    ^ aCode
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1584
!
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1585
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1586
encodeString:aString
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1587
    ^ aString
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1588
! !
7577df77ba95 character encodings - first attempt
Claus Gittinger <cg@exept.de>
parents: 7893
diff changeset
  1589
7917
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1590
!CharacterEncoder::NullEncoder methodsFor:'queries'!
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1591
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1592
isNullEncoder
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1593
    ^ true
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1594
! !
3649394bf5c0 checkin from browser
Claus Gittinger <cg@exept.de>
parents: 7915
diff changeset
  1595
7915
0b92b16542f6 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7914
diff changeset
  1596
!CharacterEncoder::OtherEncoding class methodsFor:'private'!
7892
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
  1597
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
  1598
flushCode
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
  1599
!
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
  1600
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
  1601
generateEncoderCode
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
  1602
! !
149a145e871c initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
  1603
7919
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1604
!CharacterEncoder::TwoStepEncoder class methodsFor:'documentation'!
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1605
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1606
documentation
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1607
"
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1608
    A twoStepEncoder uses two real encoders;
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1609
    to encode:
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1610
        string -> encoder1(encode) -> encoder2(encode) -> result
7919
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1611
    to decode:
14209
912e4845d386 changed: #encodingError
Claus Gittinger <cg@exept.de>
parents: 14207
diff changeset
  1612
        string -> encoder2(decode) -> encoder1(decode) -> result
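
    A minimal sketch of the call order, using the same unqualified names as the
    CompoundEncoder example (an assumption: ISO8859_5 and NullEncoder must be
    resolvable where this is evaluated). The NullEncoder is only a stand-in
    for a real second stage:

    |e|

    e := TwoStepEncoder new.
    e encoder1:ISO8859_5 encoder2:(NullEncoder new).
    e encode:16r0410.  'second stage encode:(ISO8859_5 encode:16r0410)'.
    e decode:16rB0.    'ISO8859_5 decode:(second stage decode:16rB0)'.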
7919
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1613
"
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1614
! !
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1615
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1616
!CharacterEncoder::TwoStepEncoder methodsFor:'accessing'!
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1617
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1618
encoder1:encoder1Arg encoder2:encoder2Arg
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1619
    "set instance variables (automatically generated)"
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1620
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1621
    encoder1 := encoder1Arg.
92b61bef1b1a *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 7917
diff changeset
  1622
    encoder2 := encoder2Arg.
92b61bef1b1a *** empty log message ***
! !

!CharacterEncoder::TwoStepEncoder methodsFor:'encoding & decoding'!

decode:aCode
    "undo the second encoding step first, then the first one"

    ^ encoder1 decode:(encoder2 decode:aCode)
!

decodeString:aString
    "string variant of decode: - undo encoder2, then encoder1"

    ^ encoder1 decodeString:(encoder2 decodeString:aString)
!

encode:aCode
    "apply the first encoding step, then feed its result to the second"

    ^ encoder2 encode:(encoder1 encode:aCode)
!

encodeString:aString
    "string variant of encode: - apply encoder1, then encoder2"

    ^ encoder2 encodeString:(encoder1 encodeString:aString)
! !

!CharacterEncoder::TwoStepEncoder methodsFor:'printing'!

printOn:aStream
    aStream
        nextPutAll:(encoder1 nameOfDecodedCode);
        nextPutAll:'->';
        nextPutAll:(encoder1 nameOfEncoding);
        nextPutAll:'->';
        nextPutAll:(encoder2 nameOfEncoding)
! !

!CharacterEncoder::TwoStepEncoder methodsFor:'queries'!

characterSize:codePoint
    "/ naive: only the second encoder is asked; a real encoding would be needed to get the exact size
    ^ (encoder2 characterSize:codePoint)

    "Created: / 22-11-2012 / 13:07:47 / cg"
!

nameOfEncoding
    ^ "encoder1 nameOfEncoding , '-' ," encoder2 nameOfEncoding
! !

!CharacterEncoder class methodsFor:'documentation'!

version
    ^ '$Header: /cvs/stx/stx/libbasic/CharacterEncoder.st,v 1.124 2014-02-05 17:19:50 cg Exp $'
!

version_CVS
    ^ '$Header: /cvs/stx/stx/libbasic/CharacterEncoder.st,v 1.124 2014-02-05 17:19:50 cg Exp $'
! !


CharacterEncoder initialize!