CharacterEncoderImplementations__ISO10646_to_SGML.st
author Stefan Vogel <sv@exept.de>
Fri, 27 Oct 2017 16:14:37 +0200
branch expecco_2_11_1_branch
changeset 22329 20662662693b
parent 10108 8e610353f2fa
child 17711 39faaaf888b4
child 22477 5b8c1f5f8ffa
permissions -rw-r--r--
Add 2.11.0 Patch
8171
ac837a7ca3a3 *** empty log message ***
Claus Gittinger <cg@exept.de>
parents: 8170
"
 COPYRIGHT (c) 2004 by eXept Software AG
              All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
"{ Package: 'stx:libbasic' }"

"{ NameSpace: CharacterEncoderImplementations }"

TwoByteEncoder subclass:#ISO10646_to_SGML
	instanceVariableNames:''
	classVariableNames:''
	poolDictionaries:''
	category:'Collections-Text-Encodings'
!

!ISO10646_to_SGML class methodsFor:'documentation'!

copyright
"
 COPYRIGHT (c) 2004 by eXept Software AG
              All Rights Reserved

 This software is furnished under a license and may be used
 only in accordance with the terms of that license and with the
 inclusion of the above copyright notice.   This software may not
 be provided or otherwise made available to, or used by, any
 other person.  No title to or ownership of the software is
 hereby transferred.
"
!

documentation
"
    Incomplete: only knows how to encode/decode characters escaped as
    decimal numeric character references (i.e. &#nnnn;).

    TODO:
        add all other characters
        reuse this code in XML and HTML processing code.
"
! !

!ISO10646_to_SGML methodsFor:'encoding & decoding'!

decode:aCode
    self shouldNotImplement "/ no single byte conversion possible
!

decodeString:aStringOrByteCollection
    "given a string in SGML encoding (i.e. with characters escaped as &#nnnn;),
     return a new string containing the decoded characters.
     Returns a String, a Unicode16String or a Unicode32String instance,
     depending on the largest code point encountered.
     Only useful when reading from external sources.
     This only handles up to 30-bit characters."

    |nBits ch in out codePoint t|

    nBits := 8.
    in := aStringOrByteCollection readStream.
    out := WriteStream on:(String new:10).
    [in atEnd] whileFalse:[
        ch := in next.
        ch == $& ifTrue:[
            in peekOrNil == $# ifTrue:[
                in next.
                "/ accumulate the decimal digits of the code point
                codePoint := 0.
                [ch := in peekOrNil.
                 ch notNil and:[ch isDigit]
                ] whileTrue:[
                    codePoint := (codePoint * 10) + ch digitValue.
                    in next.
                ].
                "/ widen the output string if the code point does not fit
                codePoint > 16rFF ifTrue:[
                    codePoint > 16rFFFF ifTrue:[
                        nBits < 32 ifTrue:[
                            t := out contents.
                            out := WriteStream on:(Unicode32String fromString:t).
                            out position:t size.
                            nBits := 32.
                        ]
                    ] ifFalse:[
                        nBits < 16 ifTrue:[
                            t := out contents.
                            out := WriteStream on:(Unicode16String fromString:t).
                            out position:t size.
                            nBits := 16.
                        ]
                    ]
                ].
                out nextPut:(Character value:codePoint).
                "/ the terminating semicolon is skipped if present
                in peekOrNil == $; ifTrue:[
                    in next.
                ]
            ] ifFalse:[
                out nextPut:ch
            ]
        ] ifFalse:[
            out nextPut:ch
        ].
    ].
    ^ out contents

    "
     CharacterEncoderImplementations::ISO10646_to_SGML
        decodeString:'&#1060;&#1072;&#1081;&#1083;'

     CharacterEncoderImplementations::ISO10646_to_SGML
        decodeString:'#197;&bn...'
    "
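
    "
     Illustrative sketch, not part of the original source: it walks through
     the widening step of the loop above. The local names enc and s are
     hypothetical. Each escape below exceeds 16rFF but not 16rFFFF, so the
     output stream is expected to be switched to a Unicode16String once,
     and the last expression should answer the code points
     1060 1072 1081 1083 as an Array.

     |enc s|
     enc := CharacterEncoderImplementations::ISO10646_to_SGML new.
     s := enc decodeString:'&#1060;&#1072;&#1081;&#1083;'.
     s size.
     s asArray collect:[:c | c codePoint]
    "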
!

encode:aCode
    self shouldNotImplement "/ no single byte conversion possible
!

encodeString:aUnicodeString
    "return the SGML representation of aUnicodeString.
     The resulting string is only useful for storing in an external file,
     not for use inside ST/X."

    |ch in out codePoint|

    in := aUnicodeString readStream.
    out := WriteStream on:(String new:10).
    [in atEnd] whileFalse:[
        ch := in next.
        codePoint := ch codePoint.
        "/ code points 16r20..16r7F are copied as is; all others are escaped
        (codePoint between:16r20 and:16r7F) ifTrue:[
            out nextPut:ch.
        ] ifFalse:[
            out nextPutAll:'&#'.
            out nextPutAll:(codePoint printString).
            out nextPutAll:';'.
        ].
    ].
    ^ out contents

    "
     CharacterEncoderImplementations::ISO10646_to_SGML
        encodeString:'hello äöü'
    "
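
    "
     Round-trip sketch, added for illustration and not part of the original
     source; enc, sgml and back are hypothetical local names. Assuming the
     literal is read as Latin-1 characters, sgml should be
     'hello &#228;&#246;&#252;' and the last expression is expected to
     answer true. Note that encodeString: copies & unescaped, so input which
     already contains a sequence such as &#65; will not decode back to itself.

     |enc sgml back|
     enc := CharacterEncoderImplementations::ISO10646_to_SGML new.
     sgml := enc encodeString:'hello äöü'.
     back := enc decodeString:sgml.
     back = 'hello äöü'
    "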

    "Modified: / 23-10-2006 / 13:25:27 / cg"
! !

!ISO10646_to_SGML class methodsFor:'documentation'!

version
    ^ '$Header: /cvs/stx/stx/libbasic/CharacterEncoderImplementations__ISO10646_to_SGML.st,v 1.3 2006-10-23 11:25:11 cg Exp $'
! !