--- a/Character.st Wed Jul 06 15:28:46 2016 +0200
+++ b/Character.st Wed Jul 06 16:38:45 2016 +0200
@@ -1,5 +1,3 @@
-"{ Encoding: utf8 }"
-
"
COPYRIGHT (c) 1988 by Claus Gittinger
All Rights Reserved
@@ -62,15 +60,15 @@
Always compare using #= if there is any chance of a non-ascii character being involved.
Once again (because beginners sometimes make this mistake):
- This means: you may compare characters using #== ONLY IFF you are certain,
- that the characters ranges is 0..255.
- Otherwise, you HAVE TO compare using #=. (if in doubt, always compare using #=).
- Sorry for this inconvenience, but it is (practically) impossible to keep
- the possible maximum of 2^32 characters (Unicode) around, for that convenience alone.
+ This means: you may compare characters using #== ONLY IFF you are certain,
+ that the characters ranges is 0..255.
+ Otherwise, you HAVE TO compare using #=. (if in doubt, always compare using #=).
+ Sorry for this inconvenience, but it is (practically) impossible to keep
+ the possible maximum of 2^32 characters (Unicode) around, for that convenience alone.
In ST/X, N is (currently) 1024. This means that all the latin characters and some others are
kept as singleton in the CharacterTable class variable (which is also used by the VM when characters
- are instanciated).
+ are instantiated).
Methods marked as (JS) come from the manchester Character goody
(CharacterComparing) by Jan Steinman, which allow Characters to be used as
@@ -79,7 +77,7 @@
Some of these have been modified a bit.
WARNING: characters are known by compiler and runtime system -
- do not change the instance layout.
+ do not change the instance layout.
Also, although you can create subclasses of Character, the compiler always
creates instances of Character for literals ...
@@ -88,43 +86,43 @@
Therefore, it may not make sense to create a character-subclass.
Case Mapping in Unicode:
- There are a number of complications to case mappings that occur once the repertoire
- of characters is expanded beyond ASCII.
-
- * Because of the inclusion of certain composite characters for compatibility,
- such as U+01F1 'DZ' capital dz, there is a third case, called titlecase,
- which is used where the first letter of a word is to be capitalized
- (e.g. Titlecase, vs. UPPERCASE, or lowercase).
- For example, the title case of the example character is U+01F2 'Dz' capital d with small z.
-
- * Case mappings may produce strings of different length than the original.
- For example, the German character U+00DF small letter sharp s expands when uppercased to
- the sequence of two characters 'SS'.
- This also occurs where there is no precomposed character corresponding to a case mapping.
- *** This is not yet implemented (in 5.2) ***
-
- * Characters may also have different case mappings, depending on the context.
- For example, U+03A3 capital sigma lowercases to U+03C3 small sigma if it is not followed
- by another letter, but lowercases to 03C2 small final sigma if it is.
- *** This is not yet implemented (in 5.2) ***
-
- * Characters may have case mappings that depend on the locale.
- For example, in Turkish the letter 0049 'I' capital letter i lowercases to 0131 small dotless i.
- *** This is not yet implemented (in 5.2) ***
-
- * Case mappings are not, in general, reversible.
- For example, once the string 'McGowan' has been uppercased, lowercased or titlecased,
- the original cannot be recovered by applying another uppercase, lowercase, or titlecase operation.
+ There are a number of complications to case mappings that occur once the repertoire
+ of characters is expanded beyond ASCII.
+
+ * Because of the inclusion of certain composite characters for compatibility,
+ such as U+01F1 'DZ' capital dz, there is a third case, called titlecase,
+ which is used where the first letter of a word is to be capitalized
+ (e.g. Titlecase, vs. UPPERCASE, or lowercase).
+ For example, the title case of the example character is U+01F2 'Dz' capital d with small z.
+
+ * Case mappings may produce strings of different length than the original.
+ For example, the German character U+00DF small letter sharp s expands when uppercased to
+ the sequence of two characters 'SS'.
+ This also occurs where there is no precomposed character corresponding to a case mapping.
+ *** This is not yet implemented (in 5.2) ***
+
+ * Characters may also have different case mappings, depending on the context.
+ For example, U+03A3 capital sigma lowercases to U+03C3 small sigma if it is not followed
+ by another letter, but lowercases to 03C2 small final sigma if it is.
+ *** This is not yet implemented (in 5.2) ***
+
+ * Characters may have case mappings that depend on the locale.
+ For example, in Turkish the letter 0049 'I' capital letter i lowercases to 0131 small dotless i.
+ *** This is not yet implemented (in 5.2) ***
+
+ * Case mappings are not, in general, reversible.
+ For example, once the string 'McGowan' has been uppercased, lowercased or titlecased,
+ the original cannot be recovered by applying another uppercase, lowercase, or titlecase operation.
Collation Sequence:
- *** This is not yet implemented (in 5.2) ***
+ *** This is not yet implemented (in 5.2) ***
[author:]
- Claus Gittinger
+ Claus Gittinger
[see also:]
- String TwoByteString Unicode16String Unicode32String
- StringCollection Text
+ String TwoByteString Unicode16String Unicode32String
+ StringCollection Text
"
! !
@@ -313,6 +311,7 @@
^ self codePoint:anInteger
! !
+
!Character class methodsFor:'accessing untypeable characters'!
controlCharacter:char
@@ -359,6 +358,7 @@
^ self codePoint:41
! !
+
!Character class methodsFor:'constants'!
backspace
@@ -601,6 +601,9 @@
"
! !
+
+
+
!Character methodsFor:'Compatibility-Dolphin'!
isAlphaNumeric
@@ -648,6 +651,8 @@
or:[ (asciivalue == 247 ) ]]]]]
! !
+
+
!Character methodsFor:'accessing'!
codePoint
@@ -1498,7 +1503,7 @@
^ s contents
"
- 'ä' utf8Encoded
+ 'ä' utf8Encoded
'a' utf8Encoded
"
! !
@@ -2637,9 +2642,9 @@
"
$e asNonDiacritical
- $é asNonDiacritical
- $ä asNonDiacritical
- $å asNonDiacritical
+ $é asNonDiacritical
+ $ä asNonDiacritical
+ $å asNonDiacritical
"
!