Character.st
branch jv
changeset 20131 4118d61ddba0
parent 20079 8d884971c2ed
parent 20111 5ae345836020
child 21024 8734987eb5c7
--- a/Character.st	Wed Jul 06 06:50:27 2016 +0200
+++ b/Character.st	Sat Jul 09 21:10:24 2016 +0100
@@ -62,15 +62,15 @@
     Always compare using #= if there is any chance of a non-ascii character being involved.
 
     Once again (because beginners sometimes make this mistake):
-	This means: you may compare characters using #== ONLY IFF you are certain,
-	that the characters ranges is 0..255.
-	Otherwise, you HAVE TO compare using #=. (if in doubt, always compare using #=).
-	Sorry for this inconvenience, but it is (practically) impossible to keep
-	the possible maximum of 2^32 characters (Unicode) around, for that convenience alone.
+        This means: you may compare characters using #== ONLY IFF you are certain,
+        that the characters ranges is 0..255.
+        Otherwise, you HAVE TO compare using #=. (if in doubt, always compare using #=).
+        Sorry for this inconvenience, but it is (practically) impossible to keep
+        the possible maximum of 2^32 characters (Unicode) around, for that convenience alone.
 
     In ST/X, N is (currently) 1024. This means that all the latin characters and some others are
     kept as singleton in the CharacterTable class variable (which is also used by the VM when characters
-    are instanciated).
+    are instantiated).
 
     Methods marked as (JS) come from the manchester Character goody
     (CharacterComparing) by Jan Steinman, which allow Characters to be used as
@@ -79,7 +79,7 @@
     Some of these have been modified a bit.
 
     WARNING: characters are known by compiler and runtime system -
-	     do not change the instance layout.
+             do not change the instance layout.
 
     Also, although you can create subclasses of Character, the compiler always
     creates instances of Character for literals ...
@@ -88,43 +88,43 @@
     Therefore, it may not make sense to create a character-subclass.
 
     Case Mapping in Unicode:
-	There are a number of complications to case mappings that occur once the repertoire
-	of characters is expanded beyond ASCII.
-
-	* Because of the inclusion of certain composite characters for compatibility,
-	  such as U+01F1 'DZ' capital dz, there is a third case, called titlecase,
-	  which is used where the first letter of a word is to be capitalized
-	  (e.g. Titlecase, vs. UPPERCASE, or lowercase).
-	  For example, the title case of the example character is U+01F2 'Dz' capital d with small z.
-
-	* Case mappings may produce strings of different length than the original.
-	  For example, the German character U+00DF small letter sharp s expands when uppercased to
-	  the sequence of two characters 'SS'.
-	  This also occurs where there is no precomposed character corresponding to a case mapping.
-	  *** This is not yet implemented (in 5.2) ***
-
-	* Characters may also have different case mappings, depending on the context.
-	  For example, U+03A3 capital sigma lowercases to U+03C3 small sigma if it is not followed
-	  by another letter, but lowercases to 03C2 small final sigma if it is.
-	  *** This is not yet implemented (in 5.2) ***
-
-	* Characters may have case mappings that depend on the locale.
-	  For example, in Turkish the letter 0049 'I' capital letter i lowercases to 0131 small dotless i.
-	  *** This is not yet implemented (in 5.2) ***
-
-	* Case mappings are not, in general, reversible.
-	  For example, once the string 'McGowan' has been uppercased, lowercased or titlecased,
-	  the original cannot be recovered by applying another uppercase, lowercase, or titlecase operation.
+        There are a number of complications to case mappings that occur once the repertoire
+        of characters is expanded beyond ASCII.
+
+        * Because of the inclusion of certain composite characters for compatibility,
+          such as U+01F1 'DZ' capital dz, there is a third case, called titlecase,
+          which is used where the first letter of a word is to be capitalized
+          (e.g. Titlecase, vs. UPPERCASE, or lowercase).
+          For example, the title case of the example character is U+01F2 'Dz' capital d with small z.
+
+        * Case mappings may produce strings of different length than the original.
+          For example, the German character U+00DF small letter sharp s expands when uppercased to
+          the sequence of two characters 'SS'.
+          This also occurs where there is no precomposed character corresponding to a case mapping.
+          *** This is not yet implemented (in 5.2) ***
+
+        * Characters may also have different case mappings, depending on the context.
+          For example, U+03A3 capital sigma lowercases to U+03C3 small sigma if it is not followed
+          by another letter, but lowercases to 03C2 small final sigma if it is.
+          *** This is not yet implemented (in 5.2) ***
+
+        * Characters may have case mappings that depend on the locale.
+          For example, in Turkish the letter 0049 'I' capital letter i lowercases to 0131 small dotless i.
+          *** This is not yet implemented (in 5.2) ***
+
+        * Case mappings are not, in general, reversible.
+          For example, once the string 'McGowan' has been uppercased, lowercased or titlecased,
+          the original cannot be recovered by applying another uppercase, lowercase, or titlecase operation.
 
     Collation Sequence:
-	*** This is not yet implemented (in 5.2) ***
+        *** This is not yet implemented (in 5.2) ***
 
     [author:]
-	Claus Gittinger
+        Claus Gittinger
 
     [see also:]
-	String TwoByteString Unicode16String Unicode32String
-	StringCollection Text
+        String TwoByteString Unicode16String Unicode32String
+        StringCollection Text
 "
 ! !
 
@@ -601,6 +601,7 @@
     "
 ! !
 
+
 !Character methodsFor:'Compatibility-Dolphin'!
 
 isAlphaNumeric
@@ -648,6 +649,8 @@
       or:[ (asciivalue == 247 ) ]]]]]
 ! !
 
+
+
 !Character methodsFor:'accessing'!
 
 codePoint
@@ -994,9 +997,7 @@
     // comon ascii stuff first
     if (__codePoint < 0x80) {
         if ((__codePoint >= 'A') && (__codePoint <= 'Z')) {
-            unsigned newCodePoint;
-
-            newCodePoint = __codePoint - 'A' + 'a';
+            unsigned int newCodePoint = __codePoint - 'A' + 'a';
             RETURN (__MKCHARACTER(newCodePoint)) ;
         }
         RETURN (self);
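
Note on the last hunk: it only merges the declaration of newCodePoint with its initialization, so the ASCII fast path should behave as before. A minimal stand-alone sketch of that fast path in plain C follows. The helper name ascii_to_lower is hypothetical, and the sketch returns the raw code point where the real primitive wraps it with __MKCHARACTER and hands it back through RETURN. Code points of 0x80 and above are passed through unchanged here, which is a simplification, since the real method goes on to handle them after this branch.

```c
#include <assert.h>

/* Hypothetical helper, not part of Character.st: mirrors only the
 * "common ascii stuff first" branch shown in the hunk above. */
static unsigned int ascii_to_lower(unsigned int codePoint)
{
    if (codePoint < 0x80) {
        if ((codePoint >= 'A') && (codePoint <= 'Z')) {
            /* Declaration and initialization in one statement,
             * as in the changed line of the diff. */
            unsigned int newCodePoint = codePoint - 'A' + 'a';
            return newCodePoint;
        }
        /* Other ASCII code points have no lowercase mapping. */
        return codePoint;
    }
    /* Simplification: non-ASCII code points are not mapped in this sketch. */
    return codePoint;
}
```

Calling ascii_to_lower('Q') yields 'q', while digits, punctuation and letters that are already lowercase come back unchanged.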