Faculty of Information Technology
Software Engineering Group

Opened 15 months ago

Last modified 11 months ago

#239 testing defect

Fix all Smalltalk/X source files to be in Unicode (UTF-8)

Reported by: Patrik Svestka
Owned by:
Priority: major
Milestone:
Component: default
Keywords:
Cc:
Also affects CVS HEAD (eXept version): no

Description

There appear to be encoding issues with the Smalltalk/X source files. All files stored in the Mercurial repositories should be converted to Unicode (UTF-8). The files in CVS are not to be changed.

Attachments (7)

fixing_encoding_and_adding_encoding_directive.ps1 (3.4 KB) - added by Patrik Svestka 15 months ago.
PowerShell script that fixes the encoding in all files
goodies_unicode_conversion.log (291.5 KB) - added by Patrik Svestka 15 months ago.
log file from 'goodies' directory
stx_unicode_conversion.log (224.1 KB) - added by Patrik Svestka 15 months ago.
log file from 'stx' directory (except for 'goodies' directory)
patches_for_[#239]_Encoding.7z (103.8 KB) - added by Patrik Svestka 14 months ago.
Encoding patches for Mercurial based repositories
unicode_patches_20181115.7z (67.8 KB) - added by Patrik Svestka 12 months ago.
Re-patch of all files
encoding.st (4.2 KB) - added by Patrik Svestka 12 months ago.
Smalltalk script used for repatching (some minor manual intervention was still needed)
unicode_re-patches_20181115.7z (83.1 KB) - added by Patrik Svestka 12 months ago.
Re-patch of all files


Change History (15)

comment:1 Changed 15 months ago by Patrik Svestka

I have created a PowerShell script which does all the conversion automagically :). I'm adding it to the ticket.

It has to be run twice: once in C:\prg_sdk\stx8-jv_swing\build\stx and once in C:\prg_sdk\stx8-jv_swing\build\stx\goodies.

Changed 15 months ago by Patrik Svestka

Attachment: fixing_encoding_and_adding_encoding_directive.ps1 added

PowerShell script that fixes the encoding in all files

Changed 15 months ago by Patrik Svestka

Attachment: goodies_unicode_conversion.log added

log file from 'goodies' directory

Changed 15 months ago by Patrik Svestka

Attachment: stx_unicode_conversion.log added

log file from 'stx' directory (except for 'goodies' directory)

comment:2 Changed 15 months ago by Patrik Svestka

Sigh, the re-encoding script is not working properly in all cases. PowerShell's Get-Content detects the encoding automatically, but gets it wrong when the source encoding is UTF-8 without a BOM.

If I load such a file and then save it, the non-ASCII characters are damaged. I'll try to figure out a solution for that.
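
One way around this kind of misdetection is to avoid auto-detection altogether: bytes that decode cleanly as strict UTF-8 can be treated as UTF-8, and only the rest falls back to a legacy encoding. The following is a minimal Python sketch of that idea, not the attached PowerShell script. The function name `decode_source` and the cp1252 fallback are assumptions; the ticket does not say which legacy encoding the files use.

```python
import codecs
from typing import Tuple


def decode_source(raw: bytes, legacy_encoding: str = "cp1252") -> Tuple[str, str]:
    """Decode raw file bytes and report which encoding was used.

    legacy_encoding is an assumption; the ticket does not name the
    original encoding of the non-UTF-8 files.
    """
    # A UTF-8 BOM is unambiguous: strip it and decode the rest.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8,
        # so success means the content is UTF-8 (or plain ASCII).
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the assumed legacy encoding.
        return raw.decode(legacy_encoding), legacy_encoding
```

The returned encoding name makes it possible to log which files were actually re-encoded and which were already UTF-8.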

comment:3 Changed 14 months ago by Patrik Svestka

I have found a solution. I took the original files and the same files read with the encoding forced to UTF8, and compared them manually with a diff utility.

I have addressed all the issues and run the tests; everything that passed before still passes.

Please see all the patches in patches_for_[#239]_Encoding.7z

comment:4 Changed 14 months ago by Patrik Svestka

Status: new → testing

Changed 14 months ago by Patrik Svestka

Attachment: patches_for_[#239]_Encoding.7z added

Encoding patches for Mercurial based repositories

comment:5 Changed 12 months ago by Patrik Svestka

Based on our discussion, I have changed the patches above using the encoding.st script (executed via smalltalk.bat --execute c:\<path>\stx\encoding.st).

The main changes are:

  • only files that contain a non-ASCII character now start with the header "{ Encoding: utf8 }"
  • all files now have Unix EOL (String lf); some files had mixed line endings (String crlf and String lf), and some even had old Mac line endings (String cr).

I'm attaching new patches. I have rebased them onto the patches that I've already received.
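
The two rules listed in this comment can be sketched in a few lines of Python. This is only an illustration of the rules, not a port of encoding.st. The function name `normalize_source` and the placement of the header on its own first line are assumptions.

```python
# Header text as quoted in the ticket, including the double quotes.
ENCODING_HEADER = '"{ Encoding: utf8 }"'


def normalize_source(text: str) -> str:
    """Unify line endings to LF and add the encoding header when needed."""
    # Replace CRLF first so its CR is not converted a second time.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    has_non_ascii = any(ord(ch) > 127 for ch in text)
    if has_non_ascii and not text.startswith(ENCODING_HEADER):
        # Assumption: the header goes on its own first line.
        text = ENCODING_HEADER + "\n" + text
    return text
```

The sketch never removes an existing header from a pure-ASCII file; that case would need separate handling.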

Changed 12 months ago by Patrik Svestka

Attachment: unicode_patches_20181115.7z added

Re-patch of all files

Changed 12 months ago by Patrik Svestka

Attachment: encoding.st added

Smalltalk script used for repatching (some minor manual intervention was still needed)

comment:6 Changed 12 months ago by Patrik Svestka

Sigh, I made a silly mistake with the Mercurial rebase. I'm republishing the patches as unicode_re-patches_20181115.7z, which should have it fixed.

Changed 12 months ago by Patrik Svestka

Attachment: unicode_re-patches_20181115.7z added

Re-patch of all files

comment:7 Changed 12 months ago by Patrik Svestka

You may notice that some of the files appear not to have any changes. These files had issues with their line endings: in some cases the line endings were CR only, in others CRLF, and in others a mix.

I have unified all EOL to LF only.
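
Since line-ending changes are often invisible in a diff view, it can help to check which style a file used before the conversion. A small Python sketch for that check follows; the function name `classify_eol` and its return labels are made up for illustration and are not part of the attached scripts.

```python
def classify_eol(raw: bytes) -> str:
    """Return 'crlf', 'cr', 'lf', 'mixed' or 'none' for raw file bytes."""
    crlf = raw.count(b"\r\n")
    # Bare CR and bare LF are the ones that are not part of a CRLF pair.
    cr = raw.count(b"\r") - crlf
    lf = raw.count(b"\n") - crlf
    kinds = [name for name, n in (("crlf", crlf), ("cr", cr), ("lf", lf)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"
```

Any file for which this returns something other than 'lf' or 'none' would show up in the patches even if its visible text is unchanged.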

Last edited 12 months ago by Patrik Svestka

comment:8 Changed 11 months ago by Patrik Svestka

Jan, the patches for the goodies directory are probably missing from the patches you have applied.
