Faculty of Information Technology
Software Engineering Group

Opened 5 years ago

Last modified 4 years ago

#239 testing defect

Fix all Smalltalk/X source files to be in Unicode (UTF-8)

Reported by: Patrik Svestka
Owned by:
Priority: major
Milestone:
Component: default
Keywords:
Cc:
Also affects CVS HEAD (eXept version): no

Description

It appears there are some encoding issues with the Smalltalk/X source files. All files stored in the Mercurial repositories should be converted to Unicode (UTF-8). The files in CVS are not to be changed.

Attachments (7)

fixing_encoding_and_adding_encoding_directive.ps1 (3.4 KB) - added by Patrik Svestka 5 years ago.
powershell script that fixes all the encoding in all files
goodies_unicode_conversion.log (291.5 KB) - added by Patrik Svestka 5 years ago.
log file from 'goodies' directory
stx_unicode_conversion.log (224.1 KB) - added by Patrik Svestka 5 years ago.
log file from 'stx' directory (except for 'goodies' directory)
patches_for_[#239]_Encoding.7z (103.8 KB) - added by Patrik Svestka 5 years ago.
Encoding patches for Mercurial based repositories
unicode_patches_20181115.7z (67.8 KB) - added by Patrik Svestka 5 years ago.
Re-patch of all files
encoding.st (4.2 KB) - added by Patrik Svestka 5 years ago.
Smalltalk script used for repatching (some minor manual intervention was still needed)
unicode_re-patches_20181115.7z (83.1 KB) - added by Patrik Svestka 5 years ago.
Re-patch of all files


Change History (15)

comment:1 Changed 5 years ago by Patrik Svestka

I have created a PowerShell script that does the whole conversion automagically :). I'm adding it to the ticket.

It has to be run twice: once in C:\prg_sdk\stx8-jv_swing\build\stx and a second time in C:\prg_sdk\stx8-jv_swing\build\stx\goodies.

Changed 5 years ago by Patrik Svestka

powershell script that fixes all the encoding in all files

Changed 5 years ago by Patrik Svestka

log file from 'goodies' directory

Changed 5 years ago by Patrik Svestka

Attachment: stx_unicode_conversion.log added

log file from 'stx' directory (except for 'goodies' directory)

comment:2 Changed 5 years ago by Patrik Svestka

Sigh, the re-encoding script does not work properly in all cases. PowerShell's Get-Content detects the encoding automatically, but gets it wrong if the source encoding is UTF-8 without a BOM.

If you load such a file and then save it, the encoded (non-ASCII) characters are damaged. I'll try to figure out a solution for that.
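
The failure mode can be reproduced outside PowerShell. The following is a minimal Python sketch, not the attached script. It assumes the wrong guess is a single-byte legacy code page such as Windows-1252; the function names and the sample string are made up for illustration.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def read_source(data: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, else use the fallback.

    The fallback code page is an assumption; pick whatever the files
    were really written in.
    """
    if looks_like_utf8(data):
        # "utf-8-sig" also strips a leading BOM if one is present
        return data.decode("utf-8-sig")
    return data.decode(fallback)


original = "café"
raw = original.encode("utf-8")      # UTF-8 bytes without a BOM
misread = raw.decode("cp1252")      # wrong guess: misread is now "cafÃ©"
damaged = misread.encode("utf-8")   # saving the misread text bakes the damage in
```

Trying a strict UTF-8 decode first avoids the wrong guess, because text written in a legacy code page that contains non-ASCII characters is very unlikely to be valid UTF-8 by accident.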

comment:3 Changed 5 years ago by Patrik Svestka

I have found a solution. I took the original files and some files read with UTF8 encoding forced, and compared them manually with a diff utility.

I have addressed all the issues and run the tests; everything that passed before still passes.

Please see all the patches at patches_for_[#239]_Encoding.7z

comment:4 Changed 5 years ago by Patrik Svestka

Status: new → testing

Changed 5 years ago by Patrik Svestka

Encoding patches for Mercurial based repositories

comment:5 Changed 5 years ago by Patrik Svestka

Based on our discussion I have made changes to the patches above via the encoding.st script (executed via smalltalk.bat --execute c:\<path>\stx\encoding.st).

These changes are mainly:

  • only files that contain a non-ASCII character have a header starting with: "{ Encoding: utf8 }"
  • all files now have Unix EOL (String lf); some files had mixed line endings (String crlf and String lf), and some even had Mac line endings (String cr).
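
The two rules above can be sketched as follows. This is a Python illustration, not the encoding.st script attached to the ticket. The directive text is taken from the list above; putting it on its own first line is an assumption, and the function names are hypothetical.

```python
# Directive text as quoted in the ticket; its exact placement is an assumption.
ENCODING_DIRECTIVE = '"{ Encoding: utf8 }"'


def normalize_eol(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so the lone-CR pass does not double the line breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def has_non_ascii(text: str) -> bool:
    """Return True if any character is outside the 7-bit ASCII range."""
    return any(ord(ch) > 127 for ch in text)


def convert(text: str) -> str:
    """Unify line endings and add the directive only where it is needed."""
    text = normalize_eol(text)
    if has_non_ascii(text) and not text.startswith(ENCODING_DIRECTIVE):
        text = ENCODING_DIRECTIVE + "\n" + text
    return text
```

Running convert a second time on its own output changes nothing, which makes it safe to re-run over a tree that is already partly converted.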

I'm attaching new patches. I have rebased them onto the patches that I've already received.

Changed 5 years ago by Patrik Svestka

Attachment: unicode_patches_20181115.7z added

Re-patch of all files

Changed 5 years ago by Patrik Svestka

Attachment: encoding.st added

Smalltalk script used for repatching (some minor manual intervention was still needed)

comment:6 Changed 5 years ago by Patrik Svestka

Sigh, I made a silly mistake with the Mercurial rebase. I'm republishing the patches as unicode_re-patches_20181115.7z, which should have it fixed.

Changed 5 years ago by Patrik Svestka

Re-patch of all files

comment:7 Changed 5 years ago by Patrik Svestka

You may notice that some of the files appear to have no changes. These files had issues with their line endings: there were cases where the line endings were CR only, CRLF, or mixed.

I have unified all EOL to LF only.
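
To check which files are affected, the line-ending styles present in a file can be listed from its raw bytes. A minimal Python sketch, with a hypothetical function name, that is not part of the attached scripts:

```python
import re

# Map each line-ending byte sequence to a short label.
EOL_NAMES = {b"\r\n": "crlf", b"\r": "cr", b"\n": "lf"}


def eol_styles(data: bytes) -> set:
    """Return the set of line-ending styles found in the raw bytes."""
    # The alternation tries CRLF first, so a CRLF pair is never counted
    # as a separate CR plus LF.
    return {EOL_NAMES[m.group()] for m in re.finditer(rb"\r\n|\r|\n", data)}
```

A file for which this returns more than one style has mixed line endings. A file that returns only "cr" or "crlf" is changed by the patch even though the diff shows no visible text changes.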

Last edited 5 years ago by Patrik Svestka

comment:8 Changed 4 years ago by Patrik Svestka

Jan, the patches for the goodies directory are probably missing from the ones you have applied.
