Faculty of Information Technology
Software Engineering Group

Opened 10 years ago

Closed 10 years ago

#5 closed defect (fixed)

Wrong parsing of CONSTANT_Utf8_info structure

Reported by: Jan Vraný Owned by: hlopkmar
Priority: major Milestone: milestone:
Component: stx:libjava Keywords: classreader
Cc:

Description (last modified by Jan Vraný)

In a Java .class file, the bytes in the CONSTANT_Utf8_info structure are not encoded using standard UTF-8. Instead, characters are encoded using the modified UTF-8 encoding described in section 4.5.7 of The .class File Format spec.
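To illustrate how the two encodings differ, here is a minimal Java sketch (not part of stx:libjava; the class and helper names are made up for this example). It relies on the fact that java.io.DataOutputStream.writeUTF emits modified UTF-8 preceded by a 2-byte length, and compares that with standard UTF-8 for the two cases where they diverge: the NUL character and a character outside the Basic Multilingual Plane.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class ModifiedUtf8Demo {

    /** Modified UTF-8 bytes of s, without the 2-byte length prefix written by writeUTF. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    /** Space-separated upper-case hex dump of the given bytes. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = String.valueOf((char) 0);
        String supplementary = new String(Character.toChars(0x1F600));

        // NUL: one zero byte in standard UTF-8, two bytes in modified UTF-8.
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));           // 00
        System.out.println(hex(modifiedUtf8(nul)));                              // C0 80

        // U+1F600: four bytes in standard UTF-8, two 3-byte surrogates in modified UTF-8.
        System.out.println(hex(supplementary.getBytes(StandardCharsets.UTF_8))); // F0 9F 98 80
        System.out.println(hex(modifiedUtf8(supplementary)));                    // ED A0 BD ED B8 80
    }
}
```

For all other characters in the range U+0001 to U+FFFF the two encodings produce the same bytes, which is why a standard UTF-8 decoder appears to work on most class files.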

I've changed the JavaClassReader to decode the bytes using a new method, CharacterData class>>decodeFromJavaUTF8:. However, the decoding method has no support for multi-byte characters. This should be improved.
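As a rough idea of what multi-byte support involves, here is a minimal decoder sketch in Java. It is not the code of decodeFromJavaUTF8: and the class name is invented for this example. It assumes the three byte patterns that modified UTF-8 uses (1, 2 and 3 bytes per 16-bit unit) and rejects anything else.

```java
// Hypothetical decoder sketch, for illustration only.
final class JavaUtf8Decoder {

    /** Decodes modified UTF-8 bytes (without any length prefix) into a String. */
    static String decode(byte[] bytes) {
        StringBuilder out = new StringBuilder(bytes.length);
        int i = 0;
        while (i < bytes.length) {
            int b0 = bytes[i] & 0xFF;
            if ((b0 & 0x80) == 0) {
                // 0xxxxxxx: one byte
                out.append((char) b0);
                i += 1;
            } else if ((b0 & 0xE0) == 0xC0) {
                // 110xxxxx 10xxxxxx: two bytes (also used for NUL as C0 80)
                int b1 = continuation(bytes, i + 1);
                out.append((char) (((b0 & 0x1F) << 6) | b1));
                i += 2;
            } else if ((b0 & 0xF0) == 0xE0) {
                // 1110xxxx 10xxxxxx 10xxxxxx: three bytes
                int b1 = continuation(bytes, i + 1);
                int b2 = continuation(bytes, i + 2);
                out.append((char) (((b0 & 0x0F) << 12) | (b1 << 6) | b2));
                i += 3;
            } else {
                throw new IllegalArgumentException("Invalid lead byte at index " + i);
            }
        }
        return out.toString();
    }

    /** Returns the 6 payload bits of the continuation byte at index, or fails. */
    private static int continuation(byte[] bytes, int index) {
        if (index >= bytes.length || (bytes[index] & 0xC0) != 0x80) {
            throw new IllegalArgumentException("Missing continuation byte at index " + index);
        }
        return bytes[index] & 0x3F;
    }

    public static void main(String[] args) {
        byte[] input = {0x41, (byte) 0xC5, (byte) 0x99}; // 'A' followed by U+0159
        System.out.println(decode(input).length());      // 2
    }
}
```

Characters above U+FFFF arrive as two 3-byte sequences and come out as a surrogate pair, so a decoder that wants true 32-bit code points has to combine the pair in an extra step.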

Note: You may need to convert the string variable to an instance of Unicode16String (or Unicode32String) as soon as the first character with a codePoint > 255 (or > 65535, respectively) is encountered. You may use something like:

    string := Unicode16String fromString: string

to convert the string

(just a wild guess, you should try it first:-)
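The width decision itself can be sketched independently of Smalltalk/X. The Java snippet below is an analogy only: it does not use Unicode16String or Unicode32String, and the class and method names are hypothetical. It picks the narrowest character width from the largest code point, using the 255 and 65535 thresholds mentioned above.

```java
// Hypothetical helper, for illustration only.
final class StringWidth {

    /** Smallest bits-per-character (8, 16 or 32) that can hold every code point of s. */
    static int bitsNeeded(String s) {
        int max = s.codePoints().max().orElse(0);
        if (max <= 0xFF) {
            return 8;
        }
        if (max <= 0xFFFF) {
            return 16;
        }
        return 32;
    }

    public static void main(String[] args) {
        System.out.println(bitsNeeded("abc"));                                  // 8
        System.out.println(bitsNeeded("A" + (char) 0x159));                     // 16
        System.out.println(bitsNeeded(new String(Character.toChars(0x1F600)))); // 32
    }
}
```

A decoder that widens lazily would make the same comparison per character while appending, instead of scanning the finished string.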

Change History (2)

comment:1 Changed 10 years ago by Jan Vraný

Description: modified (diff)
Owner: changed from kursjan to hlopkmar
Status: changed from new to assigned

comment:2 Changed 10 years ago by hlopkmar

Resolution: fixed
Status: changed from assigned to closed
CharacterData class>>decodeFromJavaUTF8 now correctly decodes multibyte characters.
