Problem:
The program does not detect the "windows-1253" encoding. Any text encoded in either "ISO-8859-7" or "windows-1253" is reported as "ISO-8859-7", which makes any reference to "windows-1253" useless.
The only real differences between "ISO-8859-7" and "windows-1253" lie in the following places of the Character Mapping Table:
0xA2, 0xB5, 0xB6
In the Character Mapping Table for "ISO-8859-7", '\u0386' (GREEK CAPITAL LETTER ALPHA WITH TONOS) lies in place 0xB6, while in the Character Mapping Table for "windows-1253" the same letter lies in place 0xA2.
In the Character Mapping Table for "ISO-8859-7", place 0xA2 holds the value "90", which indicates that '\u2019' (RIGHT SINGLE QUOTATION MARK), the character at that place in "ISO-8859-7", is not treated as punctuation.
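The byte positions named above can be inspected with Python's built-in codecs. This is only an illustrative sketch: it relies on the standard library's "iso-8859-7" and "windows-1253" codecs, not on the detector's own tables, and the helper names `decode_byte` and `compare_positions` are made up for this example.

```python
# Compare what the three byte positions decode to in each encoding,
# using Python's built-in codecs (not the detector's internal tables).
POSITIONS = (0xA2, 0xB5, 0xB6)


def decode_byte(value, encoding):
    """Decode a single byte value with the given codec."""
    return bytes([value]).decode(encoding)


def compare_positions(positions=POSITIONS):
    """Map each byte position to (ISO-8859-7 char, windows-1253 char)."""
    return {
        pos: (decode_byte(pos, "iso-8859-7"), decode_byte(pos, "windows-1253"))
        for pos in positions
    }


if __name__ == "__main__":
    for pos, (iso_char, win_char) in compare_positions().items():
        print(
            f"0x{pos:02X}: ISO-8859-7 -> U+{ord(iso_char):04X}, "
            f"windows-1253 -> U+{ord(win_char):04X}"
        )
```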
How to repeat:
Save a UTF-8 text written in Greek that contains '\u0386' (GREEK CAPITAL LETTER ALPHA WITH TONOS) at least once to two different files, one encoded as "ISO-8859-7" and the other as "windows-1253" (three texts are included as attachments).
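For anyone without the attachments, the two test files can be produced with a short script. This is a minimal sketch: the sample sentence, the file names and the `write_samples` helper are assumptions made for illustration, and the detection step itself is left to whatever command or API you normally use.

```python
import tempfile
from pathlib import Path

# Hypothetical sample text; any Greek text containing U+0386 will do.
SAMPLE_TEXT = "\u0386νθρωπος και \u0386λογο"


def write_samples(directory, text=SAMPLE_TEXT):
    """Write the same text once per encoding and return the file paths."""
    directory = Path(directory)
    paths = {}
    for encoding in ("iso-8859-7", "windows-1253"):
        path = directory / f"sample_{encoding}.txt"
        path.write_bytes(text.encode(encoding))
        paths[encoding] = path
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for encoding, path in write_samples(tmp).items():
            # Run the detector on each file here and compare its answer
            # with the encoding the file was actually written in.
            print(encoding, path.read_bytes()[:4].hex())
```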
Possible solutions:
The Character Mapping Table entry for "ISO-8859-7" at place 0xA2 should be changed from 90 to 253.
When a good «positive_ratio» is found for "ISO-8859-7", the code should also check the "windows-1253" encoding.
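The second suggestion could look roughly like the following. This is a hypothetical heuristic, not the program's actual logic: once "ISO-8859-7" has been picked, it counts how often 0xA2 and 0xB6 are immediately followed by a lowercase Greek letter (bytes 0xE1 to 0xF9 in both encodings) and prefers the encoding in which that byte would be 'Ά'. The name `refine_greek_guess` is an assumption made for this sketch.

```python
def refine_greek_guess(data):
    """Choose between ISO-8859-7 and windows-1253 for Greek-looking bytes.

    Hypothetical tie-breaker: U+0386 is 0xB6 in ISO-8859-7 and 0xA2 in
    windows-1253, and a capital letter is usually followed by a lowercase one.
    """
    a2_hits = 0  # 0xA2 directly before a lowercase Greek letter
    b6_hits = 0  # 0xB6 directly before a lowercase Greek letter
    for current, following in zip(data, data[1:]):
        if 0xE1 <= following <= 0xF9:  # alpha .. omega in both encodings
            if current == 0xA2:
                a2_hits += 1
            elif current == 0xB6:
                b6_hits += 1
    # Default to ISO-8859-7 when the evidence is absent or tied.
    return "windows-1253" if a2_hits > b6_hits else "ISO-8859-7"


if __name__ == "__main__":
    word = "\u0386νθρωπος"
    print(refine_greek_guess(word.encode("windows-1253")))
    print(refine_greek_guess(word.encode("iso-8859-7")))
```

Text that never uses 'Ά' gives no evidence either way, so this check can only complement the existing model and cannot replace it.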