windows-1253 are not detected #197

Open
Xoristzatziki opened this issue Jun 29, 2020 · 0 comments

Problem:
The program never detects the "windows-1253" encoding. Any text encoded with either "ISO-8859-7" or "windows-1253" is reported as "ISO-8859-7", which makes the "windows-1253" support useless.

The only real differences between "ISO-8859-7" and "windows-1253" lie at these positions of the Character Mapping Table:
0xA2, 0xB5, 0xB6
In the Character Mapping Table for "ISO-8859-7", '\u0386' (GREEK CAPITAL LETTER ALPHA WITH TONOS) is at position 0xB6, while in the table for "windows-1253" the same letter is at position 0xA2.
In the Character Mapping Table for "ISO-8859-7", position 0xA2 holds the value 90, which means that '\u2019' (RIGHT SINGLE QUOTATION MARK), the character at that position in "ISO-8859-7", is not treated as punctuation.
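The byte positions above can be checked with Python's standard codecs. This is only a small sketch; it uses the built-in `iso8859_7` and `cp1253` codecs and does not touch the detector's own tables.

```python
# U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS
ALPHA_TONOS = "\u0386"

# The same letter is stored as a different byte in each encoding.
iso_bytes = ALPHA_TONOS.encode("iso8859_7")
win_bytes = ALPHA_TONOS.encode("cp1253")
print("ISO-8859-7  :", iso_bytes.hex())
print("windows-1253:", win_bytes.hex())

# Decoding each of the three positions with both codecs shows what
# the "other" encoding has at that byte.
for position in (0xA2, 0xB5, 0xB6):
    raw = bytes([position])
    iso_char = raw.decode("iso8859_7")
    win_char = raw.decode("cp1253")
    print(f"0x{position:02X}: ISO-8859-7 U+{ord(iso_char):04X}, "
          f"windows-1253 U+{ord(win_char):04X}")
```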

How to repeat:
Save a UTF-8 text written in Greek, containing '\u0386' (GREEK CAPITAL LETTER ALPHA WITH TONOS) at least once, to two different files: one encoded as "ISO-8859-7" and the other as "windows-1253" (three texts are included as attachments). The program reports "ISO-8859-7" for both files.
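A minimal sketch of these steps, for anyone without the attachments. The sample sentence and file names are made up for illustration. The detection call assumes the program exposes a `detect(bytes)` function the way the `chardet` package does, and it is skipped if that package is not installed. A text this short may give a low-confidence result, so a longer Greek text is better for a real test.

```python
import tempfile
from pathlib import Path

# Illustrative sample text (an assumption, not one of the attachments).
# It starts with U+0386 so the two encodings produce different bytes.
text = "\u0386λλος κόσμος"

out_dir = Path(tempfile.mkdtemp())
samples = {}
for codec in ("iso8859_7", "cp1253"):
    path = out_dir / f"sample_{codec}.txt"
    path.write_bytes(text.encode(codec))
    samples[codec] = path.read_bytes()
    print(codec, samples[codec].hex(" "))

# Optional: run the detector on both byte strings if chardet is available.
try:
    import chardet
except ImportError:
    chardet = None

if chardet is not None:
    for codec, raw in samples.items():
        print(codec, "->", chardet.detect(raw))
```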

Possible solutions:

  1. In the Character Mapping Table for "ISO-8859-7", the value at position 0xA2 should be changed from 90 to 253.
  2. When a good «positive_ratio» is found for "ISO-8859-7", the code should also check the "windows-1253" encoding.
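The second solution could look roughly like the sketch below. `refine_greek_guess` is a hypothetical helper, not part of the program, and the rule it applies is an assumed heuristic: a byte 0xA2 directly before a lowercase Greek letter is more likely Ά in "windows-1253", while 0xB6 in that position is more likely Ά in "ISO-8859-7".

```python
# Bytes 0xDC..0xFE are lowercase Greek letters in both encodings.
GREEK_LOWER = range(0xDC, 0xFF)


def refine_greek_guess(raw: bytes, guess: str) -> str:
    """Hypothetical post-check applied after the detector has answered."""
    if guess.upper() != "ISO-8859-7":
        return guess

    a2_hits = 0  # 0xA2 followed by a lowercase Greek letter
    b6_hits = 0  # 0xB6 followed by a lowercase Greek letter
    for current, following in zip(raw, raw[1:]):
        if following in GREEK_LOWER:
            if current == 0xA2:
                a2_hits += 1
            elif current == 0xB6:
                b6_hits += 1

    # Switch only when the evidence favours windows-1253.
    return "windows-1253" if a2_hits > b6_hits else guess


word = "\u0386λλος"
print(refine_greek_guess(word.encode("cp1253"), "ISO-8859-7"))
print(refine_greek_guess(word.encode("iso8859_7"), "ISO-8859-7"))
```

Texts that never use Ά give no evidence either way, so the original guess is kept in that case.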