Detection of windows-1250 #87

Open
7artur opened this issue Apr 14, 2016 · 6 comments

Comments

@7artur

7artur commented Apr 14, 2016

Hi!
Is it possible to detect windows-1250? The current implementation returns "windows-1252" for text encoded in windows-1250. The same question applies to "ISO-8859-2" vs. "ISO-8859-1".
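
To illustrate why the two encodings are easy to confuse, here is a minimal sketch using only Python's built-in codecs (`cp1250` and `cp1252` are Python's names for windows-1250 and windows-1252). The sample sentence is my own choice and not taken from any real file. The same bytes decode without error under both code pages, so a detector has to rely on statistics rather than on invalid byte sequences.

```python
# Illustrative sample: lowercase Polish text (chosen for this sketch only).
text = "zażółć gęślą jaźń"

# Encode as windows-1250, the way a Central European Windows editor might.
raw = text.encode("cp1250")

# Decoding with the right code page round-trips the text.
as_1250 = raw.decode("cp1250")

# Decoding the very same bytes as windows-1252 also succeeds,
# but yields Western European characters instead (mojibake).
as_1252 = raw.decode("cp1252")

print(as_1250)
print(as_1252)
print(as_1250 == as_1252)
```

Both decodes succeed, and only the first one gives back the original text, which matches the misdetection described above.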

@bartoszgrabski

bartoszgrabski commented Jan 12, 2017

@dan-blanchard I see "Our ISO-8859-2 and windows-1250 (Hungarian) probers have been temporarily disabled until we can retrain the models." message in the wiki.

I'd like to offer my help with re-enabling these probers, especially since windows-1250 is not only Hungarian but Central and Eastern European in general, covering Polish, Czech, Slovak, Croatian, and so on. Windows is still very popular in this region, and so are text files encoded in windows-1250.
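
Until the probers are available, one possible workaround when the language is known in advance is to decode with each candidate code page and count the letters expected for that language. The sketch below is a hypothetical heuristic, not how chardet's probers work: `guess_encoding`, `POLISH_LETTERS`, and the sample text are names and data invented for this example.

```python
# Hypothetical helper: letters we expect to see in Polish text.
POLISH_LETTERS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def guess_encoding(data, candidates=("cp1250", "cp1252"), expected=POLISH_LETTERS):
    """Return the candidate whose decoding contains the most expected letters.

    Candidates that cannot decode the bytes at all are skipped.
    Returns None if no candidate decodes the data.
    """
    best, best_score = None, -1
    for name in candidates:
        try:
            decoded = data.decode(name)
        except UnicodeDecodeError:
            continue
        # Score = number of characters that belong to the expected alphabet.
        score = sum(1 for ch in decoded if ch in expected)
        if score > best_score:
            best, best_score = name, score
    return best


sample = "zażółć gęślą jaźń".encode("cp1250")
guessed = guess_encoding(sample)
print(guessed)
```

The same idea should carry over to ISO-8859-2 vs. ISO-8859-1 by passing `candidates=("iso8859_2", "iso8859_1")`, though a letter count this simple will be unreliable on short or mostly-ASCII input.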

@7artur
Author

7artur commented Jan 12, 2017 via email

@dan-blanchard
Member

I haven't quite had the time to finish this up, but I actually have a local branch where I'm working on retraining all our models so that this issue goes away.

@7artur
Author

7artur commented Jan 12, 2017 via email

@dan-blanchard
Member

#99 will re-enable these with newly trained models.

@7artur
Author

7artur commented Apr 12, 2017 via email


3 participants