One-sentence UTF-8 text detected as Windows-1252 #185
You can try https://github.com/Ousret/charset_normalizer :)
I am currently working with CSV files and have personally tried two detection libraries. Both failed to detect that the contents are windows...: cp1257. In some StackOverflow answers I found:
which outputs:
So I wrote a basic extractor myself, with a fallback to a library solution:
I'm fairly new here, so sorry about the indentation; the code block ignores my spaces...
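The commenter's own extractor did not survive in this thread, so the following is not their code. It is a minimal Python sketch of the same idea, assuming a hand-picked candidate list (UTF-8 first, then cp1257) and an optional library detector as the fallback. The function name `detect_encoding` and its parameters are hypothetical.

```python
from typing import Callable, Iterable, Optional


def detect_encoding(
    data: bytes,
    candidates: Iterable[str] = ("utf-8", "cp1257"),  # assumed candidate list
    fallback: Optional[Callable[[bytes], Optional[str]]] = None,
) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without errors.

    UTF-8 is tried first because its multi-byte sequences are strictly
    structured: non-ASCII text in a legacy single-byte encoding rarely
    happens to form valid UTF-8.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)  # strict mode: raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return encoding
    # Nothing in the candidate list fit; defer to a library detector if given.
    return fallback(data) if fallback is not None else None


print(detect_encoding("Žalgiris".encode("utf-8")))   # utf-8
print(detect_encoding("Žalgiris".encode("cp1257")))  # cp1257
```

To plug in chardet as the fallback, pass something like `fallback=lambda b: chardet.detect(b)["encoding"]`. Trying UTF-8 first is the key design choice: a strict UTF-8 decode rarely succeeds on single-byte legacy text that contains non-ASCII characters, whereas single-byte code pages accept almost any byte sequence, so putting them first would mask UTF-8 input.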
I also haven't been able to successfully use
This is better maintained and gives more reliable detection. It avoids chardet mistakenly reporting UTF-8 content as Windows-1252; see chardet/chardet#185.
Thanks for creating this utility; it fills a real need. But it fails even on the simplest examples, so I can't see how I'd trust it in production.
Am I doing something wrong? It's just a couple of characters, and it's clearly UTF-8.
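A short illustration of why this can happen, using an assumed sample word rather than the reporter's text: the UTF-8 bytes of accented Latin letters are also valid Windows-1252, so with only a few characters a statistical detector has very little evidence to separate the two. A strict UTF-8 decode is a cheap sanity check before trusting such a guess. `has_multibyte_utf8` is a hypothetical helper, not part of chardet.

```python
# "café" encoded as UTF-8: "é" becomes the two bytes 0xC3 0xA9.
raw = "café".encode("utf-8")
print(raw)                   # b'caf\xc3\xa9'

# The same bytes are also valid Windows-1252, where 0xC3 is "Ã" and 0xA9 is "©".
print(raw.decode("cp1252"))  # cafÃ©
print(raw.decode("utf-8"))   # café


def has_multibyte_utf8(data: bytes) -> bool:
    """True if `data` is valid UTF-8 and contains at least one non-ASCII byte.

    Pure ASCII is excluded because it is valid in almost every encoding and
    therefore says nothing about which one was intended.
    """
    try:
        data.decode("utf-8")  # strict mode: raises on malformed sequences
    except UnicodeDecodeError:
        return False
    return any(byte >= 0x80 for byte in data)


print(has_multibyte_utf8(raw))                      # True
print(has_multibyte_utf8("café".encode("cp1252")))  # False: lone 0xE9 is not valid UTF-8
```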