
CP949 detected, but when decode: illegal multibyte sequence #170

Open
robert-d-schultz opened this issue Dec 19, 2018 · 0 comments

robert-d-schultz commented Dec 19, 2018

Using chardet 4.0.0, I passed the following ISO-8859-1 byte sequence to detect_all:

OTE up to \xa350K first year!. to emergency situations \xb7 perform all activities with children, i.e. jump, dance, walk, run, etc. for extended periods of time \xb7 must possess acceptable hearing... . oh. to emergency situations \xb7 perform all activities with children, i.e. jump, dance, walk, run, etc. for extended periods of time \xb7 both indoor and outdoor... . ok. for the public including lectures, concerts, recitals, dramatic productions, dance performances, films, and art exhibits. laurens county's renowned quality of... . sc.

The result was:

[{'encoding': 'CP949', 'confidence': 0.99}, {'encoding': 'ISO-8859-1', 'confidence': 0.73}, {'encoding': 'ISO-8859-9', 'confidence': 0.5047156139708759}]

When I tried to decode the same bytes as CP949, I got a UnicodeDecodeError:

UnicodeDecodeError: 'cp949' codec can't decode byte 0xa3 in position 10: illegal multibyte sequence

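The bytes are not valid CP949 (0xa3 is a lead byte there, and the "5" that follows it is not a valid trail byte), so CP949 shouldn't have been predicted at all, let alone with 0.99 confidence.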
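
As a possible workaround until detection is fixed, the candidate list can be filtered by attempting a strict decode with each reported encoding. This is only a minimal sketch using the standard library: `decodable_candidates` is a hypothetical helper name (not part of chardet), the sample is a shortened prefix of the bytes above, and the `reported` list is copied from the result quoted in this issue rather than produced by calling chardet here.

```python
def decodable_candidates(data, candidates):
    """Keep only the candidates whose encoding can strictly decode `data`.

    `candidates` is assumed to have the same shape as the detect_all
    result quoted above: a list of dicts with an 'encoding' key.
    """
    kept = []
    for cand in candidates:
        encoding = cand.get("encoding")
        if not encoding:
            # Skip entries with no encoding guess.
            continue
        try:
            data.decode(encoding)  # strict error handling is the default
        except (UnicodeDecodeError, LookupError):
            # Invalid byte sequence for this codec, or codec unknown to Python.
            continue
        kept.append(cand)
    return kept


# Shortened prefix of the byte sequence from this issue.
sample = b"OTE up to \xa350K first year!. to emergency situations \xb7 perform all activities"

# Candidate list as reported above (hard-coded, not recomputed).
reported = [
    {"encoding": "CP949", "confidence": 0.99},
    {"encoding": "ISO-8859-1", "confidence": 0.73},
    {"encoding": "ISO-8859-9", "confidence": 0.5047156139708759},
]

usable = decodable_candidates(sample, reported)
print([c["encoding"] for c in usable])
```

With the candidates above, CP949 is dropped because the strict decode fails on byte 0xa3, which leaves the two ISO-8859 guesses. A successful decode only shows that the bytes are structurally valid in an encoding, not that the encoding is the right one, so this is a filter on top of the confidence ranking and not a replacement for it.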
