Wrong detection UTF-8 with ö symbol #288

sergei-sss · 2024-03-29T11:01:47Z

Hi! I'm not an expert in encodings. Could someone please tell me what I'm doing wrong? The example seems quite simple, but the detected encoding is incorrect. Perhaps the library is meant to be used on longer texts?

import chardet

b = b'Sch\xc3\xb6ne gesunde Pflanzen'
chardet.detect(b)  # {'encoding': 'ISO-8859-9', 'confidence': 0.6294978352301421, 'language': 'Turkish'}
b.decode(chardet.detect(b)['encoding'])  # Result: 'SchÃ¶ne gesunde Pflanzen'
b.decode("utf-8")  # Result: 'Schöne gesunde Pflanzen'

Versions: chardet 5.2.0 / Python 3.11.7
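A statistical detector has very little evidence to work with in a single short sentence that contains only one non-ASCII character, so a wrong guess like this is not surprising. A common workaround, sketched below under the assumption that most of your input is UTF-8, is to try a strict UTF-8 decode first and only ask chardet when that fails. The helper name `decode_bytes` and the Latin-1 fallback for when chardet returns no guess are illustrative choices, not part of chardet's API. The last lines reproduce the mojibake from the report: the two UTF-8 bytes `C3 B6` for "ö" are read as two separate ISO-8859-9 characters.

```python
def decode_bytes(data: bytes) -> tuple[str, str]:
    """Return (text, encoding_used) for a byte string.

    Hypothetical helper: tries strict UTF-8 first and only consults
    chardet if that fails.
    """
    try:
        # Valid UTF-8 multi-byte sequences rarely occur by accident in
        # legacy 8-bit text, so a clean strict decode is strong evidence.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    import chardet  # imported lazily; only needed for non-UTF-8 input

    guess = chardet.detect(data)["encoding"]
    # Assumption: fall back to Latin-1 when chardet returns None,
    # because every byte sequence decodes under Latin-1.
    encoding = guess or "latin-1"
    return data.decode(encoding, errors="replace"), encoding


b = b"Sch\xc3\xb6ne gesunde Pflanzen"
text, enc = decode_bytes(b)
print(enc, text)  # utf-8 Schöne gesunde Pflanzen

# The mojibake from the report: C3 -> 'Ã' and B6 -> '¶' in ISO-8859-9.
print(b.decode("iso-8859-9"))  # SchÃ¶ne gesunde Pflanzen
```

The trade-off is that pure ASCII input is always reported as UTF-8, which is harmless because ASCII is a subset of UTF-8.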


sergei-sss changed the title ~~Wrong detection UTF-8~~ Wrong detection UTF-8 with ö symbol Mar 29, 2024
