Wrong charset detect! #219

NB-Dragon · 2021-04-13T03:15:32Z

Reproduce

>>> import chardet
>>> chardet.detect_all("中".encode("gb2312"))
[{'encoding': 'ISO-8859-5', 'confidence': 0.99, 'language': 'Russian'}, {'encoding': 'ISO-8859-1', 'confidence': 0.365, 'language': ''}]

Environment

Python 3.7.2
chardet 4.0.0
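
As a rough illustration of why such a short sample is hard to classify, the sketch below decodes the same two bytes with a few standard-library codecs. The candidate list is an assumption chosen for this example; it is not chardet's internal prober list, and the sketch does not call chardet at all.

```python
# The two bytes from the report: "中" encoded as GB2312.
raw = "中".encode("gb2312")

# Candidate codecs picked for illustration only (an assumption,
# not the set of encodings chardet actually probes).
candidates = ["gb2312", "iso8859_5", "latin_1"]

decodings = {}
for name in candidates:
    try:
        decodings[name] = raw.decode(name)
    except UnicodeDecodeError:
        # Record codecs under which the bytes are not valid.
        decodings[name] = None

for name, text in decodings.items():
    print(f"{name}: {text!r}")
```

All three decodes succeed, so byte validity alone cannot single out GB2312 here. A statistical detector has only two bytes to work with, which may be why the result is unreliable for this input; a longer sample would give it more to go on.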

NB-Dragon · 2021-04-13T03:20:23Z

Google Chrome recognizes the gb2312 encoding correctly; I don't know why.

$filename = mb_convert_encoding("中", "gb2312");
header("content-disposition:attachment; filename=$filename");

NB-Dragon · 2021-05-20T02:36:24Z

I have now fixed this in AdvancedDownloader.
Feel free to take a look and star the project, thanks.

samamorgan · 2021-11-05T01:14:24Z

I'm seeing a similar but opposite issue. Chardet 3.0.4 correctly detected a file as "Windows-1252" (a superset of ISO 8859-1), but 4.0.0 detects gb2312, which is clearly incorrect given the file contents I'm reading. Apologies, I can't share the file here because it contains confidential data.
