It seems GB18030 detection is not so reliable even without the BOM. 你好 encoded as GB18030 is detected as TIS-620, which of course will not decode it correctly -- instead TIS-620 decodes it as ฤใบร.
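The byte-level reason for that output can be checked with the standard library alone. The following is a minimal sketch, not part of the original report; it uses only Python's built-in `gb18030` and `tis-620` codecs and does not call any detector.

```python
# Encode the sample text as GB18030 and look at the raw bytes.
text = "你好"
raw = text.encode("gb18030")
print(raw.hex())  # c4e3bac3

# Every one of these bytes is also a valid TIS-620 (Thai) character,
# so decoding with the wrong codec succeeds silently instead of failing.
as_thai = raw.decode("tis-620")
print(as_thai)  # ฤใบร

# Decoding with the correct codec round-trips to the original text.
print(raw.decode("gb18030"))  # 你好
```

Because the mis-decoding raises no error, a wrong guess from the detector cannot be caught by a decode failure alone.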
While it isn't common for text to start with a GB18030 BOM (\uFEFF), when one is present detection either fails or returns the wrong encoding.
https://en.wikipedia.org/wiki/Byte_order_mark#Byte_order_marks_by_encoding
With the BOM, the result is {'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}, whereas without the BOM the result is {'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}.
See also http://www.0x08.org/posts/UTF8-BOM
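One possible workaround, sketched here as an assumption rather than as the library's behaviour, is to check for the GB18030 BOM before running detection. The helper name `strip_gb18030_bom` is hypothetical; the BOM bytes are derived from Python's own `gb18030` codec rather than hard-coded.

```python
from typing import Tuple

# U+FEFF encoded as GB18030 is the four-byte sequence 84 31 95 33.
GB18030_BOM = "\ufeff".encode("gb18030")


def strip_gb18030_bom(data: bytes) -> Tuple[bytes, bool]:
    """Hypothetical helper: return (data without a leading GB18030 BOM, BOM found?)."""
    if data.startswith(GB18030_BOM):
        return data[len(GB18030_BOM):], True
    return data, False


# Build a sample that starts with the BOM, as in the report above.
sample = GB18030_BOM + "你好，世界".encode("gb18030")
body, had_bom = strip_gb18030_bom(sample)

# A leading BOM is a strong hint that the data is GB18030; otherwise the
# remaining bytes could be handed to an encoding detector as usual.
print(had_bom)
print(body.decode("gb18030"))
```

The BOM is treated as a hint and not as proof, so a caller may still want to confirm that the remaining bytes decode cleanly as GB18030.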