chardetect cli: UnicodeEncodeError when filename is not utf8 #278

Open
milahu opened this issue Mar 31, 2023 · 0 comments

milahu commented Mar 31, 2023

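The chardetect CLI fails when a filename given on the command line is not valid UTF-8. The same bytes are handled fine when they appear in the input itself: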

echo some$'\277'file.txt | chardetect
# <stdin>: ISO-8859-1 with confidence 0.73

echo some$'\277'content > some$'\277'file.txt

chardetect some$'\277'file.txt
# UnicodeEncodeError: 'utf-8' codec can't encode character '\udcbf' in position 4: surrogates not allowed
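The traceback points to a likely cause, which is an assumption not yet confirmed against chardet's source: Python decodes the invalid byte `\277` in `sys.argv` as the lone surrogate `\udcbf` (the `surrogateescape` handler), and printing that name to a stdout with strict UTF-8 error handling raises exactly this error. The sketch below reproduces the effect without chardet. The `print_name` helper is hypothetical and only mimics a CLI that echoes its filename argument; `PYTHONUTF8=1` and `PYTHONIOENCODING` pin argv decoding and the stdout error handler so the result does not depend on the current locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of chardet: mimics a CLI that prints the
# filename it was given.
#   $1 = stdout error handler (strict or surrogateescape)
#   $2 = filename
# PYTHONUTF8=1 decodes argv as UTF-8 with surrogateescape, so the byte \277
# reaches Python as the lone surrogate U+DCBF.
print_name() {
  PYTHONUTF8=1 PYTHONIOENCODING="utf-8:$1" \
    python3 -c 'import sys; print(sys.argv[1])' "$2"
}

name=some$'\277'file.txt

# strict: fails with the same UnicodeEncodeError that chardetect reports
print_name strict "$name" || echo "exit status: $?"

# surrogateescape: the surrogate is encoded back to the original byte \277
print_name surrogateescape "$name" | od -c
```

With the strict handler the call exits with status 1; with `surrogateescape`, `od -c` should show the raw byte `277` between `some` and `file.txt`. If this is indeed the cause, running `PYTHONIOENCODING=utf-8:surrogateescape chardetect some$'\277'file.txt` may avoid the crash without going through stdin, while a fix inside the CLI could escape the name before printing it, as ls does.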

Expected: the same behavior as the ls command, which prints the name with the invalid byte escaped:

ls some$'\277'file.txt
# 'some'$'\277''file.txt'

Workaround: pipe the file's contents to stdin instead of passing the filename:

echo some$'\277'content > some$'\277'file.txt

cat some$'\277'file.txt | chardetect
# <stdin>: ISO-8859-1 with confidence 0.73