
blank_as? throws "invalid byte sequence in UTF-8" for a string with UTF-16LE encoding #27

Open
PikachuEXE opened this issue Dec 3, 2019 · 5 comments

Comments

@PikachuEXE

```ruby
require "fast_blank"
require "rchardet"

path = "/Users/pikachuexe/Downloads/test_data_feed.txt"

File.read(path).blank_as?
# => ArgumentError: invalid byte sequence in UTF-8

::CharDet.detect(File.read(path))
# => {
#      "encoding"   => "UTF-16LE",
#      "confidence" => 1.0
#    }
```

Attachment: test_data_feed.txt
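Because the attachment itself isn't reproduced here, the failure can be sketched with an in-memory string. This is a minimal sketch that assumes the file starts with a UTF-16LE BOM (`FF FE`) followed by plain ASCII-range text; the sample contents `id,name` are hypothetical. It mimics `File.read` by tagging the raw bytes as UTF-8 without transcoding them, and uses an ActiveSupport-style blank regex in place of `blank_as?` so that no gem is needed.

```ruby
# Hypothetical stand-in for test_data_feed.txt (the real contents are unknown):
# U+FEFF encoded as UTF-16LE is the BOM FF FE, followed by UTF-16LE text.
utf16le_bytes = "\uFEFFid,name\n".encode(Encoding::UTF_16LE).b

# File.read with no encoding options tags the bytes with
# Encoding.default_external (assumed UTF-8 here) without transcoding them.
as_read = utf16le_bytes.dup.force_encoding(Encoding::UTF_8)

# 0xFF, the first BOM byte, is never valid in UTF-8.
puts as_read.valid_encoding?

# An ActiveSupport-style blank check, used here instead of blank_as?.
begin
  as_read.match?(/\A[[:space:]]*\z/)
rescue ArgumentError => e
  puts "#{e.class}: #{e.message}"
end
```

Running it should print `false` and then `ArgumentError: invalid byte sequence in UTF-8`, the same error reported for `blank_as?`.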

@SamSaffron
Owner

Can a standard Ruby regex handle this? Do methods like `length` work?

@PikachuEXE
Author

Methods like #empty?, #length and #size work as expected, but regex matching raises the same kind of error.
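The split between working and failing methods can be illustrated with a small sketch, again assuming a hypothetical string made of UTF-16LE bytes (with BOM) that Ruby believes is UTF-8. Size checks only walk the byte buffer, while regex matching validates the string before it starts.

```ruby
# Hypothetical broken string: UTF-16LE bytes (with BOM) labelled as UTF-8.
s = "\uFEFFab".encode(Encoding::UTF_16LE).b.force_encoding(Encoding::UTF_8)

# These only inspect the byte buffer; none of them raises on invalid bytes.
p s.empty?   # => false
p s.bytesize # => 6
p s.length   # => 6 (each invalid byte counts as one character)

# Regex matching checks the string's validity first and raises.
begin
  s =~ /\s/
rescue ArgumentError => e
  p e.message # => "invalid byte sequence in UTF-8"
end
```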

@PikachuEXE
Author

However, if I remove the BOM, both blank_as? and regex matching work.
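Dropping the BOM presumably works because the file's text is in the ASCII range: the remaining UTF-16LE bytes are then ASCII characters interleaved with NUL bytes, which is valid UTF-8 but not the intended string. A minimal sketch, assuming the same hypothetical `id,name` payload, contrasts that workaround with labelling the bytes as UTF-16LE and transcoding them to UTF-8.

```ruby
# Hypothetical payload: UTF-16LE BOM (FF FE) + "id,name\n" in UTF-16LE.
bytes = "\uFEFFid,name\n".encode(Encoding::UTF_16LE).b
mislabeled = bytes.dup.force_encoding(Encoding::UTF_8)

# Workaround from the comment: drop the two BOM bytes.
no_bom = mislabeled.byteslice(2..-1)
p no_bom.valid_encoding?            # => true: only ASCII bytes and NULs remain
p no_bom.match?(/\A[[:space:]]*\z/) # => false, and no error is raised
p no_bom.include?("\0")             # => true: still not the intended text

# Alternative: label the bytes as UTF-16LE, transcode, then strip the BOM.
text = bytes.dup.force_encoding(Encoding::UTF_16LE)
            .encode(Encoding::UTF_8)
            .delete_prefix("\uFEFF")
p text # => "id,name\n"
```

For the real file, `File.binread(path)` could stand in for the in-memory `bytes`; this has not been tested against the attachment. Text containing non-ASCII characters would likely still be invalid UTF-8 after merely stripping the BOM, which is why transcoding is the sturdier option.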

@SamSaffron
Owner

SamSaffron commented Dec 4, 2019 via email

@PikachuEXE
Author

Here is the bug report:
https://bugs.ruby-lang.org/issues/16402
