Symbol encoding handling #1970

Open
lopopolo opened this issue Jul 24, 2022 · 0 comments
Labels

  • A-ruby-core (Area: Ruby Core types.)
  • A-spec (Area: ruby/spec infrastructure and completeness.)
  • C-bug (Category: This is a bug.)
  • E-hard (Call for participation: Experience needed to fix: Hard / a lot.)

Comments

@lopopolo
Member

String#intern and Symbol#to_s have some nuances around encoding that are not properly modeled by Artichoke:

  • String#intern raises an EncodingError if the string is not valid for its encoding, though possibly only for multibyte encodings (the invalid US-ASCII string below interns without raising):

    [3.1.2] > a = "abc\xFF"
    => "abc\xFF"
    [3.1.2] > a.encoding
    => #<Encoding:UTF-8>
    [3.1.2] > a.valid_encoding?
    => false
    [3.1.2] > a.intern
    (irb):17:in `intern': invalid symbol in encoding UTF-8 :"abc\xFF" (EncodingError)
            from (irb):17:in `<main>'
            from /usr/local/var/rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
            from /usr/local/var/rbenv/versions/3.1.2/bin/irb:25:in `load'
            from /usr/local/var/rbenv/versions/3.1.2/bin/irb:25:in `<main>'
    [3.1.2] > b = "abc\xFF".force_encoding(Encoding::ASCII)
    => "abc\xFF"
    [3.1.2] > b.encoding
    => #<Encoding:US-ASCII>
    [3.1.2] > b.valid_encoding?
    => false
    [3.1.2] > b.intern
    => :"abc\xFF"
    
  • Symbol#to_s is influenced by the encoding of the string that was interned:

    [3.1.2] > b = "abc\xFF".force_encoding(Encoding::ASCII)
    => "abc\xFF"
    [3.1.2] > b.encoding
    => #<Encoding:US-ASCII>
    [3.1.2] > b.valid_encoding?
    => false
    [3.1.2] > t = b.intern
    => :"abc\xFF"
    [3.1.2] > t.to_s.encoding
    => #<Encoding:US-ASCII>
    [3.1.2] > t.to_s.valid_encoding?
    => false
    [3.1.2] > c = "abc 😄"
    => "abc 😄"
    [3.1.2] > u = c.intern
    => :"abc 😄"
    [3.1.2] > u.to_s.encoding
    => #<Encoding:UTF-8>
    [3.1.2] > u.to_s.valid_encoding?
    => true
    [3.1.2] > d = "cvb\xFF".b
    => "cvb\xFF"
    [3.1.2] > d.encoding
    => #<Encoding:ASCII-8BIT>
    [3.1.2] > d.valid_encoding?
    => true
    [3.1.2] > v = d.intern
    => :"cvb\xFF"
    [3.1.2] > v.to_s.encoding
    => #<Encoding:ASCII-8BIT>
    [3.1.2] > e = "bnm\xFF".force_encoding(Encoding::ASCII)
    => "bnm\xFF"
    [3.1.2] > e.encoding
    => #<Encoding:US-ASCII>
    [3.1.2] > e.valid_encoding?
    => false
    [3.1.2] > w = e.intern
    => :"bnm\xFF"
    [3.1.2] > w.to_s.encoding
    => #<Encoding:US-ASCII>
    [3.1.2] > w.to_s.valid_encoding?
    => false
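
The two transcripts above can be condensed into a script that is easier to turn into a spec. This is only a sketch: `intern_round_trip` and `CASES` are hypothetical names made up for illustration and exist in neither MRI nor Artichoke, and the behavior in the comments is what the 3.1.2 sessions above show, not a statement about other Ruby versions.

```ruby
# Hypothetical helper (not part of MRI or Artichoke): intern a string and
# report what Symbol#to_s hands back, or which error String#intern raised.
def intern_round_trip(str)
  round_tripped = str.intern.to_s
  { encoding: round_tripped.encoding, valid: round_tripped.valid_encoding? }
rescue EncodingError => e
  { error: e.class }
end

# Encodings are forced explicitly so the cases do not depend on the source
# encoding of the script. `dup` avoids mutating a string literal.
CASES = {
  "invalid UTF-8"    => "abc\xFF".dup.force_encoding(Encoding::UTF_8),
  "invalid US-ASCII" => "abc\xFF".dup.force_encoding(Encoding::US_ASCII),
  "binary"           => "cvb\xFF".b,
  "valid UTF-8"      => "abc \u{1F604}"
}

CASES.each do |label, str|
  # In the sessions above, only the invalid UTF-8 string raises; the other
  # three intern fine and Symbol#to_s keeps the encoding of the source string.
  puts "#{label}: #{intern_round_trip(str).inspect}"
end
```

Running the same script under Artichoke and comparing the printed lines against MRI should show where the two diverge.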
    