invalid byte sequence error when escaping utf-8 characters in a regexp character range #10902

Closed
runephilosof-karnovgroup opened this issue Aug 10, 2022 · 4 comments · Fixed by #10906

runephilosof-karnovgroup commented Aug 10, 2022

Given file bar.rb containing

SANITIZER_REGEXP = /[\§]/

Running rubocop -d bar.rb will fail with an invalid byte sequence in UTF-8 error.

It is related to escaping the utf-8 character inside a character range, because

SANITIZER_REGEXP = /\§/

and

SANITIZER_REGEXP = /[§]/

both work fine.
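
For context, the message in the report is the one Ruby itself raises when a regexp is matched against a string whose bytes are not valid in its encoding. The following is a minimal sketch under an assumption this report does not confirm: that the cop ends up holding only part of the escaped character. "§" is two bytes in UTF-8, and `truncated` below is a hand-built stand-in, not captured parser output. `match_error_message` is a hypothetical helper used only for this illustration.

```ruby
# "§" is encoded as two bytes in UTF-8.
section_sign = '§'
p section_sign.bytes.map { |b| format('0x%02X', b) }

# Hypothetical stand-in (assumption): a backslash followed by only the
# first byte of "§". The result is tagged UTF-8 but is not valid UTF-8.
truncated = [0x5C, 0xC2].pack('C*').force_encoding(Encoding::UTF_8)
p truncated.valid_encoding?

# Returns the ArgumentError message if matching raises, otherwise nil.
def match_error_message(pattern, text)
  pattern.match?(text)
  nil
rescue ArgumentError => e
  e.message
end

# Matching against the broken string raises; the intact string does not.
p match_error_message(/[[:alnum:]]/, truncated)
p match_error_message(/[[:alnum:]]/, '\\§')
```

If that assumption holds, it would explain why only the escaped form inside the brackets fails while the other two variants pass.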


Expected behavior

If it is an error on my part, RuboCop should tell me what the error is, like any other lint cop.
If not, it should not generate an error.

Actual behavior


Inspecting 1 file
Scanning /app/bar.rb
An error occurred while Style/RedundantRegexpEscape cop was inspecting /app/bar.rb:1:19.
invalid byte sequence in UTF-8
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/style/redundant_regexp_escape.rb:64:in `match?'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/style/redundant_regexp_escape.rb:64:in `allowed_escape?'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/style/redundant_regexp_escape.rb:47:in `block in on_regexp'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/style/redundant_regexp_escape.rb:83:in `block in each_escape'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/methods/traverse.rb:23:in `block in traverse'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/subexpression.rb:30:in `each'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/subexpression.rb:30:in `each'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/methods/traverse.rb:21:in `each_with_index'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/methods/traverse.rb:21:in `traverse'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/methods/traverse.rb:26:in `block in traverse'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/subexpression.rb:30:in `each'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/subexpression.rb:30:in `each'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/methods/traverse.rb:21:in `each_with_index'
/usr/local/bundle/gems/regexp_parser-2.5.0/lib/regexp_parser/expression/methods/traverse.rb:21:in `traverse'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/style/redundant_regexp_escape.rb:82:in `each'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/style/redundant_regexp_escape.rb:82:in `reduce'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/style/redundant_regexp_escape.rb:82:in `each_escape'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/style/redundant_regexp_escape.rb:46:in `on_regexp'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:100:in `public_send'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:100:in `block (2 levels) in trigger_responding_cops'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:160:in `with_cop_error_handling'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:99:in `block in trigger_responding_cops'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:98:in `each'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:98:in `trigger_responding_cops'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:69:in `on_regexp'
/usr/local/bundle/gems/rubocop-ast-1.21.0/lib/rubocop/ast/traversal.rb:151:in `on_casgn'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:71:in `on_casgn'
/usr/local/bundle/gems/rubocop-ast-1.21.0/lib/rubocop/ast/traversal.rb:20:in `walk'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/commissioner.rb:86:in `investigate'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/team.rb:155:in `investigate_partial'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cop/team.rb:83:in `investigate'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:307:in `inspect_file'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:251:in `block in do_inspection_loop'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:285:in `block in iterate_until_no_changes'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:278:in `loop'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:278:in `iterate_until_no_changes'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:247:in `do_inspection_loop'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:130:in `block in file_offenses'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:155:in `file_offense_cache'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:129:in `file_offenses'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:120:in `process_file'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:101:in `block in each_inspected_file'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:100:in `each'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:100:in `reduce'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:100:in `each_inspected_file'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:86:in `inspect_files'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/runner.rb:47:in `run'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli/command/execute_runner.rb:26:in `block in execute_runner'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli/command/execute_runner.rb:52:in `with_redirect'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli/command/execute_runner.rb:25:in `execute_runner'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli/command/execute_runner.rb:17:in `run'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli/command.rb:11:in `run'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli/environment.rb:18:in `run'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli.rb:71:in `run_command'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli.rb:78:in `execute_runners'
/usr/local/bundle/gems/rubocop-1.30.1/lib/rubocop/cli.rb:47:in `run'
/usr/local/bundle/gems/rubocop-1.30.1/exe/rubocop:12:in `block in <top (required)>'
/usr/local/lib/ruby/3.1.0/benchmark.rb:311:in `realtime'
/usr/local/bundle/gems/rubocop-1.30.1/exe/rubocop:12:in `<top (required)>'
bin/rubocop:29:in `load'
bin/rubocop:29:in `<main>'
C

Offenses:

bar.rb:1:1: C: [Correctable] Style/FrozenStringLiteralComment: Missing frozen string literal comment.
SANITIZER_REGEXP = /[\§]/
^

1 file inspected, 1 offense detected, 1 offense autocorrectable

1 error occurred:
An error occurred while Style/RedundantRegexpEscape cop was inspecting /app/bar.rb:1:19.
Errors are usually caused by RuboCop bugs.
Please, report your problems to RuboCop's issue tracker.
https://github.com/rubocop/rubocop/issues

Mention the following information in the issue report:
1.30.1 (using Parser 3.1.2.1, rubocop-ast 1.21.0, running on ruby 3.1.2 x86_64-linux)

Steps to reproduce the problem

Given file bar.rb containing

SANITIZER_REGEXP = /[\§]/

Running rubocop -d bar.rb will fail with an invalid byte sequence in UTF-8 error.

RuboCop version


1.30.1 (using Parser 3.1.2.1, rubocop-ast 1.21.0, running on ruby 3.1.2 x86_64-linux)
runephilosof-karnovgroup changed the title from "invalid byte sequence error when having utf-8 characters in regexp using /u" to "invalid byte sequence error when having utf-8 characters in regexp" on Aug 10, 2022
@runephilosof-karnovgroup
Author

It also fails with

SANITIZER_REGEXP = /[\§]/u

@runephilosof-karnovgroup
Author

It does not fail with

SANITIZER_REGEXP1 = /\§/
SANITIZER_REGEXP2 = /\§/u

@runephilosof-karnovgroup
Author

It does not fail with

SANITIZER_REGEXP1 = /[§]/
SANITIZER_REGEXP2 = /[§]/u

So the problem is escaping the utf-8 character inside a character range.
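
One way a check like this could be made robust is to test `String#valid_encoding?` before matching. The sketch below is only an illustration: `escape_allowed?` and `ALNUM` are hypothetical names, and this is not necessarily how the cop is written or how a fix would handle it.

```ruby
ALNUM = /[[:alnum:]]/

# Hypothetical helper (assumption): text that is not valid in its encoding
# is treated as "leave it alone" instead of letting Regexp#match? raise
# ArgumentError.
def escape_allowed?(char)
  return true unless char.valid_encoding?

  ALNUM.match?(char)
end

# A lone lead byte of "§", tagged as UTF-8: invalid on its own.
broken = [0xC2].pack('C*').force_encoding(Encoding::UTF_8)

p escape_allowed?(broken) # guard short-circuits, no exception
p escape_allowed?('a')
p escape_allowed?('§')
```

The guard trades precision for safety: an escape the check cannot read is simply never reported as redundant.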

@runephilosof-karnovgroup
Author

It also occurs in the newest rubocop, 1.34.1.
If I use rubocop 1.30.1 and downgrade rubocop-ast to 1.19.1, the error does not occur, but only if the regexp has the /u option.
I guess that is because rubocop does not recognize it as a regular expression, since it does not yet support /u at that version.

runephilosof-karnovgroup changed the title from "invalid byte sequence error when having utf-8 characters in regexp" to "invalid byte sequence error when escaping utf-8 characters in a regexp character range" on Aug 10, 2022
@koic koic added the bug label Aug 11, 2022
ydah added a commit to ydah/rubocop that referenced this issue Jun 12, 2023
…ring with invalid byte sequence in UTF-8

Fixed: rubocop#10902
bbatsov pushed a commit that referenced this issue Jun 12, 2023