Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle RSpec description with japanese char in CP932 encoded files #2575

Merged
merged 3 commits into from Nov 27, 2018

Conversation

benoittgt
Copy link
Member

When the user create a test with CP932 encoding and japanese chars RSpec
fail to properly display characters or crash with the error:

Encoding::CompatibilityError: incompatible character encodings: Windows-31J and UTF-8

Fix:

@@ -244,6 +244,10 @@ def formatted_message_and_backtrace(colorizer)
end
end

def encoded_description(description)
encoded_string(description) if description
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer return if description.nil? to make it clear what we're doing with this here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Much better. Thanks Jon

@@ -94,6 +94,21 @@ module RSpec::Core
EOS
end

it 'allows the caller to add encoded description' do
the_presenter = Formatters::ExceptionPresenter.new(exception, example,
:description => "ジ".encode("CP932"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec isn't going to work on 1.8.7, check what we do for encoded string in rspec-support and mimic how thats tested? (this is just a suggestion of where to look!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

I found this if String.method_defined?(:encoding) https://github.com/rspec/rspec-support/blob/master/spec/rspec/support/differ_spec.rb#L81

For the moment I'm not able to install ruby-1.8.7-p374 on my machine. It fails with rvm so I will use the CI. Sorry for the noise.

@benoittgt benoittgt force-pushed the deal_with_encoding_error branch 2 times, most recently from c7e2625 to d0b6ee4 Compare October 22, 2018 07:49
@benoittgt
Copy link
Member Author

I don't understand why it breaks that much on AppVeyor. I squashed the commit.

def encoded_description(description)
return if description.nil?

if String.method_defined?(:encoding)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can if case this entire method rather than performing it at runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's needed in lib/rspec/core/formatters/exception_presenter.rb:84:

        def fully_formatted_lines(failure_number, colorizer)
          lines = [
            encoded_description(description),

No?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what I mean is this:

if  String.method_defined?(:encoding)
  def encoded_description(description)
    return if description.nil?
    encoded_string(description)
  end
else
  def encoded_description(description)
    description
  end
end

Theres no need to attempt to encode the string if we can't

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks


expect(the_presenter.fully_formatted(1)).to eq(<<-EOS.gsub(/^ +\|/, ''))
|
| 1) ジ
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not this character on windows, which is why appveyor is failing, it comes back as ? instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hitanaka156 are you on windows? It'd be good to know what this is supposed to do, I'd be half happy pending this test on windows with a note explaining. (Correct behaviour on posix only is better than no correct behaviour).

Copy link
Member Author

@benoittgt benoittgt Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be half happy pending this test on windows with a note explaining

I'm ok with this. Thanks for the explanation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error is

  1) RSpec::Core::Formatters::ExceptionPresenter#fully_formatted allows the caller to add encoded description
     Failure/Error:
       expect(the_presenter.fully_formatted(1)).to eq(<<-EOS.gsub(/^ +\|/, ''))
       |
       |  1) ジ
       |     Failure/Error: # The failure happened here!#{ encoding_check }
       |
       |       Boom
       |       Bam
       |     # ./spec/rspec/core/formatters/exception_presenter_spec.rb:#{line_num}
       EOS
       expected: "\n  1) \u30B8\n     Failure/Error: # The failure happened here!\n\n       Boom\n       Bam\n     # ./spec/rspec/core/formatters/exception_presenter_spec.rb:21\n"
            got: "\n  1) ?\n     Failure/Error: # The failure happened here!\n\n       Boom\n       Bam\n     # ./spec/rspec/core/formatters/exception_presenter_spec.rb:21\n"
       (compared using ==)
       Diff:
       @@ -1,5 +1,5 @@
        
       -  1) ?
       +  1) ?
             Failure/Error: # The failure happened here!

turns to ジ

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just surprised its the replacement character on windows

When the user create a test with CP932 encoding and japanese chars RSpec
fail to properly display characters or crash with the error:

>  Encoding::CompatibilityError: incompatible character encodings: Windows-31J and UTF-8

Fix:
- rspec#2570
- rspec#2543

For Ruby 1.8.7
We are following the same behavior as for rspec-support
https://github.com/rspec/rspec-support/blob/9940a8656071655b807f772f36101b4d27f1b67d/lib/rspec/support/spec/string_matcher.rb#L8
@benoittgt
Copy link
Member Author

benoittgt commented Nov 16, 2018

First I rebase my branch with rspec/rspec-core master.

I started a VM on my machine with Windows 10, and run rspec ./spec/rspec/core/formatters/exception_presenter_spec.rb I had no errors.

On the appveyor the error is still present (and we have other errors related to symlink also now). Maybe this is related to locale?

On the VM
screenshot 2018-11-16 at 21 33 26

Maybe related: https://help.appveyor.com/discussions/problems/6245-unicode-characters-show-as-in-build-output

I continue to look at it

@benoittgt
Copy link
Member Author

Ok, it looks good with chcp 65001 in appveyor config file. Same behavior as local VM.

Except for this point #2575 (comment) that I am not sure I have properly handled, the PR looks good to me. Waiting for the final review @JonRowe. 🙏🏼

@@ -29,7 +29,7 @@ before_test:
- bundle --version

test_script:
- bundle exec rspec --backtrace
- chcp 65001 && bundle exec rspec --backtrace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? Forgive my ignorance..

Copy link
Member Author

@benoittgt benoittgt Nov 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learned this yesterday. I changed the commit message to add more info.

chp is for "change code page" (https://ss64.com/nt/chcp.html) and 65001 is the code for UTF-8. By doing this command you enforce UTF-8 locale for the next command.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to do that? Seems that we want to test we work in non UTF-8 environments...

Copy link
Member Author

@benoittgt benoittgt Nov 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for me, if you don't set the correct local it will fail for every representation of strings with Japanese characters. I will run the command with Japanese locale https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers to validate the correct behavior and also try to see what is the default local for appveyor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here is that we want to work with the console mode being japanese, not force it to utf-8

Copy link
Member Author

@benoittgt benoittgt Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @JonRowe. I am thinking every day about this open PR and other things I have to do on RSpec repos. 😥


I ran a build with chcp to display the default encoding of app veyor machine.

chcp
Active code page: 437

437 is for "United States". - source

So I think by doing this we are going in the right direction. 437 encoding is probably not meant to display Unicode char, we have to change it.
Related StackOverFlow answer https://stackoverflow.com/a/3781095/2747638
or https://superuser.com/questions/625866/cmd-how-to-use-non-english-characters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can drop the last commit when you are ok with the existing work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I honestly don't know any better so lets drop that commit and merge this, get a release out and see what affect this makes for our friends with problems

def encoded_description(description)
return if description.nil?

if String.method_defined?(:encoding)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what I mean is this:

if  String.method_defined?(:encoding)
  def encoded_description(description)
    return if description.nil?
    encoded_string(description)
  end
else
  def encoded_description(description)
    description
  end
end

Theres no need to attempt to encode the string if we can't

chcp 65001 mean Code Page Number and 65001 is UTF-8.

So by doing this change we modify the code page to UTF-8 for the
commande that follows.

See documentation: https://ss64.com/nt/chcp.html
See related issue: appveyor/ci#2107
@@ -29,7 +29,7 @@ before_test:
- bundle --version

test_script:
- bundle exec rspec --backtrace
- chcp 65001 && bundle exec rspec --backtrace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I honestly don't know any better so lets drop that commit and merge this, get a release out and see what affect this makes for our friends with problems

@benoittgt benoittgt merged commit 6c5628f into rspec:master Nov 27, 2018
@benoittgt benoittgt deleted the deal_with_encoding_error branch November 27, 2018 13:12
benoittgt added a commit that referenced this pull request Nov 27, 2018
JonRowe pushed a commit that referenced this pull request Mar 4, 2019
MatheusRich pushed a commit to MatheusRich/rspec-core that referenced this pull request Oct 30, 2020
Handle RSpec description with japanese char in CP932 encoded files
MatheusRich pushed a commit to MatheusRich/rspec-core that referenced this pull request Oct 30, 2020
yujinakayama pushed a commit to yujinakayama/rspec-monorepo that referenced this pull request Oct 6, 2021
yujinakayama pushed a commit to yujinakayama/rspec-monorepo that referenced this pull request Oct 6, 2021
…ith_encoding_error

Handle RSpec description with japanese char in CP932 encoded files

---
This commit was imported from rspec/rspec-core@6c5628f.
yujinakayama pushed a commit to yujinakayama/rspec-monorepo that referenced this pull request Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants