Issue reproduction for ISO-8859-1 parsing on JRuby #2080
Code Climate has analyzed commit 386ebcb and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 100.0% (80% is the threshold). This pull request will bring the total coverage in the repository to 94.3%.
✅ Build nokogiri 1.0.672 completed (commit 43396896ab by @thbar)
The build is catching the failure on JRuby indeed:
Thank you for submitting this! Fantastic executable bug report. I appreciate it.
@flavorjones you're welcome :-) I may be able to get back with more information and a git bisect tomorrow, if all goes well.
OK - I ran a git bisect and ba16682 introduced this regression. Tagging @jvshahid for his awareness 💕
nokogiri $ git bisect good
ba16682aae2cbd42c196bd8afdfcfe8a5d82fbdb is the first bad commit
commit ba16682aae2cbd42c196bd8afdfcfe8a5d82fbdb
Author: John Shahid <jvshahid@gmail.com>
Date: Thu Apr 18 13:58:18 2019 -0400
Split setInputSource into setIOInputSource and setStringInputSource
This commit also consolidates some of the encoding handling logic that was
repeated in multiple places.
ext/java/nokogiri/HtmlDocument.java | 25 +++---
ext/java/nokogiri/HtmlSaxParserContext.java | 100 ++++++-----------------
ext/java/nokogiri/XmlDocument.java | 33 +++-----
ext/java/nokogiri/XmlSaxParserContext.java | 14 ++--
ext/java/nokogiri/internals/NokogiriHelpers.java | 12 +++
ext/java/nokogiri/internals/ParserContext.java | 71 ++++++----------
6 files changed, 94 insertions(+), 161 deletions(-)
One point of concern is that the document appears to be truncated, without any error. I wonder why this is happening (instead of a hard error in fail-fast mode!). If I find anything more, I will report back, but so far I'm only just discovering the Java part.
This change fixes the tests in #2080, but introduces more errors. The errors are mostly unexpected null encoding when parsing an HTML document.
@jvshahid working on it! FWIW, just adding this to my app doesn't work at this point:
gem 'nokogiri', git: 'https://github.com/sparklemotion/nokogiri.git', branch: 'repro-iso-jruby'
I presume some form of compilation must be done locally first, I will try that out. This would mean (side point) that the advice at https://bundler.io/guides/git.html is out of date! I'll report back.
@jvshahid my app test suite passes ✅. Congrats on your fix! For the record, here is roughly what I had to do:
Then use: gem 'nokogiri', path: '../nokogiri' and
#2083 is green, I'll merge this and that as soon as I get a chance. It will be in the next v1.11.0 release candidate, hopefully within the next few days.
As commented in this discussion, I've experienced regressions while parsing ISO-8859-1 documents under JRuby with the recent release candidates (1.11.0.rc1, 1.11.0.rc2, 1.11.0.rc3), something which did not happen with 1.10.10.
I'm creating this first PR as an attempt to show the reproduction to Nokogiri maintainers (hopefully the build will fail, but we'll see!).
I will try to come back with a git bisect test later to isolate the commit which introduced this change.
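A reproduction like this depends on an input that really is ISO-8859-1 encoded. The sketch below is not the test added in this PR. It uses only Ruby's standard library, and the helper name `build_latin1_fixture` and the sample text are assumptions made for illustration. It builds a small HTML fixture and checks that its bytes are Latin-1 and would be invalid if treated as UTF-8.

```ruby
# Hypothetical helper (not from the PR): builds a small HTML document
# whose bytes are ISO-8859-1 encoded.
def build_latin1_fixture(text)
  html = "<html><head><meta charset=\"ISO-8859-1\"></head>" \
         "<body><p>#{text}</p></body></html>"
  # Transcode from the UTF-8 source literal to ISO-8859-1.
  html.encode(Encoding::ISO_8859_1)
end

fixture = build_latin1_fixture("café")

# In ISO-8859-1, "é" is the single byte 0xE9.
puts fixture.encoding
puts fixture.bytes.include?(0xE9)

# The same bytes relabelled as UTF-8 are not a valid string, so a parser
# that ignores the declared encoding cannot read them correctly.
mislabeled = fixture.dup.force_encoding(Encoding::UTF_8)
puts mislabeled.valid_encoding?

# Transcoding back to UTF-8 recovers the original text.
puts fixture.encode(Encoding::UTF_8).include?("café")
```

A fixture built this way could then be handed to the parser with the encoding stated explicitly, for example `Nokogiri::HTML(fixture, nil, 'ISO-8859-1')`, and the parsed text compared against the expected content to detect silent truncation.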