Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jruby: use a modern c14n library #2547

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

flavorjones
Copy link
Member

What problem is this PR intended to solve?

See #2410 for a description of an encoding problem in the current JRuby implementation's C14n code. This PR migrates to use a modern, supported version of santuario:xmlsec c14n which addresses some encoding problems.

  • Good: This addresses some encoding problems, allows us to delete code, and gets us on a supported version of the library.
  • Meh: Some new dependencies being pulled in which we'll need to keep up-to-date, and SLF4J warnings are now appearing
  • Bad: This breaks the current Document#canonicalize API which allows an optional block to control what nodes are serialized.

The new dependencies are:

  • require_jar 'commons-codec', 'commons-codec', '1.15'
  • require_jar 'com.fasterxml.woodstox', 'woodstox-core', '6.2.6'
  • require_jar 'org.apache.santuario', 'xmlsec', '2.3.1'
  • require_jar 'org.slf4j', 'slf4j-api', '1.7.36'
  • require_jar 'org.codehaus.woodstox', 'stax2-api', '4.2.1'
  • require_jar 'jakarta.xml.bind', 'jakarta.xml.bind-api', '2.3.3'
  • require_jar 'jakarta.activation', 'jakarta.activation-api', '1.2.2'

The SLF4J warnings emitted are:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

What to do with this branch? I think I want to remove the block parameter from the JRuby API, because:

  1. it didn't previously work the same as CRuby/libxml2 (see commit on this branch that introduces block test coverage that fails)
  2. with the mainline c14n library we can't hack it to support even the broken callback semantics and would be forced to implement something complicated ourselves

Given I'm not confident that C14n is widely-used anyway, removing a half-broken feature like this on JRuby doesn't seem like a controversial decision.

So, questions to the JRuby contributors:

  • Q1: Would it be a reasonable decision to drop the #canonicalize block param on JRuby?
  • Q2: Are we OK with all these new dependencies getting pulled in?
  • Q3: There's some SLF4J warning messages being emitted at runtime that I'm not sure whether or how to quash. Any ideas?

cc @jvshahid @headius @kares @mkristian

Have you included adequate test coverage?

Yarp.

Does this change affect the behavior of either the C or the Java implementations?

The Java implementation is becoming more correct, but a potential breaking change is proposed.

@mkristian
Copy link
Contributor

ad Q2: my take on this is that is totally fine to include mentioned jars the way your wrote it
ad Q3: the warning means that there is no concrete slf4j logger configured which is OK for a library IMO

@flavorjones
Copy link
Member Author

I chatted with @tenderlove yesterday about Q1 and we both think it would be acceptable to remove the block param on JRuby given that the current implementation is not functioning quite right, and use of the c14n API appears to be rare.

@kares
Copy link
Contributor

kares commented May 13, 2022

same as Kristian, looks fine - for a Java library the SLF4J warnings are usually fine.
for a Ruby gem not sure - might confuse users.
adding a noop impl such as the slf4j-nop.jar would avoid the warnings.

@kares
Copy link
Contributor

kares commented May 13, 2022

and as usual great work Mike! 👍 thanks for caring about JRuby 💖

@ghost

This comment was marked as off-topic.

@flavorjones flavorjones added this to the v1.14.0 milestone Jun 5, 2022
@flavorjones flavorjones force-pushed the 2410-jruby-use-a-modern-c14n-library branch 3 times, most recently from 9528978 to 6049c5b Compare June 8, 2022 01:50
expected = if Nokogiri.jruby?
%{<foo>𡏅</foo>}
else
%{<foo>陝</foo>}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I really have no idea why this is different. Maybe @AlexSun1995 has some insight, since he reported this in #2410

@flavorjones
Copy link
Member Author

Rebase off current main

@flavorjones
Copy link
Member Author

Reading this code with fresh eyes, I'm not happy with the breaking change in the canonicalize API and want to revisit that.

@flavorjones flavorjones modified the milestones: v1.14.0, v1.15.0 Aug 26, 2022
@flavorjones flavorjones force-pushed the 2410-jruby-use-a-modern-c14n-library branch from 7231488 to d104fb7 Compare August 27, 2022 18:09
@flavorjones flavorjones force-pushed the 2410-jruby-use-a-modern-c14n-library branch from d104fb7 to 80b9181 Compare October 16, 2022 18:27
instead of using a patched fork of apache c14n from 2013

This commit removes the block parameter from the JRuby API, because

a) it didn't previously work the same as CRuby/libxml2 (see earlier
commit that introduces block test coverage that failed)

b) with the mainline c14n library we can't hack it to support even the
broken callback semantics and would be forced to implement something
complicated ourselves

Given I'm not confident that C14n is widely-used anyway, removing a
misfeature like this doesn't seem like a controversial decision.

So, questions to the JRuby contributors:

Q1: Are we OK with all these new dependencies getting pulled in?
Q2: There's some Log4J warning messages being emitted at runtime that
    I'm not sure whether or how to quash. Any ideas?
@flavorjones flavorjones force-pushed the 2410-jruby-use-a-modern-c14n-library branch from 80b9181 to 654f008 Compare October 16, 2022 19:21
@flavorjones flavorjones modified the milestones: v1.15.0, v1.16.0 Apr 28, 2023
@maths22
Copy link

maths22 commented May 23, 2023

I'm still working on the jruby port for #2888, but I expect it to pull in santuario:xmlsec for jruby xmlsec purposes (which is theoretically independent of changing the canonicalization otherwise used in nokogiri, but is relevant from a adding dependencies perspective)

@flavorjones
Copy link
Member Author

@maths22 I appreciate you making the attempt to get xmlsec to work in JRuby, but I really do think the best first change we can make regarding xmlsec is simply to import the C code and tests (this is what we did with Nokogumbo and libgumbo, for example) and we can tackle introducing JRuby support in a later PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants