Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better defensive behavior when libxml2 or libxslt will make unsafe modifications to a document #2829

Merged
merged 4 commits into from Mar 10, 2023

Conversation

flavorjones
Copy link
Member

@flavorjones flavorjones commented Mar 9, 2023

What problem is this PR intended to solve?

Closes #2800

#2800 reported behavior in libxslt (upstream https://gitlab.gnome.org/GNOME/libxslt/-/issues/14) wherein using xsl:strip-spaces may remove blank text nodes from the original document. If a Ruby object is wrapped around this node, then that's a problem that may lead to a segfault.

Fortunately, we can tell if strip-spaces is going to be applied. We can also (thanks in part to prior work in #2001 to fix #1985) detect when the document has Ruby objects for blank text nodes. If both of these things are true, we make a "defensive" copy of the original document, and pass that copy into xsltApplyStylesheet. The original then remains unmodified.

This might be surprising behavior, and in fact in #2001 @tenderlove opted for an explicit exception to be raised in these conditions. Here I'm opting to just "do what the user means", and if someone is relying on the original being modified and reports a bug we can talk about it then.

Note that I've also backported this approach to the fix from #2001, so in that circumstance we choose to make a defensive copy of the original and pass it into xmlSchemaNewDocParserCtxt so that the original remains unmodified.

Have you included adequate test coverage?

Yes!

Does this change affect the behavior of either the C or the Java implementations?

This kind of edge-case handling is unnecessary in JRuby because this isn't a problem with the Java libraries.

@flavorjones flavorjones added the topic/memory Segfaults, memory leaks, valgrind testing, etc. label Mar 9, 2023
@flavorjones
Copy link
Member Author

Memcheck caught that I need to free the doc returned from xmlCopyDoc. Will patch.

when there are blank text node objects. previously we raised an
exception.
@flavorjones flavorjones force-pushed the flavorjones-2800-xslt-modifying-doc branch from 53120ec to 7e6045d Compare March 9, 2023 20:49
@flavorjones
Copy link
Member Author

which it does be default when xsl:strip-space is used

this approach makes a defensive copy of the doc if there's a chance
the original may be modified in an unsafe way:

- if any spaces will be stripped
- and there are blank node objects that might be removed

Fixes #2800
@flavorjones flavorjones force-pushed the flavorjones-2800-xslt-modifying-doc branch from 7e6045d to cfbb42f Compare March 10, 2023 03:27
@flavorjones
Copy link
Member Author

flavorjones commented Mar 10, 2023

Another segfault, again with Darwin and Ruby 3.2: https://github.com/sparklemotion/nokogiri/actions/runs/4381034712/jobs/7668806803#step:5:911

@flavorjones
Copy link
Member Author

flavorjones commented Mar 10, 2023

Yeah, the test suite is crashing pretty reliably for me on my Mac

Whoops, that was due to user error. Still trying to reproduce a crash on darwin.

@flavorjones
Copy link
Member Author

The segfaults seem to be caused by a bug that was already fixed by https://bugs.ruby-lang.org/issues/19469 (present in Ruby 3.2.0 and 3.2.1).

@flavorjones flavorjones merged commit 2932489 into main Mar 10, 2023
120 checks passed
@flavorjones flavorjones deleted the flavorjones-2800-xslt-modifying-doc branch March 10, 2023 14:55
@flavorjones flavorjones added this to the v1.15.0 milestone Apr 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XSLT segfault Memory access violation / Crash during GC
1 participant