feat!: XSLT docs are parsed with additional ParseOptions
Closes #1940
flavorjones committed Apr 20, 2021
1 parent 2934ea6 commit a5a7109
Showing 4 changed files with 50 additions and 2 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,18 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA

---

## next / unreleased

### Changed

* Introduce `Nokogiri::XML::ParseOptions::DEFAULT_XSLT` which adds the libxslt-preferred options of `NOENT | DTDLOAD | DTDATTR | NOCDATA` to `ParseOptions::DEFAULT_XML`.


### Fixed

* `Nokogiri.XSLT` parses the stylesheet using `ParseOptions::DEFAULT_XSLT`, which should make some edge-case XSL transformations match libxslt's default behavior. [[#1940](https://github.com/sparklemotion/nokogiri/issues/1940)]


## 1.11.3 / 2021-04-07

### Fixed
2 changes: 2 additions & 0 deletions lib/nokogiri/xml/parse_options.rb
@@ -71,6 +71,8 @@ class ParseOptions

# the default options used for parsing XML documents
DEFAULT_XML = RECOVER | NONET
# the default options used for parsing XSLT stylesheets
DEFAULT_XSLT = RECOVER | NONET | NOENT | DTDLOAD | DTDATTR | NOCDATA
# the default options used for parsing HTML documents
DEFAULT_HTML = RECOVER | NOERROR | NOWARNING | NONET
# the default options used for parsing XML schemas
5 changes: 3 additions & 2 deletions lib/nokogiri/xslt.rb
@@ -27,10 +27,11 @@ def parse(string, modules = {})
     XSLT.register(url, klass)
   end

+  doc = XML::Document.parse(string, nil, nil, XML::ParseOptions::DEFAULT_XSLT)
   if Nokogiri.jruby?
-    Stylesheet.parse_stylesheet_doc(XML.parse(string), string)
+    Stylesheet.parse_stylesheet_doc(doc, string)
   else
-    Stylesheet.parse_stylesheet_doc(XML.parse(string))
+    Stylesheet.parse_stylesheet_doc(doc)
   end
 end

33 changes: 33 additions & 0 deletions test/test_xslt_transforms.rb
@@ -367,5 +367,38 @@ def test_non_html_xslt_transform
      end
      assert_match(/decimal/, exception.message)
    end

    describe "DEFAULT_XSLT parse options" do
      it "is the union of DEFAULT_XML and libxslt's XSLT_PARSE_OPTIONS" do
        xslt_parse_options = Nokogiri::XML::ParseOptions.new.noent.dtdload.dtdattr.nocdata
        expected = Nokogiri::XML::ParseOptions::DEFAULT_XML | xslt_parse_options.options
        assert_equal(expected, Nokogiri::XML::ParseOptions::DEFAULT_XSLT)
      end

      it "parses docs the same as xsltproc" do
        skip_unless_libxml2("JRuby implementation disallows this edge case XSLT")

        # see https://github.com/sparklemotion/nokogiri/issues/1940
        xml = "<t></t>"
        xsl = <<~EOF
          <?xml version="1.0"?>
          <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:output method="text" omit-xml-declaration="no" />
            <xsl:template match="/">
              <xsl:text disable-output-escaping="yes"><![CDATA[<>]]></xsl:text>
            </xsl:template>
          </xsl:stylesheet>
        EOF

        doc = Nokogiri::XML(xml)
        stylesheet = Nokogiri::XSLT(xsl)

        # TODO: ideally I'd like to be able to access the parse options in the final object
        # assert_equal(Nokogiri::XML::ParseOptions::DEFAULT_XSLT, stylesheet.document.parse_options)

        result = stylesheet.transform(doc)
        assert_equal("<>", result.children.to_xml)
      end
    end
  end
end
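
Note on the first test above: `DEFAULT_XSLT` is `DEFAULT_XML` with four extra bits OR-ed in. The sketch below reproduces that arithmetic without loading Nokogiri. The `ParseFlags` module and its `names` helper are hypothetical stand-ins, not part of this commit; the bit positions are assumed to mirror libxml2's `xmlParserOption` enum, which is where `Nokogiri::XML::ParseOptions` takes its values from.

```ruby
# Hypothetical stand-in for Nokogiri::XML::ParseOptions (illustration only).
# Bit positions are assumed to match libxml2's xmlParserOption enum.
module ParseFlags
  FLAGS = {
    RECOVER: 1 << 0,   # recover from parse errors
    NOENT:   1 << 1,   # substitute entities
    DTDLOAD: 1 << 2,   # load the external DTD subset
    DTDATTR: 1 << 3,   # apply default attributes from the DTD
    NONET:   1 << 11,  # forbid network access
    NOCDATA: 1 << 14,  # merge CDATA sections into text nodes
  }.freeze

  # Same composition as the constants added in parse_options.rb.
  DEFAULT_XML  = FLAGS[:RECOVER] | FLAGS[:NONET]
  DEFAULT_XSLT = DEFAULT_XML | FLAGS[:NOENT] | FLAGS[:DTDLOAD] |
                 FLAGS[:DTDATTR] | FLAGS[:NOCDATA]

  # List the flag names whose bit is set in the given mask.
  def self.names(mask)
    FLAGS.select { |_name, bit| (mask & bit) != 0 }.keys
  end
end

# Flags that DEFAULT_XSLT sets on top of DEFAULT_XML.
added = ParseFlags.names(ParseFlags::DEFAULT_XSLT & ~ParseFlags::DEFAULT_XML)
puts "DEFAULT_XML  = #{ParseFlags::DEFAULT_XML}"
puts "DEFAULT_XSLT = #{ParseFlags::DEFAULT_XSLT}"
puts "added flags  = #{added.join(' | ')}"
```

Because `DEFAULT_XSLT` is a superset of `DEFAULT_XML`, `RECOVER` and `NONET` stay set, so stylesheets are still parsed without network access.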
