Merge pull request #2403 from sparklemotion/2376-html5-namespaces-in-css-queries

HTML5 documents should not require namespaces in CSS selector queries
flavorjones committed Jan 4, 2022
2 parents de64268 + d1a710e commit dcccf72
Showing 33 changed files with 1,005 additions and 389 deletions.
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -35,6 +35,7 @@ This release ends support for:
### Improved

* `{XML,HTML4}::DocumentFragment` constructors all now take an optional parse options parameter or block (similar to Document constructors). [[#1692](https://github.com/sparklemotion/nokogiri/issues/1692)] (Thanks, [@JackMc](https://github.com/JackMc)!)
* `Nokogiri::CSS.xpath_for` allows an `XPathVisitor` to be injected, for finer-grained control over how CSS queries are translated into XPath.
* [CRuby] `XML::Reader#encoding` will return the encoding detected by the parser when it's not passed to the constructor. [[#980](https://github.com/sparklemotion/nokogiri/issues/980)]
* [CRuby] Handle abruptly-closed HTML comments as recommended by WHATWG. (Thanks to [tehryanx](https://hackerone.com/tehryanx?type=user) for reporting!)
* [CRuby] `Node#line` is no longer capped at 65535. libxml v2.9.0 and later support a new parse option, exposed as `Nokogiri::XML::ParseOptions::PARSE_BIG_LINES`, which is turned on by default in `ParseOptions::DEFAULT_{XML,XSLT,HTML,SCHEMA}` (Note that JRuby already supported large line numbers.) [[#1764](https://github.com/sparklemotion/nokogiri/issues/1764), [#1493](https://github.com/sparklemotion/nokogiri/issues/1493), [#1617](https://github.com/sparklemotion/nokogiri/issues/1617), [#1505](https://github.com/sparklemotion/nokogiri/issues/1505), [#1003](https://github.com/sparklemotion/nokogiri/issues/1003), [#533](https://github.com/sparklemotion/nokogiri/issues/533)]
@@ -45,7 +46,9 @@ This release ends support for:

### Fixed

* XML::Builder blocks restore context properly when exceptions are raised. [[#2372](https://github.com/sparklemotion/nokogiri/issues/2372)] (Thanks, [@ric2b](https://github.com/ric2b) and [@rinthedev](https://github.com/rinthedev)!)
* CSS queries on HTML5 documents now correctly match foreign elements (SVG, MathML) when namespaces are not specified in the query. [[#2376](https://github.com/sparklemotion/nokogiri/issues/2376)]
* `XML::Builder` blocks restore context properly when exceptions are raised. [[#2372](https://github.com/sparklemotion/nokogiri/issues/2372)] (Thanks, [@ric2b](https://github.com/ric2b) and [@rinthedev](https://github.com/rinthedev)!)
* The `Nokogiri::CSS::Parser` cache now uses the `XPathVisitor` configuration as part of the cache key, preventing incorrect cache results from being returned when multiple `XPathVisitor` options are being used.
* Error recovery from in-context parsing (e.g., `Node#parse`) now always uses the correct `DocumentFragment` class. Previously `Nokogiri::HTML4::DocumentFragment` was always used, even for XML documents. [[#1158](https://github.com/sparklemotion/nokogiri/issues/1158)]
* `DocumentFragment#>` now works properly, matching a CSS selector against only the fragment roots. [[#1857](https://github.com/sparklemotion/nokogiri/issues/1857)]
* `XML::DocumentFragment#errors` now correctly contains any parsing errors encountered. Previously this was always empty. (Note that `HTML::DocumentFragment#errors` already did this.)
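The following plain-Ruby sketch shows why the `Nokogiri::CSS::Parser` cache key needs the `XPathVisitor` configuration. It does not call Nokogiri. `ToyTranslator`, the `:xml`/`:html5` config values, and the XPath strings it renders are all hypothetical stand-ins. If the cache is keyed on the selector alone, a later caller gets whatever the first caller's configuration produced. Adding the configuration to the key keeps the two results apart.

```ruby
# Hypothetical translator; `config` decides how an element name is rendered.
class ToyTranslator
  def initialize(key_includes_config:)
    @key_includes_config = key_includes_config
    @cache = {}
  end

  def translate(selector, config)
    # The buggy variant keys on the selector only; the fixed one adds the config.
    key = @key_includes_config ? [selector, config] : selector
    @cache[key] ||= render(selector, config)
  end

  private

  # Stand-in for the real CSS-to-XPath translation (not Nokogiri's output format).
  def render(selector, config)
    config == :html5 ? "//*[local-name()='#{selector}']" : "//#{selector}"
  end
end

buggy = ToyTranslator.new(key_includes_config: false)
buggy.translate("circle", :xml)             # caches "//circle" under "circle"
stale = buggy.translate("circle", :html5)   # returns the :xml result

fixed = ToyTranslator.new(key_includes_config: true)
fixed.translate("circle", :xml)
fresh = fixed.translate("circle", :html5)   # computed for :html5
```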
@@ -61,6 +64,8 @@ This release ends support for:
### Deprecated

* Passing a `Nokogiri::XML::Node` as the second parameter to `Node.new` is deprecated and will generate a warning. This will become an error in a future version of Nokogiri. [[#975](https://github.com/sparklemotion/nokogiri/issues/975)]
* `Nokogiri::CSS::Parser`, `Nokogiri::CSS::Tokenizer`, and `Nokogiri::CSS::Node` are now internal-only APIs that are no longer documented, and should not be considered stable. With the introduction of `XPathVisitor` injection into `Nokogiri::CSS.xpath_for` there should be no reason to rely on these internal APIs.
* CSS-to-XPath utility classes `Nokogiri::CSS::XPathVisitorAlwaysUseBuiltins` and `XPathVisitorOptimallyUseBuiltins` are deprecated. Prefer `Nokogiri::CSS::XPathVisitor` with appropriate constructor arguments. These classes will be removed in a future version of Nokogiri.


## 1.12.5 / 2021-09-27
4 changes: 2 additions & 2 deletions ext/nokogiri/xml_dtd.c
@@ -57,9 +57,9 @@ entities(VALUE self)

/*
* call-seq:
* notations
* notations() → Hash<name(String)⇒Notation>
*
* Get a hash of the notations for this DTD.
* [Returns] All the notations for this DTD in a Hash of Notation +name+ to Notation.
*/
static VALUE
notations(VALUE self)
22 changes: 22 additions & 0 deletions ext/nokogiri/xml_xpath_context.c
@@ -86,6 +86,26 @@ xpath_builtin_css_class(xmlXPathParserContextPtr ctxt, int nargs)
xmlXPathFreeObject(needle);
}


/* xmlXPathFunction to select nodes whose local name matches, for HTML5 CSS queries that should ignore namespaces */
static void
xpath_builtin_local_name_is(xmlXPathParserContextPtr ctxt, int nargs)
{
xmlXPathObjectPtr element_name;

assert(ctxt->context->node);

CHECK_ARITY(1);
CAST_TO_STRING;
CHECK_TYPE(XPATH_STRING);
element_name = valuePop(ctxt);

valuePush(ctxt, xmlXPathNewBoolean(xmlStrEqual(ctxt->context->node->name, element_name->stringval)));

xmlXPathFreeObject(element_name);
}


/*
* call-seq:
* register_ns(prefix, uri)
@@ -361,6 +381,8 @@ new (VALUE klass, VALUE nodeobj)
xmlXPathRegisterNs(ctx, NOKOGIRI_BUILTIN_PREFIX, NOKOGIRI_BUILTIN_URI);
xmlXPathRegisterFuncNS(ctx, (const xmlChar *)"css-class", NOKOGIRI_BUILTIN_URI,
xpath_builtin_css_class);
xmlXPathRegisterFuncNS(ctx, (const xmlChar *)"local-name-is", NOKOGIRI_BUILTIN_URI,
xpath_builtin_local_name_is);

self = Data_Wrap_Struct(klass, 0, deallocate, ctx);
return self;
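The C changes register `local-name-is` under the builtin namespace URI. A query therefore reaches the function through a prefix bound to that URI. The plain-Ruby model below sketches two things:

- the two-step lookup: prefix to URI, then URI plus function name to implementation;
- a predicate with the same contract as `xpath_builtin_local_name_is`: one string argument, compared against the context node's local name, ignoring its namespace.

`ToyXPathContext`, `ToyElement`, and `BUILTIN_URI` are hypothetical, and the sketch is not libxml2's or Nokogiri's API.

```ruby
# Hypothetical placeholder; the real NOKOGIRI_BUILTIN_URI value is not shown in this diff.
BUILTIN_URI = "urn:example:nokogiri-builtin"

# Stand-in for an element: libxml2 keeps the local name in `name` and the
# namespace separately, which this struct mirrors.
ToyElement = Struct.new(:name, :namespace_uri)

# Toy XPath context: prefix -> URI, and [URI, function name] -> callable.
class ToyXPathContext
  def initialize
    @namespaces = {}
    @functions = {}
  end

  def register_ns(prefix, uri)
    @namespaces[prefix] = uri
  end

  def register_function(uri, name, &impl)
    @functions[[uri, name]] = impl
  end

  # Resolve "prefix:name": map the prefix to a URI, then look up (URI, name).
  def call(qualified_name, node, *args)
    prefix, name = qualified_name.split(":", 2)
    uri = @namespaces.fetch(prefix)
    @functions.fetch([uri, name]).call(node, *args)
  end
end

ctx = ToyXPathContext.new
ctx.register_ns("nokogiri-builtin", BUILTIN_URI)
# Same contract as the C predicate: compare the local name only, ignore the namespace.
ctx.register_function(BUILTIN_URI, "local-name-is") { |node, wanted| node.name == wanted.to_s }

svg_circle   = ToyElement.new("circle", "http://www.w3.org/2000/svg")
plain_circle = ToyElement.new("circle", nil)
html_div     = ToyElement.new("div", "http://www.w3.org/1999/xhtml")
```

Because the comparison never looks at `namespace_uri`, an SVG `circle` and an unnamespaced `circle` both match. That is the behavior the changelog entry describes for foreign elements in HTML5 documents.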
43 changes: 37 additions & 6 deletions lib/nokogiri/css.rb
@@ -1,18 +1,49 @@
# coding: utf-8
# frozen_string_literal: true

module Nokogiri
# Translate a CSS selector into an XPath 1.0 query
module CSS
class << self
###
# Parse this CSS selector in +selector+. Returns an AST.
def parse(selector)
# TODO: Deprecate this method ahead of 2.0 and delete it in 2.0.
# It is not used by Nokogiri and shouldn't be part of the public API.
def parse(selector) # :nodoc:
Parser.new.parse(selector)
end

###
# Get the XPath for +selector+.
# :call-seq:
# xpath_for(selector) → String
# xpath_for(selector [, prefix:] [, visitor:] [, ns:]) → String
#
# Translate a CSS selector to the equivalent XPath query.
#
# [Parameters]
# - +selector+ (String) The CSS selector to be translated into XPath
#
# - +prefix:+ (String)
#
# The XPath prefix for the query, see Nokogiri::XML::XPath for some options. Default is
# +XML::XPath::GLOBAL_SEARCH_PREFIX+.
#
# - +visitor:+ (Nokogiri::CSS::XPathVisitor)
#
# The visitor class to use to transform the AST into XPath. Default is
# +Nokogiri::CSS::XPathVisitor.new+.
#
# - +ns:+ (Hash<String ⇒ String>)
#
# The namespaces that are referenced in the query, if any. This is a hash where the keys are
# the namespace prefix and the values are the namespace URIs. Default is an empty Hash.
#
# [Returns] (String) The equivalent XPath query for +selector+
#
# 💡 Note that translated queries are cached for performance concerns.
#
def xpath_for(selector, options = {})
Parser.new(options[:ns] || {}).xpath_for(selector, options)
prefix = options.fetch(:prefix, Nokogiri::XML::XPath::GLOBAL_SEARCH_PREFIX)
visitor = options.fetch(:visitor) { Nokogiri::CSS::XPathVisitor.new }
ns = options.fetch(:ns, {})
Parser.new(ns).xpath_for(selector, prefix, visitor)
end
end
end
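`xpath_for` now fills in its defaults with `Hash#fetch` instead of `||`. The two differ when a caller passes a key explicitly set to `nil`: `fetch` keeps the `nil`, while `||` falls back to the default. The block form `fetch(:visitor) { ... }` also builds the default visitor only when none was given.

The minimal sketch below shows both behaviors. `resolve_options`, `old_style_prefix`, and `:default_visitor` are hypothetical. `GLOBAL_PREFIX` stands in for `Nokogiri::XML::XPath::GLOBAL_SEARCH_PREFIX` and is assumed here to be `"//"`.

```ruby
# Stand-in for Nokogiri::XML::XPath::GLOBAL_SEARCH_PREFIX (assumed to be "//").
GLOBAL_PREFIX = "//"

# Hypothetical helper mirroring the new defaulting logic in xpath_for.
def resolve_options(options = {})
  prefix = options.fetch(:prefix, GLOBAL_PREFIX)
  # Block form: the default is only built when :visitor is absent.
  visitor = options.fetch(:visitor) { :default_visitor }
  ns = options.fetch(:ns, {})
  [prefix, visitor, ns]
end

# Old style: `||` also replaces an explicit nil (or false) with the default.
def old_style_prefix(options = {})
  options[:prefix] || GLOBAL_PREFIX
end

resolve_options                # => ["//", :default_visitor, {}]
resolve_options(prefix: nil)   # prefix stays nil
old_style_prefix(prefix: nil)  # => "//"
```

Keep this difference in mind for callers that used to pass `prefix: nil` explicitly.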
4 changes: 2 additions & 2 deletions lib/nokogiri/css/node.rb
@@ -2,7 +2,7 @@

module Nokogiri
module CSS
class Node
class Node # :nodoc:
ALLOW_COMBINATOR_ON_SELF = [:DIRECT_ADJACENT_SELECTOR, :FOLLOWING_SELECTOR, :CHILD_SELECTOR]

# Get the type of this node
@@ -23,7 +23,7 @@ def accept(visitor)

###
# Convert this CSS node to xpath with +prefix+ using +visitor+
def to_xpath(prefix = "//", visitor = XPathVisitor.new)
def to_xpath(prefix, visitor)
prefix = "." if ALLOW_COMBINATOR_ON_SELF.include?(type) && value.first.nil?
prefix + visitor.accept(self)
end
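`to_xpath` now requires both `prefix` and `visitor` from its caller. It still overrides the prefix with `"."` for a combinator that has no left-hand operand. The plain-Ruby restatement below makes that rule explicit. `effective_prefix` is hypothetical, and `left_operand` stands in for `value.first`. That value is assumed to be `nil` for a selector that starts with a combinator, such as `> p`.

```ruby
COMBINATORS_ON_SELF = [:DIRECT_ADJACENT_SELECTOR, :FOLLOWING_SELECTOR, :CHILD_SELECTOR]

# Hypothetical helper reproducing the prefix rule in Node#to_xpath:
# a leading combinator is anchored at the context node "." instead of
# the caller-supplied prefix.
def effective_prefix(type, left_operand, prefix)
  if COMBINATORS_ON_SELF.include?(type) && left_operand.nil?
    "."
  else
    prefix
  end
end
```

Anchoring at `"."` lets a query like `> p` be evaluated relative to the node it is called on, which is what a method like `DocumentFragment#>` needs.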
16 changes: 12 additions & 4 deletions lib/nokogiri/css/parser.rb
@@ -1,7 +1,7 @@
# frozen_string_literal: true
#
# DO NOT MODIFY!!!!
# This file is automatically generated by Racc 1.5.2
# This file is automatically generated by Racc 1.6.0
# from Racc grammar file "".
#

@@ -10,6 +10,14 @@

require_relative "parser_extras"

module Nokogiri
module CSS
# :nodoc: all
class Parser < Racc::Parser
end
end
end

module Nokogiri
module CSS
class Parser < Racc::Parser
@@ -247,7 +255,7 @@ def unescape_css_string(str)
"." => 27,
"*" => 28,
"|" => 29,
":" => 30, }
":" => 30 }

racc_nt_base = 31

@@ -485,7 +493,7 @@ def _reduce_27(val, _values, result)
end

def _reduce_28(val, _values, result)
result = Node.new(:ELEMENT_NAME,
result = Node.new(:ATTRIB_NAME,
[[val.first, val.last].compact.join(':')]
)

@@ -495,7 +503,7 @@ def _reduce_28(val, _values, result)
def _reduce_29(val, _values, result)
# Default namespace is not applied to attributes.
# So we don't add prefix "xmlns:" as in namespaced_ident.
result = Node.new(:ELEMENT_NAME, [val.first])
result = Node.new(:ATTRIB_NAME, [val.first])

result
end
12 changes: 10 additions & 2 deletions lib/nokogiri/css/parser.y
@@ -96,14 +96,14 @@ rule
;
attrib_name
: namespace '|' IDENT {
result = Node.new(:ELEMENT_NAME,
result = Node.new(:ATTRIB_NAME,
[[val.first, val.last].compact.join(':')]
)
}
| IDENT {
# Default namespace is not applied to attributes.
# So we don't add prefix "xmlns:" as in namespaced_ident.
result = Node.new(:ELEMENT_NAME, [val.first])
result = Node.new(:ATTRIB_NAME, [val.first])
}
;
function
@@ -255,6 +255,14 @@ end

require_relative "parser_extras"

module Nokogiri
module CSS
# :nodoc: all
class Parser < Racc::Parser
end
end
end

---- inner

def unescape_css_identifier(identifier)
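The grammar change gives attribute names their own `:ATTRIB_NAME` node type instead of reusing `:ELEMENT_NAME`, so a visitor can translate the two differently. One plausible reason this matters for the HTML5 work: namespace-insensitive matching should apply to element names, not to attribute names.

The toy below is not Nokogiri's `XPathVisitor`. `ToyNode`, `attrib_name_node`, `toy_visit`, and the XPath strings it emits are assumptions for illustration. It also reproduces the `[namespace, ident].compact.join(':')` step from the `attrib_name` rule.

```ruby
ToyNode = Struct.new(:type, :value)

# Same joining rule as the attrib_name grammar action: a namespace and an
# identifier become "ns:ident", and a nil namespace is dropped by compact.
def attrib_name_node(namespace, ident)
  ToyNode.new(:ATTRIB_NAME, [[namespace, ident].compact.join(":")])
end

# Dispatch on node type, as a visitor would. In a hypothetical html5 mode,
# only element names are matched by local name.
def toy_visit(node, html5:)
  name = node.value.first
  case node.type
  when :ELEMENT_NAME
    html5 ? "*[local-name()='#{name}']" : name
  when :ATTRIB_NAME
    "@#{name}"
  else
    raise ArgumentError, "unsupported node type: #{node.type}"
  end
end
```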
25 changes: 12 additions & 13 deletions lib/nokogiri/css/parser_extras.rb
@@ -4,7 +4,7 @@

module Nokogiri
module CSS
class Parser < Racc::Parser
class Parser < Racc::Parser # :nodoc:
CACHE_SWITCH_NAME = :nokogiri_css_parser_cache_is_off

@cache = {}
@@ -23,7 +23,7 @@ def set_cache(value) # rubocop:disable Naming/AccessorMethodName

# Get the css selector in +string+ from the cache
def [](string)
return unless cache_on?
return nil unless cache_on?
@mutex.synchronize { @cache[string] }
end

@@ -71,17 +71,10 @@ def next_token
end

# Get the xpath for +string+ using +options+
def xpath_for(string, options = {})
key = "#{string}#{options[:ns]}#{options[:prefix]}"
v = self.class[key]
return v if v

args = [
options[:prefix] || "//",
options[:visitor] || XPathVisitor.new,
]
self.class[key] = parse(string).map do |ast|
ast.to_xpath(*args)
def xpath_for(string, prefix, visitor)
key = cache_key(string, prefix, visitor)
self.class[key] ||= parse(string).map do |ast|
ast.to_xpath(prefix, visitor)
end
end

@@ -90,6 +83,12 @@ def on_error(error_token_id, error_value, value_stack)
after = value_stack.compact.last
raise SyntaxError, "unexpected '#{error_value}' after '#{after}'"
end

def cache_key(query, prefix, visitor)
if self.class.cache_on?
[query, prefix, @namespaces, visitor.config]
end
end
end
end
end
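When caching is off, `cache_key` returns `nil`, and `self.class[key] ||= ...` still behaves correctly. The reader `[]` returns `nil` when the cache is off, so the right-hand side is recomputed on every call.

The plain-Ruby model below sketches that path under two assumptions:

- the writer `[]=` is a no-op while caching is off (its body is not part of this diff);
- `cache_on?` consults a thread-local flag, as `CACHE_SWITCH_NAME` suggests.

`ToyCache` and its methods are hypothetical.

```ruby
# Hypothetical model of the parser cache; not Nokogiri's implementation.
class ToyCache
  SWITCH = :toy_cache_is_off

  def initialize
    @cache = {}
    @mutex = Mutex.new
  end

  # Caching is on unless the current thread has set the switch.
  def cache_on?
    !Thread.current[SWITCH]
  end

  def [](key)
    return nil unless cache_on?
    @mutex.synchronize { @cache[key] }
  end

  # Assumption: writes are dropped while caching is off.
  def []=(key, value)
    return value unless cache_on?
    @mutex.synchronize { @cache[key] = value }
  end

  # Like cache_key: an array of every input that affects the result, or nil when off.
  def key_for(query, prefix, namespaces, config)
    [query, prefix, namespaces, config] if cache_on?
  end

  # Same shape as Parser#xpath_for: `self[key] ||= ...`.
  def fetch(query, prefix, namespaces, config)
    key = key_for(query, prefix, namespaces, config)
    self[key] ||= yield
  end

  def without_cache
    Thread.current[SWITCH] = true
    yield
  ensure
    Thread.current[SWITCH] = nil
  end
end

cache = ToyCache.new

calls = 0
2.times { cache.fetch("a", "//", {}, {}) { calls += 1; ["//a"] } }
cached_calls = calls       # computed once, then served from the cache

calls = 0
cache.without_cache do
  2.times { cache.fetch("a", "//", {}, {}) { calls += 1; ["//a"] } }
end
uncached_calls = calls     # computed on every call
```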
3 changes: 2 additions & 1 deletion lib/nokogiri/css/tokenizer.rb
@@ -7,7 +7,8 @@

module Nokogiri
module CSS
class Tokenizer # :nodoc:
# :nodoc: all
class Tokenizer
require 'strscan'

class ScanError < StandardError ; end
3 changes: 2 additions & 1 deletion lib/nokogiri/css/tokenizer.rex
@@ -1,6 +1,7 @@
module Nokogiri
module CSS
class Tokenizer # :nodoc:
# :nodoc: all
class Tokenizer

macro
nl \n|\r\n|\r|\f
