This repository has been archived by the owner on Aug 26, 2023. It is now read-only.

feat: Nokogumbo detects Nokogiri's HTML5 API
Closes #170

A future version of Nokogiri will provide Nokogumbo's API (see
sparklemotion/nokogiri#2204). This change
will allow Nokogumbo to detect whether Nokogiri provides the HTML5 API
and, if it does, become a "shim" that gracefully defers to Nokogiri by
refusing to load its own implementation.

Some contractual assumptions I'm making about Nokogiri:

- Nokogiri will faithfully reproduce the `::Nokogiri::HTML5` singleton
  method, module, and namespace (including classes
  `Nokogiri::HTML5::Node`, `Nokogiri::HTML5::Document`, and
  `Nokogiri::HTML5::DocumentFragment`)

- Nokogiri will not provide a `::Nokogumbo` module/namespace, but will
  provide a similar `::Nokogiri::Gumbo` module which will provide the
  same constants and singleton methods as `::Nokogumbo`:

  - `Nokogumbo.parse()` will be provided as `Nokogiri::Gumbo.parse()`
  - `Nokogumbo.fragment()` → `Nokogiri::Gumbo.fragment()`
  - `Nokogumbo::DEFAULT_MAX_ATTRIBUTES` → `Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES`
  - `Nokogumbo::DEFAULT_MAX_ERRORS` → `Nokogiri::Gumbo::DEFAULT_MAX_ERRORS`
  - `Nokogumbo::DEFAULT_MAX_TREE_DEPTH` → `Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH`

This change checks for the existence of `Nokogiri::HTML5`,
`Nokogiri::Gumbo`, and an expected singleton method on each. We could
do a more- or less-thorough check here.
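
For illustration, here is a minimal sketch of what a more thorough check could look like. It is not what this change does: the helper name `html5_api_available?` and the `Fake*` stand-in modules are hypothetical, and the sketch assumes the Nokogiri-provided `Gumbo` module exposes exactly the methods and constants listed above.

```ruby
# Hypothetical helper (not part of this commit): returns true only when
# `namespace` exposes an HTML5 module with `parse`, and a Gumbo module with
# `parse`, `fragment`, and the three DEFAULT_* constants listed above.
def html5_api_available?(namespace)
  return false unless namespace.const_defined?(:HTML5, false)
  return false unless namespace.const_defined?(:Gumbo, false)

  html5 = namespace.const_get(:HTML5, false)
  gumbo = namespace.const_get(:Gumbo, false)

  html5.respond_to?(:parse) &&
    %i[parse fragment].all? { |name| gumbo.respond_to?(name) } &&
    %i[DEFAULT_MAX_ATTRIBUTES DEFAULT_MAX_ERRORS DEFAULT_MAX_TREE_DEPTH]
      .all? { |const| gumbo.const_defined?(const, false) }
end

# Stand-in for a Nokogiri that ships the HTML5 API (values are assumptions).
module FakeNokogiriWithHTML5
  module HTML5
    def self.parse(*); end
  end

  module Gumbo
    DEFAULT_MAX_ATTRIBUTES = 400
    DEFAULT_MAX_ERRORS = 0
    DEFAULT_MAX_TREE_DEPTH = 400

    def self.parse(*); end
    def self.fragment(*); end
  end
end

# Stand-in for a Nokogiri that does not ship the HTML5 API.
module FakeNokogiriWithoutHTML5
end

puts html5_api_available?(FakeNokogiriWithHTML5)
puts html5_api_available?(FakeNokogiriWithoutHTML5)
```

The trade-off is that a stricter check like this one falls back to Nokogumbo's own implementation if any single piece of the expected surface is missing.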

This change also provides an "escape hatch" using an environment
variable `NOKOGUMBO_IGNORE_NOKOGIRI_HTML5` which can be set to avoid
the "shim" behavior. This escape hatch might be unnecessary, but this
change is invasive enough to make me want to be cautious.
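
As a sketch of how the escape hatch evaluates, the environment-variable clause of the condition in `lib/nokogumbo.rb` can be pulled into a small predicate. The helper name `escape_hatch_set?` is hypothetical and exists only for this illustration; it takes a Hash-like argument so it can be exercised without touching the real `ENV`.

```ruby
# Hypothetical helper mirroring the ENV clause in lib/nokogumbo.rb:
# the escape hatch is active when the variable is present and its value
# is anything other than the exact string "false".
def escape_hatch_set?(env)
  env.key?("NOKOGUMBO_IGNORE_NOKOGIRI_HTML5") &&
    env["NOKOGUMBO_IGNORE_NOKOGIRI_HTML5"] != "false"
end

VAR = "NOKOGUMBO_IGNORE_NOKOGIRI_HTML5"

# Each pair is [environment, whether the shim behavior is skipped].
EXAMPLES = [
  [{}, false],                 # unset: shim behavior applies
  [{ VAR => "false" }, false], # explicitly "false": shim behavior applies
  [{ VAR => "true" }, true],   # set: Nokogumbo loads its own implementation
  [{ VAR => "1" }, true],
  [{ VAR => "0" }, true],      # only the literal "false" disables the hatch
  [{ VAR => "" }, true]        # an empty value still counts as set
]

EXAMPLES.each do |env, expected|
  raise "unexpected result for #{env.inspect}" unless escape_hatch_set?(env) == expected
end
```

Note that the comparison is against the exact string `"false"`, so values such as `"0"`, `"FALSE"`, or an empty string all count as set.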

In "shim" mode, `Nokogumbo.parse()` and `.fragment()` will be
forwarded to the Nokogiri implementation. The `Nokogumbo::DEFAULT*`
constants will always be defined, but when in "shim" mode will be set
to the `Nokogiri`-provided values.
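
The forwarding can be sketched in isolation with stand-in modules, so it runs without either gem installed. `UpstreamGumbo` stands in for `Nokogiri::Gumbo` and `ShimGumbo` for `Nokogumbo` in "shim" mode; both names, the return values, and the constant values on the upstream side are assumptions made for this illustration.

```ruby
# Stand-in for the module Nokogiri is expected to provide. The constant
# values and the return values are assumptions for demonstration only.
module UpstreamGumbo
  DEFAULT_MAX_ATTRIBUTES = 400
  DEFAULT_MAX_ERRORS = 0
  DEFAULT_MAX_TREE_DEPTH = 400

  def self.parse(*args)
    [:upstream_parse, args]
  end

  def self.fragment(*args)
    [:upstream_fragment, args]
  end
end

# Stand-in for Nokogumbo in "shim" mode: singleton methods forward their
# positional arguments unchanged, and constants alias the upstream values.
module ShimGumbo
  def self.parse(*args)
    UpstreamGumbo.parse(*args)
  end

  def self.fragment(*args)
    UpstreamGumbo.fragment(*args)
  end

  DEFAULT_MAX_ATTRIBUTES = UpstreamGumbo::DEFAULT_MAX_ATTRIBUTES
  DEFAULT_MAX_ERRORS = UpstreamGumbo::DEFAULT_MAX_ERRORS
  DEFAULT_MAX_TREE_DEPTH = UpstreamGumbo::DEFAULT_MAX_TREE_DEPTH
end

# Callers of the shim see the upstream result and the upstream constants.
p ShimGumbo.parse("<p>hi</p>", nil)
p ShimGumbo::DEFAULT_MAX_TREE_DEPTH == UpstreamGumbo::DEFAULT_MAX_TREE_DEPTH
```

Because the constants are assigned from the upstream module at load time, existing code that reads `Nokogumbo::DEFAULT_MAX_ERRORS` and similar keeps working and picks up whatever values the upstream module defines.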

Nokogumbo will emit a single warning message at `require`-time when it
is in "shim" mode. This message points users to
sparklemotion/nokogiri#2205 which will
explain what's going on and help people migrate their
applications (but is an empty placeholder right now).

I did not include deprecation warning messages in `Nokogumbo.parse`
and `.fragment`. If you feel strongly that we should, let me know.
flavorjones committed Mar 14, 2021
1 parent 7a6c04d commit 457152f
Showing 1 changed file with 31 additions and 10 deletions.
41 changes: 31 additions & 10 deletions lib/nokogumbo.rb
@@ -1,17 +1,38 @@
 require 'nokogiri'
 require 'nokogumbo/version'
-require 'nokogumbo/html5'

-require 'nokogumbo/nokogumbo'
+if ((defined?(Nokogiri::HTML5) && Nokogiri::HTML5.respond_to?(:parse)) &&
+    (defined?(Nokogiri::Gumbo) && Nokogiri::Gumbo.respond_to?(:parse)) &&
+    !(ENV.key?("NOKOGUMBO_IGNORE_NOKOGIRI_HTML5") && ENV["NOKOGUMBO_IGNORE_NOKOGIRI_HTML5"] != "false"))

-module Nokogumbo
-  # The default maximum number of attributes per element.
-  DEFAULT_MAX_ATTRIBUTES = 400
+  warn "NOTE: nokogumbo: Using Nokogiri::HTML5 provided by Nokogiri. See https://github.com/sparklemotion/nokogiri/issues/2205 for more information."

-  # The default maximum number of errors for parsing a document or a fragment.
-  DEFAULT_MAX_ERRORS = 0
+  module Nokogumbo
+    def self.parse(*args)
+      Nokogiri::Gumbo.parse(*args)
+    end

-  # The default maximum depth of the DOM tree produced by parsing a document
-  # or fragment.
-  DEFAULT_MAX_TREE_DEPTH = 400
+    def self.fragment(*args)
+      Nokogiri::Gumbo.fragment(*args)
+    end
+
+    DEFAULT_MAX_ATTRIBUTES = Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES
+    DEFAULT_MAX_ERRORS = Nokogiri::Gumbo::DEFAULT_MAX_ERRORS
+    DEFAULT_MAX_TREE_DEPTH = Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH
+  end
+else
+  require 'nokogumbo/html5'
+  require 'nokogumbo/nokogumbo'
+
+  module Nokogumbo
+    # The default maximum number of attributes per element.
+    DEFAULT_MAX_ATTRIBUTES = 400
+
+    # The default maximum number of errors for parsing a document or a fragment.
+    DEFAULT_MAX_ERRORS = 0
+
+    # The default maximum depth of the DOM tree produced by parsing a document
+    # or fragment.
+    DEFAULT_MAX_TREE_DEPTH = 400
+  end
 end
