NodeSet today
Over the years, NodeSet has slowly approached being API-compatible with Enumerable or Array. This is good, and it validates the mental model of libxml2's xmlNodeSet as an augmented ordered set, especially given that the underlying implementation is literally a C array:
```c
typedef struct _xmlNodeSet xmlNodeSet;
typedef xmlNodeSet *xmlNodeSetPtr;
struct _xmlNodeSet {
    int nodeNr;          /* number of nodes in the set */
    int nodeMax;         /* size of the array as allocated */
    xmlNodePtr *nodeTab; /* array of nodes in no particular order */
    /* @@ with_ns to check whether namespace nodes should be looked at @@ */
};
```
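However, we find ourselves at an interesting point: NodeSet does not fully behave like an Enumerable or an Array, and there are open issues pointing this out.
Further, NodeSet has baggage, namely its associated Document object, which makes simple operations harder and in some cases even causes bugs.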
Finally, the NodeSet class is bigger and more complex than necessary (in both CRuby and JRuby), and so is a bit of a maintenance burden at this point.
NodeSet tomorrow
As mentioned in #1952, it would be simpler if NodeSet were a subclass of Array, which would free us from using libxml2's xmlNodeSet and unify the JRuby and CRuby implementations.
The memory model could be updated so that a NodeSet is independent of any Document, bringing it into alignment with the memory model of all the standard Ruby collection classes.
NodeSet would conform fully to the Enumerable API.
The API would be extended with Searchable to support current API usage.
The new class could also apply Document decorators at creation time, optionally inheriting them from an existing NodeSet or from the creating Document. Decorators are a rarely used, poorly documented feature that I suspect is buggy and that would benefit from a simpler implementation.
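To make the proposal concrete, here is a minimal Ruby sketch, assuming a plain `Array` subclass. `SketchNode`, `SketchSearchable`, `SketchNodeSet`, `search_by_name`, and the `decorators:` keyword are hypothetical names invented for illustration, not Nokogiri API; the mixin only matches element names, whereas the real `Searchable` evaluates CSS and XPath. It shows a set that holds no `Document` reference, gets the `Enumerable` API from `Array`, and passes its decorators on to the sets it creates.

```ruby
# Hypothetical node type standing in for Nokogiri::XML::Node. It has no
# back-reference to a Document, and the set below does not need one.
SketchNode = Struct.new(:name, :children)

# Hypothetical mixin playing the role the proposal gives to Searchable.
# It matches element names only; it is not Nokogiri's Searchable.
module SketchSearchable
  # Returns a new set of all descendants named +name+, carrying over this
  # set's decorators (the "inherit from an existing NodeSet" option).
  def search_by_name(name)
    found = []
    each { |node| collect_named(node, name, found) }
    SketchNodeSet.new(found, decorators: decorators)
  end

  private

  # Depth-first walk that appends matching descendants in document order.
  def collect_named(node, name, found)
    node.children.each do |child|
      found << child if child.name == name
      collect_named(child, name, found)
    end
  end
end

# Hypothetical Array-based node set: subclassing Array provides the whole
# Array and Enumerable API without any custom C or Java code.
class SketchNodeSet < Array
  include SketchSearchable

  attr_reader :decorators

  # +decorators+ is a list of modules applied to each node at creation time.
  def initialize(nodes = [], decorators: [])
    super(nodes)
    @decorators = decorators
    each { |node| decorators.each { |mod| node.extend(mod) } }
  end
end
```

One design consequence worth weighing: many `Array` methods, such as `map` and `select` (and, since Ruby 3.0, `take` and `uniq`), return plain `Array` instances rather than instances of the subclass, so a real implementation would have to decide which methods to override so that they keep returning a node set.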
DocumentFragment tomorrow
Finally, this opens the door to a long-time roadmap item: re-implementing DocumentFragment on top of NodeSet, thereby avoiding libxml2's underlying conventions (and further unifying the JRuby and CRuby implementations). This would be another simplifying change, and it could allow us to fix the quirks in how XPath searches behave differently in fragments than in Documents and NodeSets.
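A hedged Ruby sketch of a fragment built on the same idea, where the fragment is nothing more than an ordered list of top-level nodes. `FragmentNode`, `SketchFragment`, and `search_by_name` are hypothetical. The search semantics here (top-level nodes match as well as their descendants) is an assumption chosen for illustration, not a description of how Nokogiri fragments behave today.

```ruby
# Hypothetical node type (not Nokogiri's); +children+ is an Array of nodes.
FragmentNode = Struct.new(:name, :children)

# Hypothetical fragment modeled as an ordered list of top-level nodes,
# with no wrapping document or hidden context node.
class SketchFragment < Array
  # Returns every node named +name+, checking the top-level nodes themselves
  # as well as their descendants, in document order.
  def search_by_name(name)
    found = []
    each { |node| walk(node, name, found) }
    found
  end

  private

  def walk(node, name, found)
    found << node if node.name == name
    node.children.each { |child| walk(child, name, found) }
  end
end
```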
Risks
There are two primary risks.
The first risk comes from making an invasive change to a codebase that many applications have tested thoroughly over many years. This can be mitigated by continuing to run valgrind in the CI suite, and potentially extending coverage with ASan. We may also want to implement the new class alongside the old one, so that applications can "flip back to the previous implementation" at runtime if any surprising problems occur (e.g., by setting an environment variable or global constant before Nokogiri is loaded).
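The "flip back" idea could look roughly like the following Ruby sketch, which picks one implementation when the file is loaded. The module, both classes, the `SKETCH_LEGACY_NODESET` environment variable, and the `::SKETCH_LEGACY_NODESET` constant are all hypothetical; Nokogiri defines none of them.

```ruby
# Hypothetical opt-out switch; every name here is illustrative only.
module SketchNodeSets
  # Stand-in for the existing libxml2-backed NodeSet.
  class LegacyNodeSet
    include Enumerable

    def initialize(nodes = [])
      @nodes = nodes.dup
    end

    # Enumerable builds map, select, etc. on top of this method.
    def each(&block)
      return enum_for(:each) unless block
      @nodes.each(&block)
      self
    end
  end

  # Stand-in for the proposed Array subclass.
  class ArrayNodeSet < Array
  end

  ENV_FLAG = "SKETCH_LEGACY_NODESET"

  # Picks an implementation from an ENV-like hash, or from a global constant
  # that the application defines before this file is loaded.
  def self.node_set_class(env = ENV)
    legacy = env[ENV_FLAG] == "1" ||
             (defined?(::SKETCH_LEGACY_NODESET) && ::SKETCH_LEGACY_NODESET)
    legacy ? LegacyNodeSet : ArrayNodeSet
  end

  # Chosen once, at load time, so the whole process uses one implementation.
  NodeSet = node_set_class
end
```

Choosing once at load time, rather than per call, means the two implementations are never mixed within a single process.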
The second risk comes from the fact that a NodeSet may then contain nodes from many documents. Because the DOM graph is highly connected, a single retained node can keep its whole document reachable, so many otherwise-unused objects could be prevented from being GCed. This perhaps shouldn't surprise anyone who has thought deeply about directed graphs.
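The retention concern can be illustrated with a toy object graph in Ruby. `GraphDocument`, `GraphNode`, `build_document`, and `reachable_count` are hypothetical and only model the node/document back-references; they do not reflect Nokogiri's actual memory management. Counting what is reachable from a set holding one node from each of three documents shows that the set pins every node of all three documents.

```ruby
# Hypothetical object graph, not Nokogiri's: a document references all of its
# nodes and every node references its document, as DOM links do.
GraphDocument = Struct.new(:nodes)
GraphNode = Struct.new(:name, :document)

# Builds a hypothetical document containing +count+ nodes.
def build_document(count)
  doc = GraphDocument.new([])
  count.times { |i| doc.nodes << GraphNode.new("node#{i}", doc) }
  doc
end

# Counts objects reachable from +roots+, i.e. what a tracing GC must keep
# alive for as long as +roots+ itself is referenced.
def reachable_count(roots)
  # Identity keys avoid hashing the cyclic structs member by member.
  seen = {}.compare_by_identity
  stack = roots.to_a.dup
  until stack.empty?
    obj = stack.pop
    next if seen.key?(obj)
    seen[obj] = true
    case obj
    when GraphNode then stack << obj.document
    when GraphDocument then stack.concat(obj.nodes)
    end
  end
  seen.size
end

docs = Array.new(3) { build_document(1_000) }
node_set = docs.map { |doc| doc.nodes.first } # one node from each document
reachable_count(node_set) # => 3003 (3 documents plus all 3,000 of their nodes)
```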
I spent a little bit of time spiking on this today and got pretty far on getting the test suite to pass. I haven't turned on valgrind checks yet, but I'm optimistic that this might be easier than I suspected.