Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

explore: make NodeSet a subclass of Array #2184

Open
flavorjones opened this issue Feb 1, 2021 · 2 comments
Open

explore: make NodeSet a subclass of Array #2184

flavorjones opened this issue Feb 1, 2021 · 2 comments
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Milestone

Comments

@flavorjones
Copy link
Member

NodeSet today

Over the years, NodeSet has slowly approached being API-compatible with Enumerable or Array. This is good, and it validates the mental model of libxml2's xmlNodeSet as an augmented ordered set, especially given that the underlying implementation is literally a C array:

typedef struct _xmlNodeSet xmlNodeSet;
typedef xmlNodeSet *xmlNodeSetPtr;
struct _xmlNodeSet {
    int nodeNr;			/* number of nodes in the set */
    int nodeMax;		/* size of the array as allocated */
    xmlNodePtr *nodeTab;	/* array of nodes in no particular order */
    /* @@ with_ns to check whether namespace nodes should be looked at @@ */
};

However, we find ourselves at an interesting point, where NodeSet is not completely an Enumerable or Array, and there are open issues pointing this out:

Further, NodeSet has baggage, namely the associated Document object which makes simple operations harder:

or even causes bugs:

Finally, the NodeSet class is bigger and more complex than necessary (in both CRuby and JRuby), and so is a bit of a maintenance burden at this point.

NodeSet Tomorrow

As mentioned in #1952, it would be simpler if NodeSet was a subclass of Array, which would free us from using libxml2's xmlNodeSet and unify the JRuby and CRuby implementations

The memory model could be updated so that it was independent of any Document, thereby bringing it into alignment with the memory model of all the standard Ruby collection classes.

The Enumerable API would be perfectly conformed to.

The API would be extended with Searchable to support current API usage.

The API could also implement Document decorators at creation time by optionally inheriting them from an existing NodeSet or the creating Document. Decorators are a rarely-used and ill-documented feature which I suspect is buggy and would be improved by moving to a simpler implementation.

DocumentFragment tomorrow

Finally, this opens the door to a long-time roadmap item, which is to re-implement DocumentFragment on top of NodeSet, thereby avoiding use of libxml2's underlying conventions (and further unifying the JRuby and CRuby implementations). This would further be a simplifying change and would potentially allow us to fix the quirks with how XPath searches work in fragments differently than in Documents and NodeSets.

Risks

Primarily, the risks are:

The first risk exists because we'd be making an invasive change to the current codebase which has been tested thoroughly by many applications over many years. This can be mitigated by continuing to run valgrind in the CI suite, and potentially extending coverage to use ASan. We may want to consider implementing a new class entirely to allow applications the ability to "flip back to the previous implementation" at runtime if any surprising problems occur (i.e., by setting an environment variable or global constant before Nokogiri is loaded).

The second risk exists because a NodeSet may now contain nodes from many documents, and the highly-connected DOM graph may then mean that many unused objects would be prevented from being GCed. This perhaps shouldn't be surprising to anyone who's thought deeply about directed graphs.

@flavorjones flavorjones added the topic/memory Segfaults, memory leaks, valgrind testing, etc. label Feb 1, 2021
@flavorjones flavorjones added this to the v2.0.0 milestone Feb 1, 2021
@flavorjones
Copy link
Member Author

I spent a little bit of time spiking on this today and got pretty far on getting the test suite to pass. I haven't turned on valgrind checks yet, but I'm optimistic that this might be easier than I suspected.

@flavorjones
Copy link
Member Author

Notes to self: work needed on the branch:

  • document NodeSet
  • delete C code
  • explore JRuby code deletion
  • memory: test with valgrind/memcheck
  • memory: think hard about leak scenarios (in the leak test suite maybe)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Projects
None yet
Development

No branches or pull requests

1 participant