Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow NodeSet to contain Nodes from multiple Documents (fix for #1952) #2186

Merged
merged 9 commits into from Feb 5, 2021

Conversation

flavorjones
Copy link
Member

What problem is this PR intended to solve?

fix: allow NodeSets to contain Nodes from multiple Documents

Specifically, this means we've added a mark callback for the NodeSet which marks each of the contained objects.

Previously we skipped explicitly marking the contained objects due to the assumption that all the nodes would be from the same document as the NodeSet itself.

Fixes #1952.

Have you included adequate test coverage?

Yes, additional tests created.

Does this change affect the behavior of either the C or the Java implementations?

This change allows the C implementation to do what the Java implementation has always allowed.

@flavorjones flavorjones added the topic/memory Segfaults, memory leaks, valgrind testing, etc. label Feb 2, 2021
@flavorjones
Copy link
Member Author

The failing test at https://ci.nokogiri.org/builds/14968#L60108339:410 looks legit, and so I'll want to dig in a bit more to understand that segfault before I'm happy with this change.

@flavorjones
Copy link
Member Author

OK, I understand why this segfaulted and have a fix, I only lack the time to polish and push it.

@flavorjones
Copy link
Member Author

flavorjones commented Feb 4, 2021

Note to self, next steps:

  • see if we can get this to reliably fail in CI by doing GC.start after every test
  • ship the segfault fix to the PR and CI it
  • can we remove @namespace_cache from NodeSet now that we have a real mark function?
    • make sure that we have a GC test for namespaces and that we run it ...

@flavorjones flavorjones force-pushed the 1952-node-set-segfault branch 3 times, most recently from 41886d9 to e461142 Compare February 5, 2021 04:10
in an attempt to more reliably trigger memory bugs when they exist.

This behavior can be changed by setting the env var
NOKOGIRI_TEST_GC_SETTING to "none", "minor", or "stress"
after being removed when I ripped out Hoe and Hoe::Debugging
Specifically, this means we've added a mark callback for the NodeSet
which marks each of the contained objects.

Previously we skipped explicitly marking the contained objects due to
the assumption that all the nodes would be from the same document as
the NodeSet itself.

Fixes #1952.
Previously, we created Ruby objects for namespaces and used a
@namespace_cache attribute (an Array) to maintain references to
just those Ruby objects.

Now that we have a real GC mark function for NodeSet, let's use that
instead of the hacky array attribute.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

segfault in node_set.rb
1 participant