Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compaction on nodes #2579

Merged
merged 6 commits into from Jul 12, 2022
Merged

Fix compaction on nodes #2579

merged 6 commits into from Jul 12, 2022

Conversation

tenderlove
Copy link
Member

This PR is an attempt to fix compaction issues from #2578. We've split the PR in to 3 commits.

The first commit consistently sets a mark function on the xml node. We need this because we need to provide a static struct to TypedData_Wrap_Struct.

The second commit introduces a helper function for unwrapping xml nodes. Unfortunately, xml nodes use a union type, and subclasses want to unwrap the type in to different things, so we had to extract a function that would unwrap the nodes correctly.

The third commit actually fixes compaction by adding a callback for updating references.

Fixes #2578

@flavorjones
Copy link
Member

Note: this should probably be backported to v1.13.x in a patch release. I'll do that in a separate PR once this is on main.

@flavorjones flavorjones added the topic/memory Segfaults, memory leaks, valgrind testing, etc. label Jun 17, 2022
ext/nokogiri/xml_node.c Outdated Show resolved Hide resolved
ext/nokogiri/xml_text.c Outdated Show resolved Hide resolved
@flavorjones
Copy link
Member

Sorry for being slow to merge this. I will work on it (and the backport?) this week.

Copy link
Contributor

@stevecheckoway stevecheckoway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. You should even be able to get rid of the casts, but they're not doing any harm.

Edit: I meant the approval to mean the updated callback signatures looked good. I did notice one of the tests shows a memory leak reported by Valgrind.

Unconditionally set a mark function on the node wrapper.  We will later
refactor this to use TypedData_Wrap_Struct and we don't want to
conditionally set mark functions on that

Co-Authored-By: Mike Dalessio <mike.dalessio@gmail.com>
@flavorjones
Copy link
Member

I rebased and made some minor changes.

@flavorjones
Copy link
Member

flavorjones commented Jul 10, 2022

I've added a commit 1c74ba1, which I would like to squash, which uses a macro Noko_Node_Get_Struct to replace the previous Data_Get_Struct macro and noko_xml_node_unwrap() function.

This results in a smaller diff, since Noko_Node_Get_Struct has the same parameters as Data_Get_Struct. It should apply cleanly to the v1.13.x branch when I backport.

The resulting combined diff is here: https://github.com/sparklemotion/nokogiri/compare/b370fbd290343e93f20e6bf12522c5ba3f2f5caf..1c74ba18fccc4e809d62c62f2f8ee536aae3ee94

@tenderlove @eightbitraptor would love a quick review when you have a moment.

@flavorjones
Copy link
Member

Added one more commit onto the end to remove RBIMPL_CAST which blew up on windows.

@flavorjones
Copy link
Member

flavorjones commented Jul 10, 2022

@tenderlove I've got one last question for you, which is "why are we using DATA_PTR instead of TypedData_Get_Struct?"

I think I know the answer (but would like confirmation): the alternative of using TypedData_Get_Struct would require creating a new rb_data_type_t for each of the "subclasses" of struct xmlNode, and that's a lot of code for little gain; and DATA_PTR has the advantage of being much less code, and the lack of type safety is OK because libxml2 seems to take care of this reasonably well.

Does that sound about right?

@flavorjones
Copy link
Member

The memcheck failure is because ruby_memcheck needs to be told to ignore memory allocated by rb_ary_push, will take care of that tomorrow.

@tenderlove
Copy link
Member Author

tenderlove commented Jul 11, 2022

Does that sound about right?

Yes, about.

The problem I found is that we have so many objects that inherit from Node. In order to unwrap the object in a method on node, we'd have to check the type of self on every call in order to understand which struct we need to pass to TypedData_Get_Struct.

I'd prefer to use the "typed" unwrappers everywhere, but I couldn't figure out a "proper" way to do that with C objects that use inheritance (without adding tons of checks).

@flavorjones
Copy link
Member

@tenderlove Thanks. This is going to go green so I'm going to clean up commit history (squash some of my changes) and let it go through CI one more time.

tenderlove and others added 4 commits July 11, 2022 11:48
This commit introduces a `Noko_Node_Get_Struct` macro for unwrapping
the underlying data pointer from Nokogiri objects. Then we change all
places that unwrap nodes to use the new macro. We also converted
xmlNode to use `TypedData_Wrap_Struct` so we can provide a compaction
function.

Note that we use DATA_PTR instead of "typed" unwrappers everywhere
because the large number of C structs that inherit from xmlNode would
make the unwrapping code unwieldy and type-check-heavy.

Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
Co-Authored-By: Mike Dalessio <mike.dalessio@gmail.com>
This fixes the case where nodes moves and we need to update the private
references.

Co-Authored-By: Matt Valentine-House <matt@eightbitraptor.com>
Co-Authored-By: Mike Dalessio <mike.dalessio@gmail.com>
Co-Authored-By: Mike Dalessio <mike.dalessio@gmail.com>
We don't have `rb_gc_location` everywhere, so check for it in the
extconf and then conditionally set the compaction callback

Co-Authored-By: Mike Dalessio <mike.dalessio@gmail.com>
for warnings triggered by the compaction test.

- for memory allocated in ary_heap_realloc
- for memory added to the mark stack in stack_chunk_alloc
@flavorjones flavorjones merged commit 82ff9e9 into main Jul 12, 2022
@flavorjones flavorjones deleted the fix-compaction-on-nodes branch July 12, 2022 11:16
flavorjones added a commit to flavorjones/nokogiri-xmlsec-instructure that referenced this pull request Jul 13, 2022
which changed the typed-data semantics around xmlNode in order to
fully support compaction

see sparklemotion/nokogiri#2579
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] Document / Nodes aren't compaction friendly
3 participants