Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: XML::Reader XML::Attr garbage collection #2599

Merged
merged 4 commits into from Jul 22, 2022

Conversation

flavorjones
Copy link
Member

@flavorjones flavorjones commented Jul 21, 2022

What problem is this PR intended to solve?

This is a proposed fix for #2598, see that issue for an extended explanation of the problem.

This PR implements "option 2" from that issue's proposed solutions:

  • introduce a new Reader#attribute_hash that will return a Hash<String ⇒ String> (instead of an Array<XML::Attr>)
  • deprecate Reader#attribute_nodes with a plan to remove it entirely in a future release
  • re-implement Reader#attributes to use #attribute_hash (instead of #attribute_nodes)

After this change, only applications calling Reader#attribute_nodes directly will be running the unsafe code. These users will see a deprecation warning and may use #attribute_hash as a replacement.

I think it's very possible that Reader#attribute_hash won't meet the needs of people who are working with namespaced attributes and are using #attribute_nodes for this purpose. However, I'm intentionally deferring any attempt to solve that DX problem until someone who needs this functionality asks for it.

Have you included adequate test coverage?

I tried and failed to add test coverage to the suite that would reproduce the underlying GC bug.

However, existing test coverage of Reader#attributes is sufficient for now.

Does this change affect the behavior of either the C or the Java implementations?

This PR modifies both the C and Java implementations to behave the same.

Notably, the Java implementation contains a small bugfix which is that Reader#namespaces now returns an empty hash when there are no namespaces (it previously returned nil).

@flavorjones flavorjones force-pushed the flavorjones-fix-reader-node-gc branch 4 times, most recently from 945f617 to 42dd5bb Compare July 21, 2022 18:27
@flavorjones flavorjones marked this pull request as ready for review July 21, 2022 18:27
which is now being used to implement Reader#attributes

fix: Reader#namespaces on JRuby now returns an empty hash when no
attributes are present (previously returned nil)
which will use rb_category_warning if it's available, and rb_warning
if it's not
This method will be removed in a future version
@flavorjones flavorjones force-pushed the flavorjones-fix-reader-node-gc branch from 42dd5bb to 113440c Compare July 21, 2022 20:58
@flavorjones flavorjones merged commit 685a940 into main Jul 22, 2022
@flavorjones flavorjones deleted the flavorjones-fix-reader-node-gc branch July 22, 2022 19:57
@flavorjones flavorjones added the topic/memory Segfaults, memory leaks, valgrind testing, etc. label Jul 24, 2022
flavorjones added a commit that referenced this pull request Nov 28, 2023
**What problem is this PR intended to solve?**

Before a minor release, I generally review deprecations and look for
things we can remove.

* Removed `Nokogiri::HTML5.get` which was deprecated in v1.12.0. [#2278]
(@flavorjones)
* Removed the CSS-to-XPath utility modules
`XPathVisitorAlwaysUseBuiltins` and `XPathVisitorOptimallyUseBuiltins`,
which were deprecated in v1.13.0 in favor of `XPathVisitor` constructor
args. [#2403] (@flavorjones)
* Removed `XML::Reader#attribute_nodes` which was deprecated in v1.13.8
in favor of `#attribute_hash`. [#2598, #2599] (@flavorjones)

Also we're now specifying version numbers in remaining deprecation
warnings.

**Have you included adequate test coverage?**

Tests have been removed, otherwise no new coverage needed.

**Does this change affect the behavior of either the C or the Java
implementations?**

As documented above.
@ccutrer
Copy link
Contributor

ccutrer commented Jan 17, 2024

However, I'm intentionally deferring any attempt to solve that DX problem until someone who needs this functionality asks for it.

:sigh: here I am. https://github.com/instructure/ratom/blob/v0.10.11/lib/atom/xml/parser.rb#L103. I'm not the original author of this code, so I'm not intimately familiar with it. But it sure seems like proper namespace handling is imperative for this code. It needs the attribute name in all forms: with the namespace prefix, and with the namespace href. It doesn't hang on to the attribute node though, so copying out just those values and not having a full xmlAttrPtr is just fine.

@flavorjones
Copy link
Member Author

@ccutrer Thanks for the note. I created a new issue, let's continue the conversation at #3102!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants