Closed
Description
What problems are you experiencing?
Maybe related or similar to #1771. Memory leak when using Nokogiri::XML::Builder and namespaces. Only namespace definitions beginning with xmlns: cause the leak (because Nokogiri::XML::Document#create_element is implemented that way), and nesting seems to be important too. I guess Nokogiri::XML::Node#add_namespace_definition doesn't free that definition when the node is garbage collected.
What's the output from nokogiri -v?
# Nokogiri (1.8.5)
---
warnings: []
nokogiri: 1.8.5
ruby:
  version: 2.5.1
  platform: x86_64-linux
  description: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
  engine: ruby
libxml:
  binding: extension
  source: packaged
  libxml2_path: "/home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.5/ports/x86_64-pc-linux-gnu/libxml2/2.9.8"
  libxslt_path: "/home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.5/ports/x86_64-pc-linux-gnu/libxslt/1.1.32"
  libxml2_patches:
  - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
  - 0002-Fix-nullptr-deref-with-XPath-logic-ops.patch
  - 0003-Fix-infinite-loop-in-LZMA-decompression.patch
  libxslt_patches: []
  compiled: 2.9.8
  loaded: 2.9.8
Can you provide a self-contained script that reproduces what you're seeing?
require 'nokogiri'

puts "Nokogiri version: #{Nokogiri::VERSION}"
puts "PID: #{$$}"

NS = {
  'xmlns:env': 'http://schemas.xmlsoap.org/soap/envelope/'
}

100.times do |i|
  puts "##{i}"
  5_000.times do
    Nokogiri::XML::Builder.new do |xml|
      # xml.send 'env:Envelope'
      xml.send 'Envelope', nil, NS do |xml|
        xml.send 'Foobar', nil, NS # no leak without this inner call
      end
    end
  end
  GC.start
end
When I run this script, its RSS grows continually from ~12 MB up to 60 MB before it exits.
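(RSS growth like this can also be watched from inside the script itself. A dependency-free sketch using the ps command; the helper name rss_kb is mine, and this only works on Linux/macOS:)

```ruby
# Resident set size of the current process in kilobytes, as reported by ps.
# Linux/macOS only; a rough but dependency-free way to watch for leaks.
def rss_kb
  `ps -o rss= -p #{Process.pid}`.to_i
end

before = rss_kb
# ... run the suspect workload here ...
puts "RSS delta: #{rss_kb - before} kB"
```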
Activity
paddor commented on Oct 31, 2018
The script above is flawed, as the elements don't actually reference a namespace. Use this script instead, which makes use of the default namespace and shows the same memory leak:
Output on my machine:
I'm currently running it again under Valgrind and will post the results as soon as it's finished.
paddor commented on Nov 1, 2018
Here are the largest "definitely lost" sections related to Nokogiri (largest first):

It looks like memory is being leaked at two locations.
flavorjones commented on Nov 2, 2018
Hi, thanks for reporting this, and apologies for the delay in replying. I've been getting false positives from valgrind in nokogiri's CI pipeline and needed to figure out how to suppress those before digging into this report.
I'll take a look this weekend.
flavorjones commented on Nov 2, 2018
Also - thank you for using the issue-reporting template, and for providing such clear information about the leak. You've helped a lot!
paddor commented on Nov 2, 2018
No worries. I've actually done some heap profiling with Massif:
Here's the peak snapshot (but it looks similar at every snapshot):
As far as I can see, there seem to be two issues:
The same namespace is defined again and again. It seems add_namespace_definition() fails to find the already defined namespace.
The many, many defined namespaces aren't freed when the document is garbage collected. In Nokogiri_wrap_xml_namespace(), there are three calls to Ruby's Data_Wrap_Struct(VALUE klass, void (*mark)(), void (*free)(), void *sval), but only one of them actually passes a pointer to the deallocator function.

flavorjones commented on Nov 3, 2018
OK, I think I've nailed down the root cause. Running your script before the fix:

With the patch applied, valgrind no longer dumps out any lost memory associated with allocations having xmlNewNs in their stack.

Going to clean the code up a bit before committing; I should be able to cut a release this weekend.
tests for memory leak in issue #1810
fix for memory leak in issue #1810
paddor commented on Nov 4, 2018
Just checked and saw the commits. Thanks for the effort!
Does this also fix the problem of excessively defined namespaces due to the inability to find existing definitions? Or was that a non-issue maybe?
flavorjones commented on Nov 4, 2018
@paddor The "excessively defined namespaces" are an artifact of how the Builder is implemented: each node is created (with its namespaces) before it is added to the document tree, so there's no way to know (without a code design change) where in the document to search for a relevant existing namespace definition. As a result, the Builder (actually Document#create_element) is forced to create a new namespace definition on each (temporarily) orphaned node.

I'm open to a pull request that addresses the unnecessary creation of namespace definitions, but at this point it's an optimization and is thus unlikely to get my attention anytime soon.
One further note: once the node is added to a document, Nokogiri will search its ancestors for a relevant pre-existing nsDef and, if one exists, will remove the duplicate from the reparented node. (More exactly, the removed nsDef is parked in an "unlinked nodes" hash owned by the Document rather than being immediately freed, because it's possible for references to it to still exist. This is pre-existing code.) The changeset in #1815 introduces better behavior, making sure that all references to the removed nsDef are repointed to the correct nsDef hanging off the ancestor element. I may play around with whether this means it's no longer possible to have any dangling references to it, in which case we may be able to free the nsDef immediately. Need to think about this and try to break it.
paddor commented on Nov 4, 2018
Interesting to know that every new node is created without an associated document! (Not an issue though, as long as no memory is leaked.) So, in the example script, as soon as the new node is integrated into the document (reparented), the duplicate namespace definition is removed? Pretty cool. 👍
flavorjones commented on Nov 5, 2018
@paddor Nodes are always created with an associated document; but they're created without an associated parent. Because there's no parent, there's nowhere to look for an existing matching namespace.
I explored freeing this nsDef but determined it's not safe, and added a test demonstrating why in 38a28fe.