Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak when using Nokogiri::XML::Builder with XML namespaces #1771

Closed
paddor opened this issue Jun 21, 2018 · 5 comments
Closed

Memory leak when using Nokogiri::XML::Builder with XML namespaces #1771

paddor opened this issue Jun 21, 2018 · 5 comments
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Milestone

Comments

@paddor
Copy link

paddor commented Jun 21, 2018

What problems are you experiencing?
Memory usage (VSZ and RSS) grows uncontrollably when using Nokogiri::XML::Builder with XML namespaces.

What's the output from nokogiri -v?

# Nokogiri (1.8.3)
    ---
    warnings: []
    nokogiri: 1.8.3
    ruby:
      version: 2.5.1
      platform: x86_64-linux
      description: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/ports/x86_64-pc-linux-gnu/libxml2/2.9.8"
      libxslt_path: "/home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/ports/x86_64-pc-linux-gnu/libxslt/1.1.32"
      libxml2_patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      libxslt_patches: []
      compiled: 2.9.8
      loaded: 2.9.8

Can you provide a self-contained script that reproduces what you're seeing?

loop do
  Nokogiri::XML::Builder.new do |xml|
    xml.send 'env:Envelope', env: 'http://schemas.xmlsoap.org/soap/envelope/'
  end
end

In fact, even the following snippet is enough to observe the memory leak, although arguably incomplete without the namespace declaration:

loop do
  Nokogiri::XML::Builder.new do |xml|
    xml.send 'env:Envelope'
  end
end

The memory leak seems to be provoked solely by specifying a namespace prefix in the element name (here env:).

Here's a script doing essentially the same but with more informational output:

require 'nokogiri'
require 'awesome_print'

puts "Nokogiri version: #{Nokogiri::VERSION}"

def count_instances
  objs = Hash.new(0)
  ObjectSpace.each_object { |o| objs[o.class] += 1 }
  puts "Top 5 instances:"
  ap Hash[objs.sort_by{ |_, v| -v }.take(5)]
end

def obj_stat
  h = ObjectSpace.count_objects
  # puts "Total object count:\n#{h.ai}"
  puts "Total object count:\t#{h[:TOTAL] - h[:FREE]}"
  puts "heap_live_slots:\t#{GC.stat[:heap_live_slots]}"
end

def utilization
  puts "\nVSZ/RSS:\t\t#{`ps -o vsz,rss -p #{Process.pid} -h`}"
  obj_stat
  count_instances
end


100.times do
  5_000.times do
    Nokogiri::XML::Builder.new do |xml|
      xml.send 'env:Envelope'
    end
  end
  GC.start
  utilization
end

The output also confirms that it's not Ruby objects being leaked, as the number of Ruby objects does not change. Only VSZ and RSS grow.

According to Valgrind, most memory definitely lost is allocated in a function called xmlStrndup() in nokogiri.so:

==10000== 1,999,888 bytes in 499,972 blocks are definitely lost in loss record 11,339 of 11,340
==10000==    at 0x4C2E0D6: malloc (vg_replace_malloc.c:299)
==10000==    by 0x4F0B953: objspace_xmalloc0 (gc.c:7921)
==10000==    by 0x8953695: xmlStrndup (in /home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/lib/nokogiri/nokogiri.so)
==10000==    by 0x88D49F6: xmlSplitQName2 (in /home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/lib/nokogiri/nokogiri.so)
==10000==    by 0x88891DD: relink_namespace (xml_node.c:41)
==10000==    by 0x8889A37: reparent_node_with (xml_node.c:320)
==10000==    by 0x8889A37: add_child (xml_node.c:1142)
==10000==    by 0x504F345: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
==10000==    by 0x504F345: vm_call_cfunc (vm_insnhelper.c:1934)
==10000==    by 0x505AF88: vm_exec_core (insns.def:915)
==10000==    by 0x505EA26: vm_exec (vm.c:1778)
==10000==    by 0x5063616: vm_call0_body.constprop.141 (vm_eval.c:127)
==10000==    by 0x5063F73: vm_call0 (vm_eval.c:58)
==10000==    by 0x5063F73: rb_call0 (vm_eval.c:296)
==10000==    by 0x506470D: rb_call (vm_eval.c:589)
==10000==    by 0x506470D: rb_funcallv (vm_eval.c:815)
==10000== 
==10000== 4,499,847 bytes in 499,983 blocks are definitely lost in loss record 11,340 of 11,340
==10000==    at 0x4C2E0D6: malloc (vg_replace_malloc.c:299)
==10000==    by 0x4F0B953: objspace_xmalloc0 (gc.c:7921)
==10000==    by 0x8953695: xmlStrndup (in /home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/lib/nokogiri/nokogiri.so)
==10000==    by 0x8953735: xmlStrdup (in /home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/lib/nokogiri/nokogiri.so)
==10000==    by 0x88D4A37: xmlSplitQName2 (in /home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.3/lib/nokogiri/nokogiri.so)
==10000==    by 0x88891DD: relink_namespace (xml_node.c:41)
==10000==    by 0x8889A37: reparent_node_with (xml_node.c:320)
==10000==    by 0x8889A37: add_child (xml_node.c:1142)
==10000==    by 0x504F345: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
==10000==    by 0x504F345: vm_call_cfunc (vm_insnhelper.c:1934)
==10000==    by 0x505AF88: vm_exec_core (insns.def:915)
==10000==    by 0x505EA26: vm_exec (vm.c:1778)
==10000==    by 0x5063616: vm_call0_body.constprop.141 (vm_eval.c:127)
==10000==    by 0x5063F73: vm_call0 (vm_eval.c:58)
==10000==    by 0x5063F73: rb_call0 (vm_eval.c:296)

My wild guess is that the strings returned by xmlStrndup() should have been freed by relink_namespace() in xml_node.c, but I'm no expert in C.

@flavorjones
Copy link
Member

Thanks for reporting. I'll investigate this weekend.

@flavorjones
Copy link
Member

I've reproduced this memory leak. Looking now!

@flavorjones flavorjones added this to the 1.8.4 milestone Jun 30, 2018
@flavorjones
Copy link
Member

OK, I believe I've fixed this. Will be in 1.8.4 presuming the builds go green.

Thanks so much for the bug report, and the thorough and complete analysis that you did! Also, thanks for using the bug-report template.

♥ ♥ ♥

@paddor
Copy link
Author

paddor commented Jun 30, 2018

Thanks for the fix! 👍

@arohr
Copy link

arohr commented Jul 13, 2018

We also found that the leak can be avoided in versions < 1.8.4 when setting the namespace like this
xml[:env].send 'Envelope', 'xmlns:env': 'http://schemas.xmlsoap.org/soap/envelope/' instead of xml.send 'env:Envelope'.

swills pushed a commit to swills/gitlabhq that referenced this issue Aug 1, 2018
sanitize 4.6.6 has this optimization that will benefit Markdown rendering: rgrove/sanitize#183

nokogiri 1.4.4 has this memory leak fix: sparklemotion/nokogiri#1771
jameszhan pushed a commit to jameszhan/gitaly that referenced this issue Nov 13, 2018
sanitize 4.6.6 has this optimization that will benefit Markdown rendering: rgrove/sanitize#183

nokogiri 1.4.4 has this memory leak fix: sparklemotion/nokogiri#1771

This mirrors changes for https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20795.
@flavorjones flavorjones added the topic/memory Segfaults, memory leaks, valgrind testing, etc. label Feb 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Projects
None yet
Development

No branches or pull requests

3 participants