Memory leak when using Nokogiri::XML::Builder with XML namespaces in nested elements #1810

Closed
paddor opened this issue Oct 29, 2018 · 14 comments
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.

@paddor

paddor commented Oct 29, 2018

What problems are you experiencing?
Maybe related or similar to #1771. Memory leak when using Nokogiri::XML::Builder and namespaces. Only namespace definitions beginning with xmlns: cause the leak (because Nokogiri::XML::Document#create_element is implemented that way), and nesting seems to be important too. I guess Nokogiri::XML::Node#add_namespace_definition doesn't free that definition when the node is garbage collected.

What's the output from nokogiri -v?

# Nokogiri (1.8.5)
    ---
    warnings: []
    nokogiri: 1.8.5
    ruby:
      version: 2.5.1
      platform: x86_64-linux
      description: ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.5/ports/x86_64-pc-linux-gnu/libxml2/2.9.8"
      libxslt_path: "/home/p/.gem/ruby/2.5.1/gems/nokogiri-1.8.5/ports/x86_64-pc-linux-gnu/libxslt/1.1.32"
      libxml2_patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      - 0002-Fix-nullptr-deref-with-XPath-logic-ops.patch
      - 0003-Fix-infinite-loop-in-LZMA-decompression.patch
      libxslt_patches: []
      compiled: 2.9.8
      loaded: 2.9.8

Can you provide a self-contained script that reproduces what you're seeing?

require 'nokogiri'

puts "Nokogiri version: #{Nokogiri::VERSION}"
puts "PID: #{$$}"

NS = {
  'xmlns:env': 'http://schemas.xmlsoap.org/soap/envelope/'
}

100.times do |i|
  puts "##{i}"
  5_000.times do
    Nokogiri::XML::Builder.new do |xml|
      #xml.send 'env:Envelope'
      xml.send 'Envelope', nil, NS do |xml|
        xml.send 'Foobar', nil, NS # no leak without this inner call
      end
    end
  end
  GC.start
end

When I run this script, its RSS grows continually from ~12 MB up to 60 MB before it exits.

@paddor
Author

paddor commented Oct 31, 2018

The script above is flawed, as the elements don't actually reference a namespace. Use this script instead, which makes use of the default namespace and shows the same memory leak:

require 'nokogiri'

NS = {
  'xmlns': 'http://schemas.xmlsoap.org/soap/envelope/',
}

100.times do |i|
  b = nil

  5_000.times do
    b = Nokogiri::XML::Builder.new do |xml|
      xml.send 'Envelope', nil, NS do |xml|
        xml.send 'Foobar', nil, NS
      end
    end
  end

  GC.start

  puts b.to_xml if i == 0
  puts "##{i}\tVSZ/RSS:\t#{`ps -o vsz,rss -p #{Process.pid} -h`}"
end

Output on my machine:

<?xml version="1.0"?>
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Foobar/>
</Envelope>
#0	VSZ/RSS:	 49056 14288
#1	VSZ/RSS:	 49452 14936
#2	VSZ/RSS:	 49720 15200
#3	VSZ/RSS:	 50124 15464
#4	VSZ/RSS:	 50520 15992
#5	VSZ/RSS:	 50784 16256
#6	VSZ/RSS:	 51184 16520
#7	VSZ/RSS:	 51452 17048
#8	VSZ/RSS:	 51848 17312
#9	VSZ/RSS:	 52248 17576
#10	VSZ/RSS:	 52512 18104
#11	VSZ/RSS:	 52912 18368
#12	VSZ/RSS:	 53312 18632
#13	VSZ/RSS:	 53576 19160
#14	VSZ/RSS:	 53980 19424
#15	VSZ/RSS:	 54384 19688
#16	VSZ/RSS:	 54652 20216
#17	VSZ/RSS:	 55056 20480
#18	VSZ/RSS:	 55456 20744
#19	VSZ/RSS:	 55720 21272
#20	VSZ/RSS:	 56124 21536
#21	VSZ/RSS:	 56528 22064
#22	VSZ/RSS:	 56796 22328
#23	VSZ/RSS:	 57196 22592
#24	VSZ/RSS:	 57464 22856
#25	VSZ/RSS:	 57860 23384
#26	VSZ/RSS:	 58260 23648
#27	VSZ/RSS:	 58528 23912
#28	VSZ/RSS:	 58932 24440
#29	VSZ/RSS:	 59200 24704
#30	VSZ/RSS:	 59604 24968
#31	VSZ/RSS:	 60004 25496
#32	VSZ/RSS:	 60400 25760
#33	VSZ/RSS:	 60664 26024
#34	VSZ/RSS:	 61064 26552
#35	VSZ/RSS:	 61460 26816
#36	VSZ/RSS:	 61728 27080
#37	VSZ/RSS:	 62124 27608
#38	VSZ/RSS:	 62524 27872
#39	VSZ/RSS:	 62788 28400
#40	VSZ/RSS:	 63184 28664
#41	VSZ/RSS:	 63584 28928
#42	VSZ/RSS:	 63848 29456
#43	VSZ/RSS:	 64252 29720
#44	VSZ/RSS:	 64520 29984
#45	VSZ/RSS:	 64916 30512
#46	VSZ/RSS:	 65312 30776
#47	VSZ/RSS:	 65576 31040
#48	VSZ/RSS:	 65976 31568
#49	VSZ/RSS:	 66372 31832
#50	VSZ/RSS:	 66636 32096
#51	VSZ/RSS:	 67044 32360
#52	VSZ/RSS:	 67440 32888
#53	VSZ/RSS:	 67704 33152
#54	VSZ/RSS:	 68108 33680
#55	VSZ/RSS:	 68380 33944
#56	VSZ/RSS:	 68788 34208
#57	VSZ/RSS:	 69188 34736
#58	VSZ/RSS:	 69456 35000
#59	VSZ/RSS:	 69856 35264
#60	VSZ/RSS:	 70256 35792
#61	VSZ/RSS:	 70524 36056
#62	VSZ/RSS:	 70924 36320
#63	VSZ/RSS:	 71328 36848
#64	VSZ/RSS:	 71596 37112
#65	VSZ/RSS:	 71992 37376
#66	VSZ/RSS:	 72388 37904
#67	VSZ/RSS:	 72652 38168
#68	VSZ/RSS:	 73048 38432
#69	VSZ/RSS:	 73444 38960
#70	VSZ/RSS:	 73716 39224
#71	VSZ/RSS:	 74116 39488
#72	VSZ/RSS:	 74516 40016
#73	VSZ/RSS:	 74780 40280
#74	VSZ/RSS:	 75176 40544
#75	VSZ/RSS:	 75444 41072
#76	VSZ/RSS:	 75852 41336
#77	VSZ/RSS:	 76248 41600
#78	VSZ/RSS:	 76516 42128
#79	VSZ/RSS:	 76916 42392
#80	VSZ/RSS:	 77312 42656
#81	VSZ/RSS:	 77576 43184
#82	VSZ/RSS:	 77976 43448
#83	VSZ/RSS:	 78372 43712
#84	VSZ/RSS:	 78640 44240
#85	VSZ/RSS:	 79040 44504
#86	VSZ/RSS:	 79440 44768
#87	VSZ/RSS:	 79708 45296
#88	VSZ/RSS:	 80112 45560
#89	VSZ/RSS:	 80512 45824
#90	VSZ/RSS:	 80776 46352
#91	VSZ/RSS:	 81176 46616
#92	VSZ/RSS:	 81444 46880
#93	VSZ/RSS:	 81844 47408
#94	VSZ/RSS:	 82244 47672
#95	VSZ/RSS:	 82644 47936
#96	VSZ/RSS:	 82908 48464
#97	VSZ/RSS:	 83308 48728
#98	VSZ/RSS:	 83576 48992
#99	VSZ/RSS:	 83976 49520
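
A rough back-of-the-envelope reading of the table above, as a sketch only. It assumes `ps` reports RSS in KiB (as it does on Linux) and that the growth is spread evenly over the builders created after the first measurement. The constant and method names are made up for illustration.

```ruby
# Figures copied from the output above (iteration #0 and iteration #99).
RSS_FIRST_KIB = 14_288
RSS_LAST_KIB = 49_520
ITERATIONS_BETWEEN = 99 # measurements #1..#99
BUILDERS_PER_ITERATION = 5_000

# Average RSS growth per outer iteration, in KiB.
def rss_growth_per_iteration_kib
  (RSS_LAST_KIB - RSS_FIRST_KIB).fdiv(ITERATIONS_BETWEEN)
end

# Average RSS growth per Builder.new call, in bytes.
def rss_growth_per_builder_bytes
  total_bytes = (RSS_LAST_KIB - RSS_FIRST_KIB) * 1024
  total_bytes.fdiv(ITERATIONS_BETWEEN * BUILDERS_PER_ITERATION)
end

puts format('%.1f KiB per iteration', rss_growth_per_iteration_kib)
puts format('%.1f bytes per builder', rss_growth_per_builder_bytes)
```

That works out to roughly 356 KiB per iteration, or about 73 bytes per builder. RSS also includes allocator overhead, so this is only an upper bound on what each builder leaks.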

I'm currently running it again under Valgrind. Will post the results as soon as it's finished.

@paddor
Author

paddor commented Nov 1, 2018

Here are the largest loss records related to Nokogiri (largest first):

==4581== 20,995,506 bytes in 499,893 blocks are definitely lost in loss record 8,434 of 8,434
==4581==    at 0x40307FF: malloc (in /home/p/.linuxbrew/Cellar/valgrind/3.14.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4581==    by 0x4AED402: objspace_xmalloc0 (gc.c:7921)
==4581==    by 0x4AED667: ruby_xmalloc0 (gc.c:7989)
==4581==    by 0x4AED696: ruby_xmalloc (gc.c:7998)
==4581==    by 0x903AEF8: xmlStrndup (xmlstring.c:45)
==4581==    by 0x903AF98: xmlStrdup (xmlstring.c:71)
==4581==    by 0x8FBA419: xmlNewNs (tree.c:766)
==4581==    by 0x8F6AB59: add_namespace_definition (xml_node.c:1327)
==4581==    by 0x4C3D4AE: call_cfunc_2 (vm_insnhelper.c:1741)
==4581==    by 0x4C3DF1C: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
==4581==    by 0x4C3E06B: vm_call_cfunc (vm_insnhelper.c:1934)
==4581==    by 0x4C45B70: vm_exec_core (insns.def:915)
==4581== 
==4581== 33,332 (40 direct, 33,292 indirect) bytes in 1 blocks are definitely lost in loss record 8,382 of 8,434
==4581==    at 0x40307FF: malloc (in /home/p/.linuxbrew/Cellar/valgrind/3.14.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4581==    by 0x4AED402: objspace_xmalloc0 (gc.c:7921)
==4581==    by 0x4AED667: ruby_xmalloc0 (gc.c:7989)
==4581==    by 0x4AED696: ruby_xmalloc (gc.c:7998)
==4581==    by 0x8F88AA7: xmlFindCharEncodingHandler (encoding.c:1700)
==4581==    by 0x8F692DE: get (xml_encoding_handler.c:12)
==4581==    by 0x4C3D471: call_cfunc_1 (vm_insnhelper.c:1735)
==4581==    by 0x4C3DF1C: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
==4581==    by 0x4C3E06B: vm_call_cfunc (vm_insnhelper.c:1934)
==4581==    by 0x4C3F2AD: vm_call_method_each_type (vm_insnhelper.c:2232)
==4581==    by 0x4C3F93A: vm_call_method (vm_insnhelper.c:2355)
==4581==    by 0x4C3FB10: vm_call_general (vm_insnhelper.c:2398)
==4581== 
==4581== 32,640 bytes in 1 blocks are indirectly lost in loss record 8,375 of 8,434
==4581==    at 0x40307FF: malloc (in /home/p/.linuxbrew/Cellar/valgrind/3.14.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4581==    by 0x4D8944B: __gconv_open (gconv_open.c:164)
==4581==    by 0x4D88E29: iconv_open (iconv_open.c:71)
==4581==    by 0x8F88A0E: xmlFindCharEncodingHandler (encoding.c:1691)
==4581==    by 0x8F692DE: get (xml_encoding_handler.c:12)
==4581==    by 0x4C3D471: call_cfunc_1 (vm_insnhelper.c:1735)
==4581==    by 0x4C3DF1C: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
==4581==    by 0x4C3E06B: vm_call_cfunc (vm_insnhelper.c:1934)
==4581==    by 0x4C3F2AD: vm_call_method_each_type (vm_insnhelper.c:2232)
==4581==    by 0x4C3F93A: vm_call_method (vm_insnhelper.c:2355)
==4581==    by 0x4C3FB10: vm_call_general (vm_insnhelper.c:2398)
==4581==    by 0x4C47BCD: vm_exec_core (insns.def:1441)
==4581== 32,640 bytes in 1 blocks are possibly lost in loss record 8,374 of 8,434
==4581==    at 0x40307FF: malloc (in /home/p/.linuxbrew/Cellar/valgrind/3.14.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4581==    by 0x4D8944B: __gconv_open (gconv_open.c:164)
==4581==    by 0x4D88E29: iconv_open (iconv_open.c:71)
==4581==    by 0x8F88A2B: xmlFindCharEncodingHandler (encoding.c:1692)
==4581==    by 0x8F692DE: get (xml_encoding_handler.c:12)
==4581==    by 0x4C3D471: call_cfunc_1 (vm_insnhelper.c:1735)
==4581==    by 0x4C3DF1C: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
==4581==    by 0x4C3E06B: vm_call_cfunc (vm_insnhelper.c:1934)
==4581==    by 0x4C3F2AD: vm_call_method_each_type (vm_insnhelper.c:2232)
==4581==    by 0x4C3F93A: vm_call_method (vm_insnhelper.c:2355)
==4581==    by 0x4C3FB10: vm_call_general (vm_insnhelper.c:2398)
==4581==    by 0x4C47BCD: vm_exec_core (insns.def:1441)
==4581== 
==4581== 2,268 bytes in 54 blocks are possibly lost in loss record 8,064 of 8,434
==4581==    at 0x40307FF: malloc (in /home/p/.linuxbrew/Cellar/valgrind/3.14.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4581==    by 0x4AED402: objspace_xmalloc0 (gc.c:7921)
==4581==    by 0x4AED667: ruby_xmalloc0 (gc.c:7989)
==4581==    by 0x4AED696: ruby_xmalloc (gc.c:7998)
==4581==    by 0x903AEF8: xmlStrndup (xmlstring.c:45)
==4581==    by 0x903AF98: xmlStrdup (xmlstring.c:71)
==4581==    by 0x8FBA419: xmlNewNs (tree.c:766)
==4581==    by 0x8F6AB59: add_namespace_definition (xml_node.c:1327)
==4581==    by 0x4C3D4AE: call_cfunc_2 (vm_insnhelper.c:1741)
==4581==    by 0x4C3DF1C: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
==4581==    by 0x4C3E06B: vm_call_cfunc (vm_insnhelper.c:1934)
==4581==    by 0x4C45B70: vm_exec_core (insns.def:915)
==4581== 
==4581== 1,638 bytes in 39 blocks are indirectly lost in loss record 7,975 of 8,434
==4581==    at 0x40307FF: malloc (in /home/p/.linuxbrew/Cellar/valgrind/3.14.0/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4581==    by 0x4AED402: objspace_xmalloc0 (gc.c:7921)
==4581==    by 0x4AED667: ruby_xmalloc0 (gc.c:7989)
==4581==    by 0x4AED696: ruby_xmalloc (gc.c:7998)
==4581==    by 0x903AEF8: xmlStrndup (xmlstring.c:45)
==4581==    by 0x903AF98: xmlStrdup (xmlstring.c:71)
==4581==    by 0x8FBA419: xmlNewNs (tree.c:766)
==4581==    by 0x8F6AB59: add_namespace_definition (xml_node.c:1327)
==4581==    by 0x4C3D4AE: call_cfunc_2 (vm_insnhelper.c:1741)
==4581==    by 0x4C3DF1C: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
==4581==    by 0x4C3E06B: vm_call_cfunc (vm_insnhelper.c:1934)
==4581==    by 0x4C45B70: vm_exec_core (insns.def:915)

It looks like memory is being leaked at two locations.
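
A quick sanity check on the xmlNewNs records above, as a sketch. It assumes that xmlStrdup allocates the string's length plus one byte for the terminating NUL; the constant and method names are made up for illustration.

```ruby
# Namespace URI from the reproduction script.
HREF = 'http://schemas.xmlsoap.org/soap/envelope/'

# Figures copied from the three xmlNewNs loss records above: [bytes, blocks].
XML_NEW_NS_RECORDS = [
  [20_995_506, 499_893], # definitely lost
  [2_268, 54],           # possibly lost
  [1_638, 39]            # indirectly lost
].freeze

# Size of a C copy of the string: its bytes plus the terminating NUL.
def c_string_size(str)
  str.bytesize + 1
end

XML_NEW_NS_RECORDS.each do |bytes, blocks|
  exact = (bytes % blocks).zero?
  puts "#{blocks} blocks, #{bytes / blocks} bytes each (exact: #{exact})"
end
puts "href copy size: #{c_string_size(HREF)} bytes"
puts "blocks in total: #{XML_NEW_NS_RECORDS.sum { |_, blocks| blocks }}"
```

Every record divides evenly into 42-byte blocks, which matches a copy of the 41-character href plus its NUL. The three records add up to 499,986 blocks, which is consistent with (though it doesn't prove) about one leaked href copy for each of the 500,000 builders the script creates.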

@flavorjones
Member

Hi, thanks for reporting this, and apologies for the delay in replying. I've been getting false positives from valgrind in nokogiri's CI pipeline and needed to figure out how to suppress those before digging into this report.

I'll take a look this weekend.

@flavorjones
Member

Also - thank you for using the issue-reporting template, and for providing such clear information about the leak. You've helped a lot!

@paddor
Author

paddor commented Nov 2, 2018

No worries. I've actually done some heap profiling with Massif:

--------------------------------------------------------------------------------
Command:            ruby issue.rb
Massif arguments:   (none)
ms_print arguments: massif.out.10500
--------------------------------------------------------------------------------


    MB
43.62^                                                                       :
     |                                                                  @:@@@#
     |                                                             @@@@:@:@@@#
     |                                                        :::@@@@@@:@:@@@#
     |                                                  ::@@:::::@ @@@@:@:@@@#
     |                                            :::::@::@ : :::@ @@@@:@:@@@#
     |                                       :::@@::: :@::@ : :::@ @@@@:@:@@@#
     |                                  ::::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     |                           @::@:::: ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     |                      @::@:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     |                 ::::@@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     |            ::::@::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     |      :@@:::::: @::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     | ::::::@ : :::: @::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     | : : ::@ : :::: @::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     | : : ::@ : :::: @::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     | : : ::@ : :::: @::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     | : : ::@ : :::: @::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     | : : ::@ : :::: @::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
     | : : ::@ : :::: @::: @@: @:@: @:: : ::@:: @ ::: :@::@ : :::@ @@@@:@:@@@#
   0 +----------------------------------------------------------------------->Gi
     0                                                                   156.8

Number of snapshots: 66
 Detailed snapshots: [6, 12, 16, 17, 19, 21, 23, 29, 32, 33, 38, 41, 46, 47, 48, 50, 52, 54, 56, 58, 60, 62, 64 (peak)]

Here's the peak snapshot (the others look much the same):

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 63 166,359,363,759       45,328,056       33,445,272    11,882,784            0
 64 166,550,967,800       45,465,560       33,571,462    11,894,098            0
73.84% (33,571,462B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->61.06% (27,763,179B) 0x4AED401: objspace_xmalloc0 (gc.c:7921)
| ->53.94% (24,522,639B) 0x4AED666: ruby_xmalloc0 (gc.c:7989)
| | ->53.94% (24,522,639B) 0x4AED695: ruby_xmalloc (gc.c:7998)
| |   ->45.68% (20,769,776B) 0x783AEF7: xmlStrndup (xmlstring.c:45)
| |   | ->45.68% (20,769,776B) 0x783AF97: xmlStrdup (xmlstring.c:71)
| |   |   ->45.67% (20,766,312B) 0x77BA418: xmlNewNs (tree.c:766)
| |   |   | ->45.67% (20,766,312B) 0x776AB58: add_namespace_definition (xml_node.c:1327)
| |   |   |   ->45.67% (20,766,312B) 0x4C3D4AD: call_cfunc_2 (vm_insnhelper.c:1741)
| |   |   |     ->45.67% (20,766,312B) 0x4C3DF1B: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
| |   |   |       ->45.67% (20,766,312B) 0x4C3E06A: vm_call_cfunc (vm_insnhelper.c:1934)
| |   |   |         ->45.67% (20,766,312B) 0x4C45B6F: vm_exec_core (insns.def:915)
| |   |   |         | ->45.67% (20,766,312B) 0x4C566BB: vm_exec (vm.c:1778)
| |   |   |         |   ->45.67% (20,766,312B) 0x4C54290: invoke_block (vm.c:979)
| |   |   |         |     ->45.67% (20,766,312B) 0x4C545FD: invoke_iseq_block_from_c (vm.c:1031)
| |   |   |         |       ->45.67% (20,766,312B) 0x4C546BB: invoke_block_from_c_bh (vm.c:1049)
| |   |   |         |         ->45.67% (20,766,312B) 0x4C54867: vm_yield (vm.c:1094)
| |   |   |         |           ->45.67% (20,766,312B) 0x4C502BD: rb_yield_0 (vm_eval.c:970)
| |   |   |         |             ->45.67% (20,766,312B) 0x4C504DC: rb_yield_values2 (vm_eval.c:1016)
| |   |   |         |               ->45.67% (20,766,312B) 0x4AF3AC7: each_pair_i_fast (hash.c:1837)
| |   |   |         |                 ->45.67% (20,766,312B) 0x4AF1964: hash_foreach_iter (hash.c:353)
| |   |   |         |                   ->45.67% (20,766,312B) 0x4BD5E78: st_general_foreach (st.c:1571)
| |   |   |         |                     ->45.67% (20,766,312B) 0x4BD6176: st_foreach_check (st.c:1658)
| |   |   |         |                       ->45.67% (20,766,312B) 0x4AF1A2A: hash_foreach_call (hash.c:386)
| |   |   |         |                         ->45.67% (20,766,312B) 0x4AD211E: rb_ensure (eval.c:1037)
| |   |   |         |                           ->45.67% (20,766,312B) 0x4AF1ABA: rb_hash_foreach (hash.c:403)
| |   |   |         |                             ->45.67% (20,766,312B) 0x4AF3B34: rb_hash_each_pair (hash.c:1868)
| |   |   |         |                               ->45.67% (20,766,312B) 0x4C3D43B: call_cfunc_0 (vm_insnhelper.c:1729)
| |   |   |         |                                 ->45.67% (20,766,312B) 0x4C3DF1B: vm_call_cfunc_with_frame (vm_insnhelper.c:1918)
| |   |   |         |                                   ->45.67% (20,766,312B) 0x4C3E06A: vm_call_cfunc (vm_insnhelper.c:1934)
| |   |   |         |                                     ->45.67% (20,766,312B) 0x4C45795: vm_exec_core (insns.def:850)
| |   |   |         |                                       
| |   |   |         ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| |   |   |         
| |   |   ->00.01% (3,464B) in 1+ places, all below ms_print's threshold (01.00%)
| |   |   

As far as I can see, there seem to be two issues:

  1. The same namespace is defined again and again. It seems add_namespace_definition() fails to find the already defined namespace.

  2. The many duplicate namespace definitions aren't freed when the document is garbage collected. In Nokogiri_wrap_xml_namespace(), there are three calls to Ruby's Data_Wrap_Struct(VALUE klass, void (*mark)(), void (*free)(), void *sval), but only one of them actually passes a pointer to a deallocator function.

@flavorjones
Member

OK, I think I've nailed the root cause down. Running your script before the fix:

# v1.8.5
#0	VSZ/RSS:	 66064 14548
...
#9	VSZ/RSS:	 69136 17832
...
#19	VSZ/RSS:	 72608 21264
...
#29	VSZ/RSS:	 76064 24696
...
#39	VSZ/RSS:	 79540 28128
...
#49	VSZ/RSS:	 83012 31824

and with a patch applied

# PATCHED
#0	VSZ/RSS:	 65644 14304
...
#9	VSZ/RSS:	 65644 14688
...
#19	VSZ/RSS:	 65644 14688
...
#29	VSZ/RSS:	 65596 14664
...
#39	VSZ/RSS:	 65596 14664
...
#49	VSZ/RSS:	 65596 14664

and valgrind no longer dumps out any lost memory associated with allocations having xmlNewNs in their stack.

Going to clean the code up a bit before committing, should be able to cut a release this weekend.

@paddor
Author

paddor commented Nov 4, 2018

Just checked and saw the commits. Thanks for the effort!

Does this also fix the problem of excessively defined namespaces due to the inability to find existing definitions? Or was that a non-issue maybe?

@flavorjones
Member

@paddor the "excessively defined namespaces" is an artifact of how the builder is implemented: each node is created (with namespaces) before it is added to the document tree, so there's no way to know (without a code design change) where to search in the document for a relevant existing namespace definition. As a result, the Builder (actually it's Document#create_element) is forced to create a new ns definition on each (temporarily) orphan node.

I'm open to a pull request that addresses the unnecessary creation of namespace definitions, but at this point it's an optimization and is thus unlikely to get my attention anytime soon.

One further note: once the node is added to a document, Nokogiri searches its ancestors for a relevant pre-existing nsDef and, if one exists, removes the duplicate from the reparented node. More exactly, the removed nsDef is parked in an "unlinked nodes" hash owned by the Document rather than being freed immediately, because references to it may still exist; this is pre-existing code. The changeset in #1815 introduces better behavior: it makes sure that all references to the removed nsDef are repointed to the correct nsDef hanging off the ancestor element. I may play around with whether this means there can no longer be any dangling references to it, in which case we may be able to free the nsDef immediately. Need to think about this and try to break it.
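
To make the reparenting behavior described above concrete, here is a toy model in plain Ruby. It is only a sketch: `NsDef`, `ToyDocument` and `ToyNode` are hypothetical names, not Nokogiri's API or its C implementation, and the model only repoints the reparented node's own namespace reference.

```ruby
# Toy model only: these classes are hypothetical and are not Nokogiri's API.
NsDef = Struct.new(:prefix, :href)

class ToyDocument
  attr_reader :unlinked

  def initialize
    @unlinked = [] # parked definitions, released together with the document
  end
end

class ToyNode
  attr_reader :name, :parent, :ns_defs, :document, :namespace

  def initialize(document, name, namespaces = {})
    @document = document
    @name = name
    @parent = nil
    # No parent yet, so there is nowhere to search: always create definitions.
    @ns_defs = namespaces.map { |prefix, href| NsDef.new(prefix, href) }
    @namespace = @ns_defs.find { |ns_def| ns_def.prefix.nil? }
  end

  def add_child(child)
    child.reparent(self)
    child
  end

  # Drop every definition that an ancestor already provides, repoint this
  # node's namespace to the ancestor's copy, and park the duplicate.
  def reparent(new_parent)
    @parent = new_parent
    @ns_defs.reject! do |duplicate|
      existing = new_parent.lookup(duplicate.prefix, duplicate.href)
      next false unless existing

      @namespace = existing if @namespace.equal?(duplicate)
      document.unlinked << duplicate
      true
    end
  end

  # Walk from this node up to the root looking for a matching definition.
  def lookup(prefix, href)
    node = self
    while node
      found = node.ns_defs.find { |d| d.prefix == prefix && d.href == href }
      return found if found

      node = node.parent
    end
    nil
  end
end

doc = ToyDocument.new
ns = { nil => 'http://schemas.xmlsoap.org/soap/envelope/' }
envelope = ToyNode.new(doc, 'Envelope', ns)
foobar = ToyNode.new(doc, 'Foobar', ns) # orphan: gets its own definition
envelope.add_child(foobar)
puts "Foobar definitions after reparenting: #{foobar.ns_defs.size}"
puts "parked duplicates: #{doc.unlinked.size}"
```

In this model the duplicate is kept alive by the document instead of being freed at once, which mirrors the "unlinked nodes" parking described in the comment.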

@paddor
Author

paddor commented Nov 4, 2018

Interesting to know that every new node is created without an associated document! (Not an issue though, as long as no memory is leaked.) So, in the example script, as soon as the new node is integrated into the document (reparented), the duplicate namespace definition is removed? Pretty cool. 👍

@flavorjones
Member

@paddor Nodes are always created with an associated document, but they're created without an associated parent. Because there's no parent, there's nowhere to look for an existing matching namespace.

I explored freeing this nsDef but determined it's not safe, and added a test demonstrating why it's not safe in 38a28fe.

flavorjones added a commit that referenced this issue Nov 5, 2018
…ns-memory-leak

address #1810 builder namespace memory leak
@flavorjones
Member

The fix in #1815 has been merged into master. Will be in the next release, hopefully in the next few days.

flavorjones added this to the next milestone Nov 8, 2018
@paddor
Author

paddor commented Nov 13, 2018

Thanks! Looking forward to the next release.

@arohr

arohr commented Nov 21, 2018

Any plans for when the next release will be out?

@flavorjones
Member

Release is coming together now. Ideally soon after libxml 2.9.9 drops next week. Watch this milestone for updates: https://github.com/sparklemotion/nokogiri/milestone/16
