Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New command: ref-explore #33

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

NobodysNightmare
Copy link

@NobodysNightmare NobodysNightmare commented Oct 4, 2022

Hello 👋

First of all: Thanks for heapy. It was already helpful a bunch of times when figuring out where a memory leak originated.

What's in this PR?

This pull request adds a new feature to heapy that allows to follow the references to a given object.

Use case: You are analyzing a memory bloating issue and found out that a lot of very old GC generations still contain hundreds of ActiveRecord object instances. This information alone does not yet tell you, why you have so many objects that have not been garbage collected. You just know what is retained, but not why.

If you find an address of one of these objects (note: finding an address is out of scope for this PR), you can now follow the references to it and see why it has not been collected.

Example

Taking the fixture included in this repo as an example. The new command can run non-interactively (as all heapy commands do so far):

$ bin/heapy ref-explore spec/fixtures/dumps/00-heap.dump 0x7fb47763feb0

## Reference chain
<OBJECT ActiveRecord::Attribute::FromDatabase 0x7FB47763FEB0> (allocated at activerecord-4.2.3/lib/active_record/attribute.rb:5)
<HASH 0x7FB474CA1A90> (allocated at lib/active_record/attribute_set/builder.rb:30)
<OBJECT ActiveRecord::LazyAttributeHash 0x7FB474CA1B30> (allocated at lib/active_record/attribute_set/builder.rb:16)
<OBJECT ActiveRecord::AttributeSet 0x7FB474CA1A68> (allocated at lib/active_record/attribute_set/builder.rb:17)
<OBJECT Repo 0x7FB474CA1A40> (allocated at activerecord-4.2.3/lib/active_record/core.rb:114)
<ARRAY 996 items 0x7FB474D790A8> (allocated at activerecord-4.2.3/lib/active_record/querying.rb:50)
<OBJECT Repo::ActiveRecord_Relation 0x7FB476A8BE98> (allocated at lib/active_record/relation/spawn_methods.rb:10)
<OBJECT PagesController 0x7FB476AB25C0> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<HASH 0x7FB4772EAE68> (allocated at rack-1.6.4/lib/rack/mock.rb:92)
<OBJECT ActionDispatch::Request 0x7FB476AB2480> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<OBJECT ActionDispatch::Response 0x7FB476AB2458> (allocated at lib/action_controller/metal/rack_delegation.rb:28)
<ROOT machine_context 0x2> (allocated at )

## All references to 0x7fb47763feb0
 * <HASH 0x7FB474CA1A90> (allocated at lib/active_record/attribute_set/builder.rb:30)

However, since the initial loading can take some time, there is also an interactive mode, where the dump is loaded once and the user can enter address after address:

$ bin/heapy ref-explore spec/fixtures/dumps/00-heap.dump
Enter address to inspect: 0x7fb47763feb0

## Reference chain
<OBJECT ActiveRecord::Attribute::FromDatabase 0x7FB47763FEB0> (allocated at activerecord-4.2.3/lib/active_record/attribute.rb:5)
<HASH 0x7FB474CA1A90> (allocated at lib/active_record/attribute_set/builder.rb:30)
<OBJECT ActiveRecord::LazyAttributeHash 0x7FB474CA1B30> (allocated at lib/active_record/attribute_set/builder.rb:16)
<OBJECT ActiveRecord::AttributeSet 0x7FB474CA1A68> (allocated at lib/active_record/attribute_set/builder.rb:17)
<OBJECT Repo 0x7FB474CA1A40> (allocated at activerecord-4.2.3/lib/active_record/core.rb:114)
<ARRAY 996 items 0x7FB474D790A8> (allocated at activerecord-4.2.3/lib/active_record/querying.rb:50)
<OBJECT Repo::ActiveRecord_Relation 0x7FB476A8BE98> (allocated at lib/active_record/relation/spawn_methods.rb:10)
<OBJECT PagesController 0x7FB476AB25C0> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<HASH 0x7FB4772EAE68> (allocated at rack-1.6.4/lib/rack/mock.rb:92)
<OBJECT ActionDispatch::Request 0x7FB476AB2480> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<OBJECT ActionDispatch::Response 0x7FB476AB2458> (allocated at lib/action_controller/metal/rack_delegation.rb:28)
<ROOT machine_context 0x2> (allocated at )

## All references to 0x7fb47763feb0
 * <HASH 0x7FB474CA1A90> (allocated at lib/active_record/attribute_set/builder.rb:30)

Enter address to inspect: 0x7FB476AB2480

## Reference chain
<OBJECT ActionDispatch::Request 0x7FB476AB2480> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<OBJECT ActionDispatch::Response 0x7FB476AB2458> (allocated at lib/action_controller/metal/rack_delegation.rb:28)
<ROOT machine_context 0x2> (allocated at )

## All references to 0x7FB476AB2480
 * <OBJECT ActionDispatch::Response 0x7FB476AB2458> (allocated at lib/action_controller/metal/rack_delegation.rb:28)
 * <OBJECT PagesController 0x7FB476AB25C0> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)

Enter address to inspect: 0x7FB476AB25C0

## Reference chain
<OBJECT PagesController 0x7FB476AB25C0> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<HASH 0x7FB4772EAE68> (allocated at rack-1.6.4/lib/rack/mock.rb:92)
<OBJECT ActionDispatch::Request 0x7FB476AB2480> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<OBJECT ActionDispatch::Response 0x7FB476AB2458> (allocated at lib/action_controller/metal/rack_delegation.rb:28)
<ROOT machine_context 0x2> (allocated at )

## All references to 0x7FB476AB25C0
 * <HASH 0x7FB4772EAE68> (allocated at rack-1.6.4/lib/rack/mock.rb:92)

Enter address to inspect: ^C

In the examples above we can see that the object in question (an AR attribute) was not yet collected, because the request is still running, which is still referenced by the response and a PagesController instance.

Background & considerations

I wrote this as a simple ruby script to be able to diagnose a memory bloat where I had no clue why objects were not garbage collected. Since it worked out and was helpful, I thought I could try to propose this as an upstream addition to heapy. Please let me know if you think this could be useful.

Things that will probably need further extension/talking about:

  • Loading a larger memory dump can take a significant amount of time (my own script contained a progress indicator: print "#{f.pos / (1024 * 1024)} MiB\r" if line_number % 1000 == 0)
  • Formatting (and level of detail) of stack lines is probably something that's worth having a second look at
  • Obtaining object addresses may be handled through heapy as well (so far I manually obtained them, e.g. like grep generation\":100 my.dump)

@oleg-vinted
Copy link

oleg-vinted commented May 23, 2023

This has been incredibly helpful for us in figuring out the same issue with ActiveRecord objects being retained, thank you! 🙇 I'd definitely like to see this upstreamed.

However, there have been some cases where heapy ref-explore doesn't find any path to root, and yet, the object is retained after multiple GC runs.

I added this and it showed me a bunch of IMEMO objects that gave me a path to the object I was inspecting.

    def add_class_references(o, addr)
      return unless o.key?('class')
      ref = o.fetch('class').to_i(16)
      (@reverse_references[ref] ||= []) << addr
    end

I'm not sure if this correctly models what CRuby GC does though.

@reverse_references = {}
@virtual_root_address = 0
File.open(filename) do |f|
f.each.with_index do |line, i|

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: trailing whitespace.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this and many other whitespace issues by saving the file once more in my current editor configuration :D

Comment on lines +140 to +152
elsif obj['type'] == 'ARRAY'
"#{obj['length']} items"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For IMEMO objects it would be useful to show obj['imemo_type'] here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trusting you here, I blindly added this. The heap dump included in this repo does not seem to contain IMEMOs. Feel free to double check that this is the desired result and maybe share an example how it is rendered :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to work, 'imemo_type' needs to be added here:

      simple_object = o.slice('type', 'file', 'name', 'class', 'length')

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙈 I clearly did not work on this piece of code for some time :D Fixed.

loop do
print 'Enter address to inspect: '

drill_down($stdin.gets)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: a proper readline-based REPL would be nicer, as this doesn't have history, exits on Ctrl-C, crashes on Ctrl-D, etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point and something I did not look into before. I will check this out and depending on complexity add it to this PR, to make this feature more "production ready".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm... I just realized how dead simple readline is :D Added it.

@NobodysNightmare
Copy link
Author

@oleg-vinted Thanks a lot for your feedback and review. I am happy that this was already helpful to someone.

@schneems if there is anything I can do to help with heapy, feel free to reach out. I like this project a lot.

@NobodysNightmare NobodysNightmare force-pushed the reference-explorer branch 3 times, most recently from 3064ed2 to 863ed08 Compare May 24, 2023 06:12
@oleg-vinted
Copy link

tl;dr I suggest we add this:

    def add_class_references(o, addr)
      return unless o.key?('class')
      return unless o['type'] != 'IMEMO'
      ref = o.fetch('class').to_i(16)
      (@reverse_references[ref] ||= []) << addr
    end

I dug into the Ruby source, and the GC does indeed mark every object's class, which makes sense - if an object is referenced from root (directly or indirectly), its class cannot be garbage collected.

(The exception is IMEMO types, and since not-so-long-ago, there's no class key for IMEMO in the heap dump.)

For some reason, however, the class doesn't end up in the references array, even though the idea of ObjectSpace.dump_all (as far as I can tell) is to piggyback off of GC and instead of actually marking referenced objects, it notes them instead and dumps them into references afterwards. The callback function (gc_mark_func_data_struct) is set here.

@oleg-vinted
Copy link

Here's a repro:

require 'objspace'
ObjectSpace.trace_object_allocations_start

class MyClass; end
my_instance = MyClass.new

Object.send(:remove_const, :MyClass)
3.times { GC.start }

File.open('heap.json', 'w') do |f|
  ObjectSpace.dump_all(output: f)
end

puts "Check this object's address using ref-explore:"
puts ObjectSpace.dump(my_instance.class)

Without add_class_references, the output is:

Could not find a reference chain leading to a root node. Searching for a non-specific chain now.

## Reference chain
<CLASS MyClass 0x102D3FD08> (allocated at repro.rb:4)

## All references to 0x102d3fd08
 * <CLASS 0x102D3FCE0> (allocated at repro.rb:4)

With the patch:

## Reference chain
<CLASS MyClass 0x102D3FD08> (allocated at repro.rb:4)
<OBJECT MyClass 0x102D3FC40> (allocated at repro.rb:5)
<IMEMO env 0x102D4D930> (allocated at )
<ROOT machine_context 0x1> (allocated at )

## All references to 0x102d3fd08
 * <OBJECT MyClass 0x102D3FC40> (allocated at repro.rb:5)
 * <CLASS 0x102D3FCE0> (allocated at repro.rb:4)

@NobodysNightmare
Copy link
Author

Thanks for validating the assumptions in Ruby source code and giving me a reproduction example. Integrated your proposal.

Allowing to perform analysis of references to an object.
@NobodysNightmare
Copy link
Author

I also used the heap dump generated by the example script that you provided to write some very basic specs. These test the basic functionality of the reference exploring, as well as the special case with the unreferenced class.

@oleg-vinted
Copy link

oleg-vinted commented Jun 5, 2023

Figured out the last missing part: the class reference is absent in the references array because of this return. I guess the idea was to keep heap dumps smaller and not repeat the same address twice (?). This, however, makes tools like these more complicated, I wish that return wasn't even there.

@schneems
Copy link
Member

schneems commented Jun 8, 2023

It seems like a helpful idea. Sorry for the silence.

The tests are failing. Can you take a look?

as an fyi Heap dumps have some odd behavior see:

Basically: just because a reference exists in the dump doesn’t mean it is still retained.

I think that a tool like yours is a good idea for when there is definitely a problem, but it might show false positives if a reference is not actually being retained but is not showing up in the dump.

Maybe we could add something into the docs to educate people so they know the limitations of the tool (or they don’t open issues saying “heapy is claiming this object is retained in ref explore but it’s not”).

# so we manually introduce the back-reference.
def add_class_references(o, addr)
return unless o.key?('class')
return if o['type'] == 'IMEMO'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this type filter is not completely correct either, however, as https://github.com/ruby/ruby/blob/d5ef373b1194bac64784ae316d125d7a2cf1988a/gc.c#L7026 can sometimes mark the class depending on the type of IMEMO.

This would still work just fine in newer version of Ruby because the class would appear in the references array if there was indeed a reference.

@oleg-vinted
Copy link

Basically: just because a reference exists in the dump doesn’t mean it is still retained.

Are you sure this is the case and not the other way around? 🤔 I.e. ref-explore doesn't find a reference chain to root, and yet the object is still retained. I've tried to alleviate the problem with #33 (comment), however, it's still not completely right.

@NobodysNightmare
Copy link
Author

NobodysNightmare commented Jun 8, 2023

The tests are failing. Can you take a look?

Looks like the newly added spec was incompatible with older Rubies. do ... ensure is a feature introduced in Ruby 2.5 apparently. Fixed by adding an additional ``begin ... end`.

Basically: just because a reference exists in the dump doesn’t mean it is still retained.

Are you sure this is the case

We should be able to find that out using the heap_problem repo that @schneems linked. I'll be able to take more looks into this PR (including a rebase) next week. Though tests should pass already.

@oleg-vinted
Copy link

heap_problem repo can't reproduce the issue for me with Ruby 3.1.3. Had to replace $address = "0x#{ (string.object_id << 1).to_s(16) }" with $address = JSON.parse(ObjectSpace.dump(string))["address"] to get the address though.

The slice method apparently was added to Hashes later in
the lifetime of Ruby. Versions 2.3 and 2.4 do not recognize this
method.

do ... ensure ... end blocks were also added in 2.5, previously requiring
a block started with begin.
@NobodysNightmare
Copy link
Author

Also replaced slice with select {}, because the older Rubies do not support that method on a Hash. Ran tests locally on Ruby 2.4.10 to confirm Ruby 2.4 compatibility.


return unless addr

simple_object = o.select { |k, _v| %w[type file name class length imemo_type].include?(k) }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I believe %w[type file name class length imemo_type] gets allocated at every iteration, consider extracting this into a constant.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On that note: # frozen_string_literal: true would prevent some more string allocations. Though I'm not sure what the coding style of this gem dictates.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively: @schneems how would you feel about dropping support for EOL rubies from heapy?

All CI failures so far were caused by tests for the older versions and this nit is the consequence of me dodging to use slice.

Also considering that heapy is not typically a dependency of a production application, but rather a tool running on a developer's machine I believe that it can allow itself to ask for recent Ruby versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants