[Ruby] Increase in memory footprint in 3.15 #8337

Closed
robertlaurin opened this issue Feb 23, 2021 · 32 comments

@robertlaurin

What version of protobuf and what language are you using?
Version: v3.15.1
Language: Ruby

What operating system (Linux, Windows, ...) and version?
macOS Big Sur Version 11.2.1

What runtime / compiler are you using (e.g., python version or gcc version)
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin19]

What did you do?

I used a minimal script with opentelemetry-ruby to recreate what was observed in production. The script requires a running collector to demonstrate the issue.

require 'bundler/inline'

gemfile(true) do
  source 'https://rubygems.org'

  gem 'google-protobuf', ENV['GOOGLE_PROTOBUF_VERSION']
  gem 'heap-profiler'
  gem 'opentelemetry-api'
  gem 'opentelemetry-sdk'
  gem 'opentelemetry-exporter-otlp'
end

require 'heap-profiler'
require 'opentelemetry-api'
require 'opentelemetry-sdk'
require 'opentelemetry-exporter-otlp'

HeapProfiler.report("heap_output-protobuf-v-#{Gem.loaded_specs['google-protobuf'].version}") do
  span_processor = OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(
    OpenTelemetry::Exporter::OTLP::Exporter.new(
      endpoint: 'http://localhost:55681/v1/trace'
    ),
  )

  OpenTelemetry::SDK.configure do |c|
    c.service_name = 'protobuf-mem-usage'
    c.add_span_processor(span_processor)
  end

  TestTracer = OpenTelemetry.tracer_provider.tracer('protobuf-mem-usage')

  10_000.times do
    TestTracer.in_span('parent') do
      100.times { TestTracer.in_span('child') {} }
    end
  end
end

What did you expect to see?
Using google-protobuf 3.14

heap-profiler heap_output-protobuf-v-3.14.0/
Total allocated: 2.32 GB (22915749 objects)
Total retained: 2.54 MB (17341 objects)
retained memory by class
-----------------------------------
   1.05 MB  Thread
 399.55 kB  OpenTelemetry::SDK::Trace::Span
 358.10 kB  Time
 203.60 kB  String
 166.56 kB  OpenTelemetry::Trace::SpanContext
 150.12 kB  Thread::Mutex
  94.21 kB  OpenTelemetry::SDK::Trace::SpanData
  87.26 kB  Hash
  22.44 kB  Array
   2.08 kB  <ment> (IMEMO)
  344.00 B  Net::HTTP
  280.00 B  Google::Protobuf::RepeatedField
  232.00 B  File
  160.00 B  Proc
  144.00 B  OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor
  128.00 B  Enumerator
  120.00 B  URI::HTTP
  120.00 B  OpenTelemetry::SDK::Trace::TracerProvider
  120.00 B  OpenTelemetry::Exporter::OTLP::Exporter
  120.00 B  Net::HTTP::Post
  120.00 B  Logger::LogDevice
   88.00 B  Net::BufferedIO
   80.00 B  OpenTelemetry::SDK::Trace::Tracer
   80.00 B  OpenTelemetry::SDK::Resources::Resource
   80.00 B  Logger
   80.00 B  <env> (IMEMO)
   56.00 B  Thread::ConditionVariable
   40.00 B  OpenTelemetry::Trace::Span
   40.00 B  OpenTelemetry::SDK::Trace::TracerProvider::Key
   40.00 B  OpenTelemetry::SDK::Trace::NoopSpanProcessor
   40.00 B  OpenTelemetry::SDK::InstrumentationLibrary
   40.00 B  OpenTelemetry::SDK::Baggage::Manager
   40.00 B  OpenTelemetry::Context::Propagation::CompositePropagator
   40.00 B  OpenTelemetry::Context
   40.00 B  Logger::Formatter
   40.00 B  <svar> (IMEMO)

What did you see instead?
Using google-protobuf 3.15.1

heap-profiler heap_output-protobuf-v-3.15.1/
Total allocated: 2.30 GB (22584230 objects)
Total retained: 19.02 MB (356022 objects)
retained memory by class
-----------------------------------
   6.66 MB  Google::Protobuf::Internal::Arena
   4.56 MB  Opentelemetry::Proto::Trace::V1::Status
   4.56 MB  Opentelemetry::Proto::Trace::V1::Span
   1.05 MB  Thread
 396.78 kB  Hash
 393.22 kB  OpenTelemetry::SDK::Trace::Span
 352.43 kB  Time
 219.30 kB  String
 163.92 kB  OpenTelemetry::Trace::SpanContext
 147.74 kB  Thread::Mutex
 110.44 kB  Rational
  94.21 kB  OpenTelemetry::SDK::Trace::SpanData
  80.14 kB  Opentelemetry::Proto::Common::V1::KeyValue
  80.14 kB  Opentelemetry::Proto::Common::V1::AnyValue
  55.74 kB  Array
  36.88 kB  <ifunc> (IMEMO)
  18.40 kB  <memo> (IMEMO)
   8.90 kB  Opentelemetry::Proto::Trace::V1::ResourceSpans
   8.90 kB  Opentelemetry::Proto::Trace::V1::InstrumentationLibrarySpans
   8.90 kB  Opentelemetry::Proto::Resource::V1::Resource
   8.90 kB  Opentelemetry::Proto::Common::V1::InstrumentationLibrary
   8.90 kB  Opentelemetry::Proto::Collector::Trace::V1::ExportTraceServiceRequest
   2.08 kB  <ment> (IMEMO)
  344.00 B  Net::HTTP
  232.00 B  File
  160.00 B  Proc
  144.00 B  OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor
  128.00 B  Enumerator
  120.00 B  URI::HTTP
  120.00 B  OpenTelemetry::SDK::Trace::TracerProvider
  120.00 B  OpenTelemetry::Exporter::OTLP::Exporter
  120.00 B  Net::HTTP::Post
  120.00 B  Logger::LogDevice
   88.00 B  Net::BufferedIO
   88.00 B  <select_set> (DATA)
   80.00 B  OpenTelemetry::SDK::Trace::Tracer
   80.00 B  OpenTelemetry::SDK::Resources::Resource
   80.00 B  Logger
   80.00 B  <throw_data> (IMEMO)
   80.00 B  <env> (IMEMO)
   56.00 B  Thread::ConditionVariable
   40.00 B  OpenTelemetry::Trace::Span
   40.00 B  OpenTelemetry::SDK::Trace::TracerProvider::Key
   40.00 B  OpenTelemetry::SDK::Trace::NoopSpanProcessor
   40.00 B  OpenTelemetry::SDK::InstrumentationLibrary
   40.00 B  OpenTelemetry::SDK::Baggage::Manager
   40.00 B  OpenTelemetry::Context::Propagation::CompositePropagator
   40.00 B  OpenTelemetry::Context
   40.00 B  Logger::Formatter

Anything else we should know about your project / environment
I have had multiple reports of application owners having to revert the google-protobuf gem from 3.15 to 3.14 after seeing their applications double their memory usage and hit their limits.

@haberman (Member)

Thanks for the report and the self-contained repro.

This is a result of #8184, which should significantly increase performance, but also changes the memory management model.

I want to do more investigation to get a more detailed root cause. But for now I'll share an overview of the new memory management model, in case it gives you the info you need to fix the problem on your end.

  • The old model was: every message is its own Ruby object, which is collected whenever Ruby GCs it.
  • The new model is: messages are allocated in C-based arenas, and none of the messages in an arena are freed until the entire arena is GC'd. Any reference to any message in the arena will keep the whole arena alive.

To get a root cause for your issue, we should figure out why the arenas are growing so big. Messages end up in the same arena in a couple of ways:

  • If you parse a payload, all of the parsed messages are in one big arena. Keeping even a single reference to any message in the tree will keep the whole tree alive.
  • Whenever you assign msg.foo = msg2, then msg and msg2's arenas are joined, and none of the messages in either arena will be freed until all messages in both arenas are garbage.

You can't "break up" an existing arena, but you can use Google::Protobuf.deep_copy to copy messages out of an arena into a new, fresh arena that will contain only that message and its children. (Note: dup and clone won't work for this; since they are shallow copies, they still contain references into the other arena and so must keep it alive.)

I am hoping that a few well-placed Google::Protobuf.deep_copy calls will fix the problem you are experiencing.
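As a concrete, purely illustrative sketch, assuming a hypothetical parsed message msg with a child field some_child, deep_copy can be used to detach the part you want to keep:

# msg was parsed from a payload, so the whole tree lives in one arena.
sub = msg.some_child                        # holding sub keeps the entire parse arena alive
detached = Google::Protobuf.deep_copy(sub)  # copies sub (and its children) into a fresh arena
sub = nil
msg = nil                                   # the big parse arena can now be collected;
                                            # detached only pins its own small arena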

@haberman (Member)

For your repro, is it possible to reduce it further so that it is just protobuf operations?

@haberman (Member)

Scratch everything I said before. It is all true, but I just noticed a detail that I think explains all of this better.

I see that you're using Ruby 2.5. Due to a shortcoming in the Ruby WeakMap class that was fixed in 2.7, protobuf 3.15 will retain more memory on pre-2.7 versions of Ruby.

Is it possible for you to upgrade Ruby to 2.7? I believe this will fix the problem you are seeing.

@haberman (Member)

I will also investigate if there is a better way of allowing this collection to happen for Ruby <2.7.

@robertlaurin (Author)

Thanks for the quick response and for the explanations above. That is welcome context about the changes in behaviour.

Regarding the Ruby version: I ran the same profiling with ruby 2.7.2 and got much better results.

heap-profiler ruby-2.7.2-heap_output-protobuf-v-3.14.0/
Total allocated: 1.64 GB (19128990 objects)
Total retained: 2.54 MB (17218 objects)
retained memory by class
-----------------------------------
   1.05 MB  Thread
 393.22 kB  OpenTelemetry::SDK::Trace::Span
 352.34 kB  Time
 221.19 kB  String
 163.92 kB  OpenTelemetry::Trace::SpanContext
 147.74 kB  Thread::Mutex
  94.21 kB  OpenTelemetry::SDK::Trace::SpanData
  86.34 kB  Hash
  23.40 kB  Array
   2.54 kB  <ment> (IMEMO)
  672.00 B  MatchData
  400.00 B  <mapping_buffer> (DATA)
  360.00 B  Net::HTTP
  280.00 B  Google::Protobuf::RepeatedField
  280.00 B  <svar> (IMEMO)
  232.00 B  File
  176.00 B  <env> (IMEMO)
  160.00 B  Proc
  144.00 B  OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor
  136.00 B  Enumerator
  120.00 B  URI::HTTP
  120.00 B  OpenTelemetry::SDK::Trace::TracerProvider
  120.00 B  OpenTelemetry::Exporter::OTLP::Exporter
  120.00 B  Net::HTTP::Post
  120.00 B  Logger::LogDevice
   96.00 B  Net::BufferedIO
   80.00 B  OpenTelemetry::SDK::Trace::Tracer
   80.00 B  OpenTelemetry::SDK::Resources::Resource
   80.00 B  Logger
   80.00 B  <throw_data> (IMEMO)
   80.00 B  <memo> (IMEMO)
   80.00 B  <ifunc> (IMEMO)
   64.00 B  Thread::ConditionVariable
   64.00 B  Monitor
   40.00 B  OpenTelemetry::Trace::Span
   40.00 B  OpenTelemetry::SDK::Trace::TracerProvider::Key
   40.00 B  OpenTelemetry::SDK::Trace::NoopSpanProcessor
   40.00 B  OpenTelemetry::SDK::InstrumentationLibrary
   40.00 B  OpenTelemetry::SDK::Baggage::Manager
   40.00 B  OpenTelemetry::Context::Propagation::CompositePropagator
   40.00 B  OpenTelemetry::Context
   40.00 B  Logger::Formatter

And with 3.15

heap-profiler ruby-2.7.2-heap_output-protobuf-v-3.15.0/
Total allocated: 1.63 GB (19114662 objects)
Total retained: 2.54 MB (17212 objects)
retained memory by class
-----------------------------------
   1.05 MB  Thread
 393.22 kB  OpenTelemetry::SDK::Trace::Span
 352.43 kB  Time
 221.18 kB  String
 163.92 kB  OpenTelemetry::Trace::SpanContext
 147.74 kB  Thread::Mutex
  94.21 kB  OpenTelemetry::SDK::Trace::SpanData
  86.34 kB  Hash
  23.40 kB  Array
   2.54 kB  <ment> (IMEMO)
  672.00 B  MatchData
  400.00 B  <mapping_buffer> (DATA)
  360.00 B  Net::HTTP
  280.00 B  <svar> (IMEMO)
  232.00 B  File
  176.00 B  <env> (IMEMO)
  160.00 B  Proc
  144.00 B  OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor
  136.00 B  Enumerator
  120.00 B  URI::HTTP
  120.00 B  OpenTelemetry::SDK::Trace::TracerProvider
  120.00 B  OpenTelemetry::Exporter::OTLP::Exporter
  120.00 B  Net::HTTP::Post
  120.00 B  Logger::LogDevice
   96.00 B  Net::BufferedIO
   80.00 B  OpenTelemetry::SDK::Trace::Tracer
   80.00 B  OpenTelemetry::SDK::Resources::Resource
   80.00 B  Logger
   80.00 B  <throw_data> (IMEMO)
   80.00 B  <memo> (IMEMO)
   80.00 B  <ifunc> (IMEMO)
   64.00 B  Thread::ConditionVariable
   64.00 B  Monitor
   40.00 B  OpenTelemetry::Trace::Span
   40.00 B  OpenTelemetry::SDK::Trace::TracerProvider::Key
   40.00 B  OpenTelemetry::SDK::Trace::NoopSpanProcessor
   40.00 B  OpenTelemetry::SDK::InstrumentationLibrary
   40.00 B  OpenTelemetry::SDK::Baggage::Manager
   40.00 B  OpenTelemetry::Context::Propagation::CompositePropagator
   40.00 B  OpenTelemetry::Context
   40.00 B  Logger::Formatter

Unfortunately (for me) this is still a bit of a problem for applications on Ruby <2.7.

@byroot (Contributor) commented Feb 24, 2021

Pre-2.7 you can implement a sketchy WeakMap using https://ruby-doc.org/core-2.5.3/ObjectSpace.html#method-c-_id2ref
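A minimal sketch of that idea (hypothetical class name; as discussed below, object_id reuse makes this unreliable on 2.x without extra checks):

class SketchyWeakRef
  def initialize(obj)
    @id = obj.object_id   # keep only the id, not a strong reference
  end

  def get
    ObjectSpace._id2ref(@id)
  rescue RangeError
    nil                   # the object with this id has already been recycled
  end
end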

@haberman (Member)

@byroot thanks for the lead. What is the behavior of #_id2ref if the object with that ID has been collected? Is it guaranteed to fail, or does the caller have to protect against this by promising never to pass an ID of a collected object?

@byroot (Contributor) commented Feb 24, 2021

>> id = Object.new.object_id
=> 70330182748900
>> 4.times { GC.start }
=> 4
>> ObjectSpace._id2ref(id)
RangeError (0x00003ff7058f4ae4 is recycled object)

However, I suspect the address might be re-used at some point, so it might lead to weird stuff. I'll have to consult with some people who know the GC better than I do.

Another possibility is to use a mutable proxy object as the identifier, so as to comply with the pre-2.7 WeakMap, but I'll have to dig into the source a bit more. I've only followed the discussion, so I'm not sure what would be possible.

@byroot (Contributor) commented Feb 24, 2021

One other thing to consider, IMHO, is to just not use arenas pre-2.7.

@byroot (Contributor) commented Feb 24, 2021

Ah, I finally found the code I had seen doing this sketchy object_id-based weakref: https://github.com/ruby-concurrency/ref/blob/7a3991fea598edfe41d4e3a9e37b344bc7b7570d/lib/ref/weak_reference/pure_ruby.rb

@haberman (Member)

However I suspect the address might be re-used at some point

Yes I was worried about that.

One other thing to consider IMHO, is to just not do Arenas in pre 2.7.

Having two different memory management models co-existing in the code would dramatically complicate the C extension.

PR #8184 (which changed from the old model to the new model) had a line count of +14,071 −23,328. If we left the old memory management model in, it would be more like +14,071 −0.

Ah I finally found the code I saw doing this sketchy object_id based weakref

Ah, it looks like it depends on finalizers to remove collected objects from the weak map?

@haberman (Member)

I think I'll try making the cache a regular Hash and relying on the finalizer of each Message object to remove the cache entry.
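For context, the finalizer mechanics that idea relies on look roughly like this in Ruby (illustrative only; all names are made up, and in the real C extension the cache would live at the C level rather than being a Ruby Hash that strongly references the wrappers):

CACHE = {}   # key -> lightweight metadata, NOT the wrapper itself; a strong
             # reference to the wrapper would keep it alive forever and the
             # finalizer would never run

def cache_finalizer(key)
  # Built in a helper method so the proc does not close over `wrapper`.
  proc { CACHE.delete(key) }
end

def register(key, wrapper)
  CACHE[key] = "entry for #{key}"
  ObjectSpace.define_finalizer(wrapper, cache_finalizer(key))
end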

@byroot (Contributor) commented Feb 24, 2021

Having two different memory management models co-existing in the code would dramatically complicate the C extension.

That's understandable. Again, I've only glanced at the code, so I'm a bit clueless about what it does; until I get the chance to dig into it, don't take my suggestions as more than stabs in the dark 😉

Ah looks like it is depending on finalizers to remove collected objects from the weak map?

Yes, just like Ruby's WeakMap, it uses the finalizer to remove the Hash entry. But note that this code was written for the MRI 1.8 / 1.9 days. And I just got confirmation that object_id can be re-used (it's pretty much a pointer to the object slot), so it's not really a reliable method on 2.x.

So the only solution I can see is to find a light mutable object to serve as the WeakMap key.

@haberman (Member)

So the only solution I can see is to find a light mutable object to serve as WeakMap key.

How would that work? My WeakMap is { void* -> VALUE }. How would I look up the light mutable object for the C void*?

@haberman (Member)

I guess I could have a secondary non-weak Hash mapping { void* -> light mutable object }, and accept that the objects of the secondary map will never be collected.

So the two options I see are:

  1. Use WeakMap for the cache with a secondary non-weak Hash. Downsides: secondary map grows without bound.
  2. Use a regular Hash as the cache and change my finalizers to explicitly remove cache entries when they are destroyed. Downsides: intrusive to more parts of the code, more edge cases around finalization ordering.

Maybe (1) is our best option here.
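A rough Ruby-level sketch of option (1), purely for illustration (the real cache is in the C extension and keyed by the message pointer, modeled here as an Integer; all names are made up):

WRAPPERS = ObjectSpace::WeakMap.new  # key object -> wrapper; weak, so wrappers can still be collected
KEYS     = {}                        # ptr -> key object; strong, grows without bound (the stated downside)

def cache_store(ptr, wrapper)
  key = (KEYS[ptr] ||= Object.new)   # the "light mutable object" used as the WeakMap key
  WRAPPERS[key] = wrapper
end

def cache_fetch(ptr)
  key = KEYS[ptr]
  key && WRAPPERS[key]
end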

@byroot (Contributor) commented Feb 24, 2021

I guess I could have a secondary non-weak Hash mapping { void* -> light mutable object },

That's what I was suggesting, yes.

accept that the objects of the secondary map will never be collected.

You can purge it from time to time by checking whether the light mutable object is still in use in the WeakMap. It's far from ideal, of course, as the purge is linear in the size of the map (I realize I have no idea what size we're talking about).

@byroot (Contributor) commented Feb 24, 2021

And I just got the confirmation that object_id can be re-used (they're pretty much a pointer to the object slot), so it's not really a reliable method on 2.x.

Scratch that part: I missed that this WeakMap implementation has a backreference check to avoid false positives. So using it as a fallback for Ruby < 2.7 could actually be an option.

@haberman (Member)

Sounds like a pretty good approach, thanks for the ideas! I'll work on banging this out (my option (1) from above).

It's far from ideal of course, as it's linear performance (I realize I have no idea what size we're talking about).

It's one entry for every live protobuf message object in the process. But only for messages that users have accessed directly from Ruby (the Ruby wrappers are created lazily).

For example, if you parse a protobuf binary payload that has 1,000 sub-messages, only 1 Ruby object is created for the top-level message, and only 1 object is in the cache.

If the user then iterates over all 1,000 sub-message objects, now there are 1,000 Ruby objects and 1,000 objects in the cache.
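For example (hypothetical message class and field names):

msg = MyMessage.decode(payload)          # 1 Ruby wrapper created, 1 cache entry
msg.sub_messages.each { |sub| sub.name } # wrappers and cache entries are created lazily,
                                         # one per sub-message actually touched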

@haberman (Member)

I missed that this WeakMap implementation had a backreference check to avoid false-positives.

Does this make it immune to object_id reuse somehow?

@haberman (Member)

It works! #8341

Leaks should be fixed and performance is better too.

@casperisfine (Contributor)

performance is better too.

I suspect this frees up a lot of GC time.

@haberman (Member)

The fix is released in 3.15.3. @robertlaurin please let me know if you still see the problem after upgrading.

@hatstand commented Mar 2, 2021

@haberman We're seeing some reproducible OOMs still in 3.15.3. I'll try to create a simple repro but the use case that triggers it right now is assigning a large number (~120k) of messages to a repeated field in a single parent message.

@haberman (Member) commented Mar 2, 2021

@hatstand Are you overwriting the same element of the repeated field over and over, and expecting 120k-1 messages to be collected? Or is your repeated field 120k messages long?

@hatstand commented Mar 2, 2021

@hatstand Are you overwriting the same element of the repeated field over and over, and expecting 120k-1 messages to be collected? Or is your repeated field 120k messages long?

Our repeated field has 120k messages. Rough pseudocode:

Outer.new(
  inners: fetch_lots_of_things.map { |t| Inner.new(foo: t.field) }
)

@haberman (Member) commented Mar 2, 2021

I see, so this isn't a GC issue, as everything is live and nothing is collectible.

You are observing that this code consumes noticeably more memory than in 3.14? How much more?

@hatstand commented Mar 2, 2021

I see, so this isn't a GC issue, as everything is live and nothing is collectible.

You are observing that this code consumes noticeably more memory than in 3.14? How much more?

The old code would max out at about 500MiB whereas with 3.15.3 it's OOMing at 6GiB.

@haberman (Member) commented Mar 2, 2021

Ok a >10x memory increase for this pattern definitely seems like a problem. I'll await your repro.

@hatstand commented Mar 2, 2021

This seems to be sufficient:

require 'google/protobuf'

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("inner.proto", :syntax => :proto3) do
    add_message "Inner" do
      optional :foo, :string, 1
      optional :bar, :string, 2
    end
  end
end

Inner = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("Inner").msgclass

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("outer.proto", :syntax => :proto3) do
    add_message "Outer" do
      repeated :inners, :message, 1, "Inner"
    end
  end
end

Outer = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("Outer").msgclass

outer_proto = Outer.new(
  inners: (1..120_000).map { |i| ::Inner.new(foo: i.to_s, bar: i.to_s) }
)

@hatstand commented Mar 2, 2021

I should have mentioned that this is ruby 2.7.1 on Linux too.

@haberman (Member) commented Mar 2, 2021

Thanks @hatstand, I have a fix for this here: #8364

There was a bug that forced a realloc on every array append, resulting in quadratic memory usage.
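To make the asymptotics concrete, here is a toy Ruby model (not the actual C code in #8364) of why growing capacity by one element per append is quadratic while geometric growth stays linear; in an arena the superseded blocks typically aren't returned until the whole arena is freed, which is how the extra copying turns into retained memory:

def elements_copied(n, doubling:)
  cap = copied = 0
  n.times do |size|
    if size == cap                               # buffer full: realloc needed
      cap = doubling ? [cap * 2, 1].max : cap + 1
      copied += size                             # elements moved by the realloc
    end
  end
  copied
end

elements_copied(120_000, doubling: false)  # => ~7.2 billion element copies
elements_copied(120_000, doubling: true)   # => ~131 thousand element copies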

@hatstand commented Mar 3, 2021

Thanks @hatstand, I have a fix for this here: #8364

There was a bug that forced a realloc on every array append, resulting in quadratic memory usage.

Awesome, thanks!

arielvalentin added a commit to arielvalentin/opentelemetry-ruby that referenced this issue Jul 27, 2023
GitHub still uses legacy versions of the protobuf gem. This makes it challenging to
upgrade to newer versions of the OTLP Exporter which only supports versions `~> 3.19`.

This change loosens the restrictions to allow GitHub to adopt newer versions of the protobuf
definitions using an older version of the library.

This change explicitly skips 3.15 due to bugs like protocolbuffers/protobuf#8337
arielvalentin added a commit to arielvalentin/opentelemetry-ruby that referenced this issue Jul 27, 2023
robertlaurin pushed a commit to open-telemetry/opentelemetry-ruby that referenced this issue Jul 27, 2023
* chore(deps): Allow google-protobuf `~> 3.14`

GitHub still uses legacy versions of the protobuf gem. This makes it challenging to
upgrade to newer versions of the OTLP Exporter which only supports versions `~> 3.19`.

This change loosens the restrictions to allow GitHub to adopt newer versions of the protobuf
definitions using an older version of the library.

This change explicitly skips 3.15 due to bugs like protocolbuffers/protobuf#8337

* squash: PR feedback from @simi

Addresses #1500 (comment)