Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6.14.0 creates infinite recursion on "net request" with rack-mini-profiler present #510

Closed
PikachuEXE opened this issue Nov 27, 2020 · 20 comments · Fixed by #511
Closed

Comments

@PikachuEXE
Copy link

^^ Provide a general summary of the issue in the title above. ^^

Description

We use fastimage to check the type of image behind the image URLs input by user before saving them.
Example code would be something like:

FastImage.type(
  "https://duckduckgo.com/assets/logo_homepage.alt.v108.svg",
  timeout: 5,
).nil?

Expected Behavior

It should work without infinite recursion (SystemStackError: stack level too deep)

Troubleshooting or NR Diag results

Error Backtrace
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:305:in `hash'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:305:in `lookup_unicode_composition'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:215:in `unicode_compose_pair'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:153:in `block in unicode_compose'
/usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.3.4/lib/active_support/core_ext/range/each.rb:9:in `each'
/usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.3.4/lib/active_support/core_ext/range/each.rb:9:in `each'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:149:in `unicode_compose'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:121:in `unicode_normalize_kc'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/uri.rb:566:in `normalize_component'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/uri.rb:1538:in `block in normalized_path'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/uri.rb:1537:in `map'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/uri.rb:1537:in `normalized_path'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/uri.rb:2171:in `normalize'
/usr/local/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/uri.rb:2184:in `normalize!'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/http_clients/uri_util.rb:35:in `parse_and_normalize_url'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/http_clients/net_http_wrappers.rb:65:in `uri'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/instrumentation/net_prepend.rb:45:in `request'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/patches/net_patches.rb:19:in `block in request_with_mini_profiler'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/mini_profiler/profiling_methods.rb:46:in `step'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/patches/net_patches.rb:18:in `request_with_mini_profiler'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/instrumentation/net_prepend.rb:57:in `block (2 levels) in request'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/tracer.rb:371:in `capture_segment_error'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/instrumentation/net_prepend.rb:56:in `block in request'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent.rb:506:in `disable_all_tracing'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/instrumentation/net_prepend.rb:55:in `request'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/patches/net_patches.rb:19:in `block in request_with_mini_profiler'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/mini_profiler/profiling_methods.rb:46:in `step'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/patches/net_patches.rb:18:in `request_with_mini_profiler'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/instrumentation/net_prepend.rb:57:in `block (2 levels) in request'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/tracer.rb:371:in `capture_segment_error'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/instrumentation/net_prepend.rb:56:in `block in request'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent.rb:506:in `disable_all_tracing'
/usr/local/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/instrumentation/net_prepend.rb:55:in `request'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/patches/net_patches.rb:19:in `block in request_with_mini_profiler'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/mini_profiler/profiling_methods.rb:46:in `step'
/usr/local/lib/ruby/gems/2.7.0/gems/rack-mini-profiler-2.2.0/lib/patches/net_patches.rb:18:in `request_with_mini_profiler'

Steps to Reproduce

begin
  parsed_uri = URI.parse("https://duckduckgo.com/assets/logo_homepage.alt.v108.svg")
  http = Net::HTTP.new(parsed_uri.host, parsed_uri.port)
  response = http.request_get(parsed_uri.request_uri, {'Accept-Encoding' => 'identity'})
  response.code
rescue SystemStackError => ex
  puts ex.backtrace
  raise
end

Your Environment

Include as many relevant details about your environment as possible including the running version of the Ruby agent, the Ruby version being used and any other relevant environment information

  • Docker with Ubuntu 18.04 LTS, same for Docker host
  • Ruby 2.7.2, Rails 6.0.3.4
  • rack-mini-profiler (2.2.0)
  • newrelic_rpm (6.14.0)

Additional context

newrelic_rpm (6.13.1) has no such issue
This can only be reproduced on production like environment
(This issue first affects production env & later reproduced on staging)

@ro-savage
Copy link

ro-savage commented Nov 27, 2020

We have a similar issue. I haven't found the exact cause however it took down our sign up page.

I believe we had some error that threw a "Reason stack level too deep" and this caused NewRelic to throw an uncaught error.

We did not have an issue with 6.13.1

We also we unable to reproduce this on a non-production environment

@dannyvanhoof
Copy link

We have the same in development:

Rails 6.0.3.4
ruby 2.7.0p0

Traceback (most recent call last): 9360: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/threading/agent_thread.rb:14:in block in create'
9359: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/agent.rb:725:in block in start_worker_thread' 9358: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/agent.rb:698:in deferred_work!'
9357: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/agent.rb:679:in catch_errors' 9356: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/agent.rb:699:in block in deferred_work!'
9355: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent.rb:506:in disable_all_tracing' 9354: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/agent.rb:700:in block (2 levels) in deferred_work!'
9353: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/newrelic_rpm-6.14.0/lib/new_relic/agent/agent.rb:945:in connect' ... 9348 levels... 4: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:121:in unicode_normalize_kc'
3: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:149:in unicode_compose' 2: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/activesupport-6.0.3.4/lib/active_support/core_ext/range/each.rb:9:in each'
1: from /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/activesupport-6.0.3.4/lib/active_support/core_ext/range/each.rb:9:in each' /home/danny/.rbenv/versions/2.7.0/lib/ruby/gems/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:153:in block in unicode_compose': stack level too deep (SystemStackError)
`

@tiagotex
Copy link

I had the same problem, the commit that introduced the regression was: cde7d10

Setting prepend_net_instrumentation: false on config/newrelic.yml fixes the problem.

@PikachuEXE
Copy link
Author

This should be fixed by making the option prepend_net_instrumentation false by default
And have a warning in doc for setting it to true

@jeremybdk
Copy link

We also have the same issue, we're not using rack-mini-profiler so might be another gem or depency.
@PikachuEXE by default newrelic is not enabled in development mode, that's probably why you don't see this happen. We've had the same issue, we saw the first crash in our staging servers.

@angelatan2
Copy link
Contributor

@ro-savage Thank you for bringing this issue to our attention. Could you share your environment, please? Also would adding the option prepend_net_instrumentation: false, work for you?

@jeremybdk
Copy link

jeremybdk commented Nov 30, 2020 via email

@ro-savage
Copy link

ro-savage commented Dec 1, 2020

@angelatan2 - We just removed New Relic completely from our system. The fact that feature version bump causes such a major issue for us means we decided New Relic wasn't worth the potentially damage. So I won't be able share any more details other than below.

But basically we had an impossible to find where it occurred error, that ended up with

addressable-2.7.0/lib/addressable/idna/pure.rb:305:in `hash': stack level too deep (SystemStackError)
<190>1 2020-11-26T22:10:06.81121+00:00 app worker.1 - - /app/vendor/bundle/ruby/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:305:in `hash': stack level too deep (SystemStackError)
	from /app/vendor/bundle/ruby/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:305:in `lookup_unicode_composition'
	from /app/vendor/bundle/ruby/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:215:in `unicode_compose_pair'
	from /app/vendor/bundle/ruby/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:153:in `block in unicode_compose'
	from /app/vendor/bundle/ruby/2.7.0/gems/activesupport-6.0.3.4/lib/active_support/core_ext/range/each.rb:9:in `each'
	from /app/vendor/bundle/ruby/2.7.0/gems/activesupport-6.0.3.4/lib/active_support/core_ext/range/each.rb:9:in `each'
	from /app/vendor/bundle/ruby/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:149:in `unicode_compose'
	from /app/vendor/bundle/ruby/2.7.0/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:121:in `unicode_normalize_kc'
	from /app/vendor/bundle/ruby/2.7.0/gems/addressable-2.7.0/lib/addressable/uri.rb:566:in `normalize_component

We are just a standard RoR app, and there was no reference to new relic. We just had to revert gems until we pinpointed the new relic agent.

We think it may have happened when we were sending off analytics and subscription data to third party services. As it seemed to have happened after our sign up form submission, but before the account was finished being created.

@mwlang
Copy link
Contributor

mwlang commented Dec 1, 2020

@ro-savage, it is a delicate balance between maintaining the old and moving forward with the evolving changes of the Ruby language and environment. We are well aware of just how sensitive such changes can be and work hard to test and vet each release to ensure exactly this experience does not happen. We obviously have a blindspot in our test suite that failed to surface this issue. For that, I sincerely apologize. We will certainly retrospect to understand just how this issue got past our tests and work to plug similar blindspots across all the instrumentation libraries we're supporting.

Meanwhile, I am looking into this and trying to understand just how our change caused this issue with the addressable gem. For background info, we are working to reduce corner cases where the Ruby agent's instrumenting approach of using method chaining leads to other gems not being usable in conjunction with the agent running as described in this blog post. Our long-term strategy calls for eliminating method-chaining as a means of patching into libraries and using prepend instead. Doing so will also help us eliminate issues with method parameters delegation that we're facing with Ruby 2.7+ as described here #502.

It looks like unfortunately in this case, our attempt to get "out of the way" led to the very thing we're trying to prevent in the first place. At the moment, this looks to be strictly confined to Ruby 2.7 and Addressable 2.7.0. if anyone is seeing outside Ruby 2.7, I would love to hear from you and especially see a stack trace.

Meanwhile, @PikachuEXE's suggested configuration change is the fastest way to revert to using the method chaining approach that doesn't exhibit this problem.

@jeremybdk
Copy link

@mwlang we're having the issue with ruby 2.6.6 here is the stacktrace:

➜ rails s
=> Booting Puma
=> Rails 5.2.3 application starting in development
=> Run `rails server -h` for more startup options
Exiting
Traceback (most recent call last):
	10377: from bin/rails:13:in `<main>'
	10376: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/activesupport-5.2.3/lib/active_support/dependencies.rb:291:in `require'
	10375: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/activesupport-5.2.3/lib/active_support/dependencies.rb:257:in `load_dependency'
	10374: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/activesupport-5.2.3/lib/active_support/dependencies.rb:291:in `block in require'
	10373: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/bootsnap-1.3.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:29:in `require'
	10372: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/bootsnap-1.3.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:20:in `require_with_bootsnap_lfi'
	10371: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/bootsnap-1.3.0/lib/bootsnap/load_path_cache/loaded_features_index.rb:65:in `register'
	10370: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/bootsnap-1.3.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:21:in `block in require_with_bootsnap_lfi'
	 ... 10365 levels...
	    4: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/activesupport-5.2.3/lib/active_support/core_ext/range/each.rb:9:in `each'
	    3: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/activesupport-5.2.3/lib/active_support/core_ext/range/each.rb:9:in `each'
	    2: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:153:in `block in unicode_compose'
	    1: from /Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:215:in `unicode_compose_pair'
/Users/jeremy/.rvm/gems/ruby-2.6.6@appaloosa/gems/addressable-2.7.0/lib/addressable/idna/pure.rb:305:in `lookup_unicode_composition': stack level too deep (SystemStackError)

Please don't hesitate to let me know if you need more informations.

@jeremybdk
Copy link

It's definitely an issue with 'net/http' and/or adressable.
As you can see in my previous stack trace, the app is crashing as soon as Puma try to start.

We have an initializer in our app that get a certificate from a webserver using Net::HTTP.start and a get request.
As soon I comment all this initializer the app will start.

Let me know if you need more details.

@angelatan2 angelatan2 moved this from Triage to Backlog in Ruby Engineering Board Dec 1, 2020
@angelatan2
Copy link
Contributor

@jeremybdk Thank you for sharing the additional details. We are trying to reproduce this issue with our test app but unfortunately, we are not having much success. We are wondering if there is another gem dependency that could cause this issue. Are you willing/able to share your gemfile with us?

@jeremybdk
Copy link

jeremybdk commented Dec 1, 2020

@angelatan2 better than that! I spent some time, creating a new rails app, adding all my gems, reproducing the crash, and removing most of my gems to see which ones was making the app crash. The weird thing is that it's not one gem that trigger the crash, it's only when I have the gems left in this Gemfile. You can find a link to the app here: https://github.com/jeremybdk/crash_newrelic

I hope that will help you :)

@mwlang
Copy link
Contributor

mwlang commented Dec 1, 2020

@jeremybdk thank you for that reproduction. I am able to at least see the crash locally with the app you put together.

@mwlang
Copy link
Contributor

mwlang commented Dec 2, 2020

UPDATE: It's a two-way contention with the Ruby agent and Airbrake both monkey patching Net::HTTP#request method. When only method-chaining is involved, the infinite recursion does not occur since both gems use the same approach. When prepend method is mixed with method-chaining, then we end up with exactly the issue the above referenced blog post describes.

image

This is a potential fix and has not yet passed all tests in continuous integration. Please do check the status of PR #511 before proceeding to install or test this.

gem 'newrelic_rpm', 
  :git => 'https://github.com/newrelic/newrelic-ruby-agent.git',  
  :ref => '77e553b11301dc1e3d3376fb0a545fc2b29196ce'

@mwlang mwlang moved this from Backlog to In progress in Ruby Engineering Board Dec 2, 2020
@mwlang mwlang added this to the Ruby Infinite Tracing Polish milestone Dec 2, 2020
@mwlang mwlang self-assigned this Dec 2, 2020
@PikachuEXE
Copy link
Author

I think for my case it's rack-mini-profiler's patch to Net::HTTP
https://github.com/MiniProfiler/rack-mini-profiler/blob/v2.2.0/lib/patches/net_patches.rb

It seems like they already have solution for SystemStackError: stack level too deep issue:
https://github.com/MiniProfiler/rack-mini-profiler/tree/v2.2.0#nethttp-stack-level-too-deep-errors

Will post back after testing this method.
I am test 6.14.0 with prepend_net_instrumentation: false currently

@PikachuEXE
Copy link
Author

I confirm if I set prepend_net_instrumentation: true with 6.14.0
And also use rack-mini-profiler with net patch prepended

I can run network requests without SystemStackError

FastImage.type(
  "https://duckduckgo.com/assets/logo_homepage.alt.v108.svg",
  timeout: 5,
)
HTTPClient.new.get("https://duckduckgo.com/assets/logo_homepage.alt.v108.svg").code

@jeremybdk
Copy link

jeremybdk commented Dec 2, 2020

I did some more testing today regarding Airbrake, the issue disappear when upgrading to Airbrake v10.0.1
https://github.com/airbrake/airbrake/releases/tag/v10.0.1

Ruby Engineering Board automation moved this from In progress to Done/Pending Release Dec 2, 2020
@mwlang
Copy link
Contributor

mwlang commented Dec 2, 2020

@jeremybdk and @PikachuEXE Thank you both for testing further and confirming the fix works. As for Airbrake 10.0.1, it looks like they too are moving away from method chaining to a prepend strategy.

All further work to move away from method chaining to prepend will take place in our next major release (i.e. 7.0.0). We've been getting a lot of requests to move to prepend strategy for instrumenting various gems, but the risk/reward is too high to continue in the 6.x series of the agent.

@hector-able
Copy link

@mwlang thanks for your help explaining and resolving the issue. We had the same problem using Skylight, because they are still using alias_method skylightio/skylight-ruby#155. Maybe you want to add it to your fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Ruby Engineering Board
  
Code Complete/Done
Development

Successfully merging a pull request may close this issue.

8 participants