
FEATURE: Snapshots #459

Merged
merged 14 commits on Aug 31, 2020

Conversation

@OsamaSayegh (Collaborator)

The purpose of this feature is to help find performance issues that are triggered by rare conditions in your application. It works by profiling one request out of every N requests your application receives and storing the performance metrics that Mini Profiler collected, a.k.a. a "snapshot" of that request. You can then access a page that lists all snapshots grouped by route/path and sorted from worst to best.

Screenshots:

This is the overview page where you can see a list of snapshot groups with the worst time of each group:

(Note: the screenshot is from my local install of Discourse. The /javascripts/* requests would be excluded from sampling; they're shown in my screenshot because I turned off something in my local install).

Clicking on a group takes you to this page that displays all the snapshots of that group:

Finally, clicking on a snapshot ID takes you to this page that shows all the SQL queries and steps of the profiled request:

@SamSaffron (Member)

OMG this looks so good!

@SamSaffron (Member)

Curious to know if you can measure the impact of the "should I snapshot" check and quantify it in a number.

Specifically, for a simple rack application that just returns OK for everything (with the redis provider):

  1. What is the impact of rack mini profiler when snapshotting is disabled on non-profiled requests?
  2. What is the impact of rack mini profiler when snapshotting is enabled on non-profiled requests?

We've got to try for under a millisecond here if possible.

@OsamaSayegh (Collaborator, Author)

Thanks @SamSaffron!

I wrote a little script to compare the impact of the "should take snapshot" check when snapshotting is enabled and disabled, and I've found that when snapshotting is disabled (i.e. snapshot_every_n_requests is 0), a request takes on average 0.01 milliseconds. And when it's enabled, a request takes on average 0.21 milliseconds.

So the impact is under a millisecond on my machine, but it'll probably be higher in production if redis lives on a different machine. If we want to absolutely optimize here, I think (though I haven't tested this) it's possible to do the "should take snapshot" check in Ruby and avoid redis entirely (like the MemoryStore implementation). There would be some downsides, such as imperfect sampling (especially in a multi-server environment), but the trade-off may be worth it.
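
For illustration, the in-Ruby check could look something like the sketch below (hypothetical, not part of this PR; each process keeps its own counter, which is exactly where the imperfect per-process sampling comes from):

```ruby
# Hypothetical in-process "should take snapshot" check that avoids redis
# entirely. The counter is local to the process, so sampling is per-process
# rather than global across servers.
class LocalSnapshotSampler
  def initialize(every_n)
    @every_n = every_n
    @counter = 0
    @mutex = Mutex.new # serialize access across request threads
  end

  # Returns true once every @every_n calls; false when sampling is disabled.
  def should_snapshot?
    return false if @every_n <= 0
    @mutex.synchronize do
      @counter += 1
      if @counter >= @every_n
        @counter = 0
        true
      else
        false
      end
    end
  end
end

sampler = LocalSnapshotSampler.new(3)
results = (1..6).map { sampler.should_snapshot? }
# every third request is sampled: [false, false, true, false, false, true]
```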

This is the benchmark script I used in case you want to review it:

Benchmark script
#!/usr/bin/env ruby

pid = fork do
  require 'bundler/inline'

  gemfile do
    source 'https://rubygems.org'

    gem 'rack'
    gem 'redis'
    gem 'rack-mini-profiler', path: "~/rack-mini-profiler"
  end

  class BenchMiddleware
    def initialize(app)
      @app = app
    end

    def call(env)
      t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
      @app.call(env)
      bench_res = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - t1
      [200, {'Content-Type' => 'text/plain'}, ["#{bench_res}"]]
    end
  end

  Rack::MiniProfiler.config.storage = Rack::MiniProfiler::RedisStore
  Rack::MiniProfiler.config.snapshot_every_n_requests = ENV['SNAPSHOTS_PERIOD'].to_i
  Rack::MiniProfiler.config.authorization_mode = :whitelist

  # Run the real "should take snapshot" check so its cost is measured,
  # but always return false so no snapshot is actually taken.
  module MPPatch
    def take_snapshot?(*args)
      super
      false
    end
  end
  Rack::MiniProfiler.prepend(MPPatch)

  app = Rack::Builder.app do
    use BenchMiddleware
    use Rack::MiniProfiler
    run lambda { |env|
      [200, {'Content-Type' => 'text/plain'}, ['OK']]
    }
  end

  Rack::Handler.default.run(
    app,
    Port: 4321,
    BindAddress: "127.0.0.1",
    Logger: WEBrick::Log.new("/dev/null"),
    AccessLog: []
  )
end

begin
  require 'net/http'
  sleep(0.5)
  Net::HTTP.get(URI("http://127.0.0.1:4321"))
  iterations = ENV['ITERATIONS'] ? ENV['ITERATIONS'].to_i : 10
  sum = 0
  iterations.times do
    sum += Net::HTTP.get(URI("http://127.0.0.1:4321")).to_f / (1000 * 1000)
  end
  puts "Average: #{sum / iterations} (ms)"
ensure
  Process.kill("TERM", pid)
end

@SamSaffron (Member)

Hmm, we can live with the 0.21ms impact for v1, then leave interfaces around so we can implement it using DistributedCache (as implemented in message_bus). With that structure we would get extremely fast reads and slightly slower writes.

I say:

  1. See if you can get this down to a 0.15ms impact, it is a good little bit of practice (is 100% of the cost talking to redis, or is there other stuff going on?)

  2. Make sure we can easily implement this using DistributedCache, maybe have an implementation for that as well, and measure perf.

Once done I say we merge this in.

@OsamaSayegh (Collaborator, Author)

(is 100% of the cost talking to redis or is there other stuff going on)

@SamSaffron, yes, the vast majority of the 0.2 milliseconds is the cost of the redis call we make. If I remove the redis call and simply return true, the 0.2 milliseconds completely disappear and the time becomes the same as when snapshotting is disabled (0.01 ms).

Also, I'm not sure I understand how DistributedCache would make a difference here. The "should take snapshot" check writes to redis on every request: we increment a counter and then check the new value to determine whether to return true or false, all inside a Lua script. My understanding of DistributedCache is that it also uses redis under the hood (redis is one of several backends MessageBus supports) to sync data between processes, and every time something is written to DistributedCache, a redis call is fired to sync the data. So I don't see how there would be a difference with DistributedCache, because we would still be making a redis call to increment the counter on every request.

(To be clear I'm not pushing against implementing DistributedCache, I'm just struggling to see how it would help us).
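
For reference, the increment-and-check semantics described above boil down to something like this (a plain-Ruby model of the logic; the actual gem runs it atomically as a Lua script in redis, and the names here are illustrative, not the gem's real script):

```ruby
# Illustrative Lua equivalent of the increment-and-check (not the gem's
# actual script): INCR a counter, reset it and return 1 when it hits N.
EXAMPLE_LUA = <<~LUA
  local counter = redis.call("INCR", KEYS[1])
  if counter >= tonumber(ARGV[1]) then
    redis.call("SET", KEYS[1], "0")
    return 1
  end
  return 0
LUA

# Pure-Ruby model of the same logic. Because the counter lives in redis,
# every request must pay one redis round trip just to INCR it.
def should_snapshot?(counters, key, every_n)
  counters[key] = counters.fetch(key, 0) + 1
  if counters[key] >= every_n
    counters[key] = 0
    true
  else
    false
  end
end

counters = {}
decisions = (1..4).map { should_snapshot?(counters, "mp-counter", 2) }
# with every_n = 2: [false, true, false, true]
```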

@SamSaffron (Member)

It's a great question. DistributedCache could let us cheat a bit, but I think it would result in some potential over-sampling.

We could increment a counter asynchronously on each request; then, when deciding whether or not to sample, we could check our in-memory replica.

Something like:

should_sample = cache[route].to_i > 200
if should_sample
   cache[route] = 0
   ... 
else
   InBackgroundThread do
       v = cache[route].to_i
       cache[route] = v + 1
   end
end

This is a very imperfect solution: it could over-sample, especially under extreme load, and it could under-sample if the background thread is backlogged.

That said, it avoids having a constant cost associated with every request.
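
The pseudocode above could be fleshed out roughly as below (a hypothetical sketch: a plain Hash and a worker thread stand in for DistributedCache and InBackgroundThread, and the over/under-sampling caveats still apply):

```ruby
# Hypothetical sketch of the async-increment idea. A Hash replaces
# DistributedCache and a single worker thread replaces InBackgroundThread,
# so the request path never blocks on a redis round trip.
class AsyncSampler
  def initialize(threshold)
    @threshold = threshold
    @cache = Hash.new(0)   # stand-in for the DistributedCache replica
    @queue = Queue.new
    @enqueued = 0
    @processed = 0
    @worker = Thread.new do
      while (route = @queue.pop)
        @cache[route] += 1 # the increment happens off the request path
        @processed += 1
      end
    end
  end

  # Request path: only a memory read of the local replica, no network call.
  def should_sample?(route)
    if @cache[route] >= @threshold
      @cache[route] = 0
      true
    else
      @enqueued += 1
      @queue << route      # defer the increment to the background thread
      false
    end
  end

  # Test helper: wait for the background thread to catch up.
  def drain!
    sleep 0.001 until @processed >= @enqueued
  end
end

sampler = AsyncSampler.new(3)
decisions = (1..4).map { r = sampler.should_sample?("/home"); sampler.drain!; r }
# the counter reaches 3 after three deferred increments, so the 4th samples:
# [false, false, false, true]
```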

Given the whole cost here is quite small, I say just commit as is. The only slight change might be to use precompiled Lua like message_bus does; you may get an ever-so-slight perf improvement, though I think almost all the cost is probably the round trip.
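
For context, "precompiled Lua" here usually means loading the script once with SCRIPT LOAD and then invoking it by SHA with EVALSHA, so the script body isn't resent on every request. A hedged sketch (illustrative names, not this PR's code; actually calling it requires the redis gem and a running server):

```ruby
require 'digest/sha1'

# Illustrative script, not the gem's actual Lua source.
COUNTER_SCRIPT = <<~LUA
  local counter = redis.call("INCR", KEYS[1])
  if counter >= tonumber(ARGV[1]) then
    redis.call("SET", KEYS[1], "0")
    return 1
  end
  return 0
LUA

# SCRIPT LOAD returns the SHA1 of the script body, so the sha can be
# computed locally up front and the script loaded lazily on a NOSCRIPT error.
COUNTER_SHA = Digest::SHA1.hexdigest(COUNTER_SCRIPT)

def take_snapshot?(redis, key, every_n)
  redis.evalsha(COUNTER_SHA, keys: [key], argv: [every_n]) == 1
rescue Redis::CommandError => e
  raise unless e.message.start_with?("NOSCRIPT")
  redis.script(:load, COUNTER_SCRIPT) # load once; later calls hit EVALSHA
  retry
end
```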

Anyway, feel free to merge this in once you are comfortable, I am good to go here.

@OsamaSayegh (Collaborator, Author)

Alright, I'm merging this in now, thanks for your review Sam! ❤️
