
FEATURE: Snapshots #459

Merged
merged 14 commits on Aug 31, 2020

Conversation

@OsamaSayegh (Collaborator)

The purpose of this feature is to help find performance issues that are triggered by rare conditions in your application. It works by profiling one request out of every N requests your application receives and storing the performance metrics that Mini Profiler collected, a.k.a. a "snapshot" of that request. You can then access a page that lists all snapshots grouped by route/path and sorted from worst to best.

Screenshots:

This is the overview page where you can see a list of snapshot groups with the worst time of each group:

(Note: the screenshot is from my local install of Discourse. The /javascripts/* requests would be excluded from sampling; they're shown in my screenshot because I turned off something in my local install).

Clicking on a group takes you to this page that displays all the snapshots of that group:

Finally, clicking on a snapshot ID takes you to this page that shows all the SQL queries and steps of the profiled request:

@SamSaffron (Member)

OMG this looks so good!

@SamSaffron (Member)

Curious to know if you can measure the impact of the "should I snapshot" check and quantify it in a number.

Specifically, for a simple rack application that just returns OK for everything (with the redis provider):

  1. What is the impact of rack mini profiler when snapshotting is disabled on non-profiled requests?
  2. What is the impact of rack mini profiler when snapshotting is enabled on non-profiled requests?

We've got to try for under a millisecond here if possible.

@OsamaSayegh (Collaborator, Author)

Thanks @SamSaffron!

I wrote a little script to compare the impact of the "should take snapshot" check when snapshotting is enabled and disabled, and I've found that when snapshotting is disabled (i.e. snapshot_every_n_requests is 0), a request takes on average 0.01 milliseconds. And when it's enabled, a request takes on average 0.21 milliseconds.

So the impact is under a millisecond on my machine, but it'll probably be higher in production if redis lives on a different machine. If we want to absolutely optimize here, I think (though I haven't tested this) it's possible to do the "should take snapshot" check in Ruby and avoid redis entirely (like the MemoryStore implementation). There would be some downsides, such as imperfect sampling (especially in a multi-server environment), but the trade-off may be worth it.
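
For illustration, the in-Ruby check could look something like the sketch below (hypothetical, not part of this PR; each process keeps its own counter, which is exactly where the imperfect per-process sampling comes from):

```ruby
# Hypothetical in-process "should take snapshot" check that avoids redis
# entirely. The counter is local to the process, so sampling is per-process
# rather than global across servers.
class LocalSnapshotSampler
  def initialize(every_n)
    @every_n = every_n
    @counter = 0
    @mutex = Mutex.new # serialize access across request threads
  end

  # Returns true once every @every_n calls; false when sampling is disabled.
  def should_snapshot?
    return false if @every_n <= 0
    @mutex.synchronize do
      @counter += 1
      if @counter >= @every_n
        @counter = 0
        true
      else
        false
      end
    end
  end
end

sampler = LocalSnapshotSampler.new(3)
results = (1..6).map { sampler.should_snapshot? }
# every third request is sampled: [false, false, true, false, false, true]
```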

This is the benchmark script I used in case you want to review it:

Benchmark script
#!/usr/bin/env ruby

pid = fork do
  require 'bundler/inline'

  gemfile do
    source 'https://rubygems.org'

    gem 'rack'
    gem 'redis'
    gem 'rack-mini-profiler', path: "~/rack-mini-profiler"
  end

  class BenchMiddleware
    def initialize(app)
      @app = app
    end

    def call(env)
      t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
      @app.call(env)
      bench_res = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - t1
      [200, {'Content-Type' => 'text/plain'}, ["#{bench_res}"]]
    end
  end

  Rack::MiniProfiler.config.storage = Rack::MiniProfiler::RedisStore
  Rack::MiniProfiler.config.snapshot_every_n_requests = ENV['SNAPSHOTS_PERIOD'].to_i
  Rack::MiniProfiler.config.authorization_mode = :whitelist

  # Run the real "should take snapshot" check so its cost is measured,
  # but always return false so no snapshot is actually taken.
  module MPPatch
    def take_snapshot?(*args)
      super
      false
    end
  end
  Rack::MiniProfiler.prepend(MPPatch)

  app = Rack::Builder.app do
    use BenchMiddleware
    use Rack::MiniProfiler
    run lambda { |env|
      [200, {'Content-Type' => 'text/plain'}, ['OK']]
    }
  end

  Rack::Handler.default.run(
    app,
    Port: 4321,
    BindAddress: "127.0.0.1",
    Logger: WEBrick::Log.new("/dev/null"),
    AccessLog: []
  )
end

begin
  require 'net/http'
  sleep(0.5)
  Net::HTTP.get(URI("http://127.0.0.1:4321"))
  iterations = ENV['ITERATIONS'] ? ENV['ITERATIONS'].to_i : 10
  sum = 0
  iterations.times do
    sum += Net::HTTP.get(URI("http://127.0.0.1:4321")).to_f / (1000 * 1000)
  end
  puts "Average: #{sum / iterations} (ms)"
ensure
  Process.kill("TERM", pid)
end

@SamSaffron (Member)

Hmm, we can live with the 0.21ms impact for v1, then leave interfaces around so we can implement it using DistributedCache (as implemented in message_bus). With that structure we would get extremely fast reads and slightly slower writes.

I say:

  1. See if you can get this down to a 0.15ms impact, it is a good little bit of practice (is 100% of the cost talking to redis, or is there other stuff going on?)

  2. Make sure we can easily implement this using DistributedCache, maybe have an implementation for that as well, and measure perf.

Once done I say we merge this in.

@OsamaSayegh (Collaborator, Author)

(is 100% of the cost talking to redis or is there other stuff going on)

@SamSaffron, yes, the vast majority of the 0.2 milliseconds is the cost of the redis call we make. If I remove the redis call and simply return true, the 0.2 milliseconds completely disappear and the time becomes the same as when snapshotting is disabled (0.01 ms).

Also, I'm not sure I understand how DistributedCache would make a difference here. The "should take snapshot" check writes to redis on every request: we increment a counter and then check the new value to determine whether to return true or false, all inside a Lua script. My understanding of DistributedCache is that it also uses redis under the hood (redis is one of several backends MessageBus supports) to sync data between processes, and every time something is written to DistributedCache, a redis call is fired to sync the data. So I don't see how there would be a difference with DistributedCache, because we would still be making a redis call to increment the counter on every request.

(To be clear I'm not pushing against implementing DistributedCache, I'm just struggling to see how it would help us).
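
For reference, the increment-and-check semantics described above boil down to something like this (a plain-Ruby model of the logic; the actual gem runs it atomically as a Lua script in redis, and the names here are illustrative, not the gem's real script):

```ruby
# Illustrative Lua equivalent of the increment-and-check (not the gem's
# actual script): INCR a counter, reset it and return 1 when it hits N.
EXAMPLE_LUA = <<~LUA
  local counter = redis.call("INCR", KEYS[1])
  if counter >= tonumber(ARGV[1]) then
    redis.call("SET", KEYS[1], "0")
    return 1
  end
  return 0
LUA

# Pure-Ruby model of the same logic. Because the counter lives in redis,
# every request must pay one redis round trip just to INCR it.
def should_snapshot?(counters, key, every_n)
  counters[key] = counters.fetch(key, 0) + 1
  if counters[key] >= every_n
    counters[key] = 0
    true
  else
    false
  end
end

counters = {}
decisions = (1..4).map { should_snapshot?(counters, "mp-counter", 2) }
# with every_n = 2: [false, true, false, true]
```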

@SamSaffron (Member)

It's a great question. DistributedCache could let us cheat a bit, but I think it would result in some potential over-sampling.

We could increment a counter asynchronously on each request; then, when deciding whether or not to sample, we could check our in-memory replica.

Something like:

should_sample = cache[route].to_i > 200
if should_sample
   cache[route] = 0
   ... 
else
   InBackgroundThread do
       v = cache[route].to_i
       cache[route] = v + 1
   end
end

This is a very imperfect solution: it could over-sample, especially under extreme load, and it could under-sample if the background thread is backlogged.

That said, it avoids having a constant cost associated with every request.
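
The pseudocode above could be fleshed out roughly as below (a hypothetical sketch: a plain Hash and a worker thread stand in for DistributedCache and InBackgroundThread, and the over/under-sampling caveats still apply):

```ruby
# Hypothetical sketch of the async-increment idea. A Hash replaces
# DistributedCache and a single worker thread replaces InBackgroundThread,
# so the request path never blocks on a redis round trip.
class AsyncSampler
  def initialize(threshold)
    @threshold = threshold
    @cache = Hash.new(0)   # stand-in for the DistributedCache replica
    @queue = Queue.new
    @enqueued = 0
    @processed = 0
    @worker = Thread.new do
      while (route = @queue.pop)
        @cache[route] += 1 # the increment happens off the request path
        @processed += 1
      end
    end
  end

  # Request path: only a memory read of the local replica, no network call.
  def should_sample?(route)
    if @cache[route] >= @threshold
      @cache[route] = 0
      true
    else
      @enqueued += 1
      @queue << route      # defer the increment to the background thread
      false
    end
  end

  # Test helper: wait for the background thread to catch up.
  def drain!
    sleep 0.001 until @processed >= @enqueued
  end
end

sampler = AsyncSampler.new(3)
decisions = (1..4).map { r = sampler.should_sample?("/home"); sampler.drain!; r }
# the counter reaches 3 after three deferred increments, so the 4th samples:
# [false, false, false, true]
```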

Given the whole cost here is quite small, I say just commit as is. The only slight change might be to use precompiled Lua like message_bus does; you may get an ever-so-slight perf improvement, though I think almost all the cost is probably the round trip.
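
For context, "precompiled Lua" here usually means loading the script once with SCRIPT LOAD and then invoking it by SHA with EVALSHA, so the script body isn't resent on every request. A hedged sketch (illustrative names, not this PR's code; actually calling it requires the redis gem and a running server):

```ruby
require 'digest/sha1'

# Illustrative script, not the gem's actual Lua source.
COUNTER_SCRIPT = <<~LUA
  local counter = redis.call("INCR", KEYS[1])
  if counter >= tonumber(ARGV[1]) then
    redis.call("SET", KEYS[1], "0")
    return 1
  end
  return 0
LUA

# SCRIPT LOAD returns the SHA1 of the script body, so the sha can be
# computed locally up front and the script loaded lazily on a NOSCRIPT error.
COUNTER_SHA = Digest::SHA1.hexdigest(COUNTER_SCRIPT)

def take_snapshot?(redis, key, every_n)
  redis.evalsha(COUNTER_SHA, keys: [key], argv: [every_n]) == 1
rescue Redis::CommandError => e
  raise unless e.message.start_with?("NOSCRIPT")
  redis.script(:load, COUNTER_SCRIPT) # load once; later calls hit EVALSHA
  retry
end
```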

Anyway, feel free to merge this in once you are comfortable, I am good to go here.

@OsamaSayegh (Collaborator, Author)

Alright, I'm merging this in now, thanks for your review Sam! ❤️
