FEATURE: Snapshots #459
Conversation
OMG, this looks so good!
Curious to know if you can measure the impact of the "should I snapshot" check and quantify it in a number. Specifically for a simple Rack application that simply returns OK for everything (with the Redis provider).
We've got to try for under a millisecond here if possible.
Thanks @SamSaffron! I wrote a little script to compare the impact of the "should take snapshot" check when snapshotting is enabled and disabled. The impact is under a millisecond on my machine, but it'll probably be higher in production if Redis lives on a different machine. If we want to absolutely optimize here, I think (I haven't tested this) it's possible to do the "should take snapshot" check in Ruby and avoid Redis entirely (like the MemoryStore implementation). There would be some downsides, such as imperfect sampling (especially in a multi-server environment), but the trade-off may be worth it. This is the benchmark script I used in case you want to review it:

```ruby
#!/usr/bin/env ruby

pid = fork do
  require 'bundler/inline'

  gemfile do
    source 'https://rubygems.org'
    gem 'rack'
    gem 'redis'
    gem 'rack-mini-profiler', path: "~/rack-mini-profiler"
  end

  # Measures how long the rest of the middleware stack takes and returns
  # the elapsed time (in nanoseconds) as the response body.
  class BenchMiddleware
    def initialize(app)
      @app = app
    end

    def call(env)
      t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
      @app.call(env)
      bench_res = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - t1
      [200, { 'Content-Type' => 'text/plain' }, ["#{bench_res}"]]
    end
  end

  Rack::MiniProfiler.config.storage = Rack::MiniProfiler::RedisStore
  Rack::MiniProfiler.config.snapshot_every_n_requests = ENV['SNAPSHOTS_PERIOD'].to_i
  Rack::MiniProfiler.config.authorization_mode = :whitelist

  # Run the real "should take snapshot" check (super) so its cost is
  # measured, but always return false so no snapshot is actually taken.
  module MPPatch
    def take_snapshot?(*args)
      super
      false
    end
  end
  Rack::MiniProfiler.prepend(MPPatch)

  app = Rack::Builder.app do
    use BenchMiddleware
    use Rack::MiniProfiler
    run lambda { |env|
      [200, { 'Content-Type' => 'text/plain' }, ['OK']]
    }
  end

  Rack::Handler.default.run(
    app,
    Port: 4321,
    BindAddress: "127.0.0.1",
    Logger: WEBrick::Log.new("/dev/null"),
    AccessLog: []
  )
end

begin
  require 'net/http'
  sleep(0.5)
  # Warm-up request, then average the reported timings over N iterations.
  Net::HTTP.get(URI("http://127.0.0.1:4321"))
  iterations = ENV['ITERATIONS'] ? ENV['ITERATIONS'].to_i : 10
  sum = 0
  iterations.times do
    sum += Net::HTTP.get(URI("http://127.0.0.1:4321")).to_f / (1000 * 1000)
  end
  puts "Average: #{sum / iterations} (ms)"
ensure
  Process.kill("TERM", pid)
end
```
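The comment above suggests doing the "should take snapshot" check in Ruby to avoid Redis entirely, like the MemoryStore implementation. A minimal sketch of what such an in-process check could look like (this is a hypothetical illustration, not the PR's implementation; the `InProcessSampler` name is invented):

```ruby
# Hypothetical in-process "should take snapshot" check. Each process keeps
# its own counter, so sampling is per-process rather than global -- the
# imperfect-sampling trade-off mentioned in the discussion.
class InProcessSampler
  def initialize(every_n)
    @every_n = every_n
    @counter = 0
    @mutex = Mutex.new # Rack apps may be multi-threaded
  end

  # Returns true once every @every_n calls, false otherwise.
  def take_snapshot?
    return false if @every_n <= 0
    @mutex.synchronize do
      @counter += 1
      if @counter >= @every_n
        @counter = 0
        true
      else
        false
      end
    end
  end
end
```

With `every_n = 3`, six consecutive calls return `false, false, true, false, false, true`; there is no network round trip, but each server process samples independently.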
Hmm, we can leave it with the 0.21ms impact for v1, then leave interfaces around so we can implement it using DistributedCache (as implemented in message_bus); with that structure we would get extremely fast reads and slightly slower writes. I say:
Once done I say we merge this in.
@SamSaffron, yes, the vast majority of the 0.2 milliseconds is the cost of the Redis call we make. If I remove the Redis call and simply

Also, I'm not sure I understand how DistributedCache would make a difference here. The "should take snapshot" check writes to Redis on every request (we increment a counter and then check the new value to determine whether we should return true or false; all of this is done in a Lua script). My understanding of DistributedCache is that it also uses Redis under the hood (Redis is one of many backends MessageBus supports) to sync data between processes, and every time something is written to DistributedCache, a Redis call is fired to sync the data. So I don't see how there would be a difference using DistributedCache, because we would still be making a Redis call to increment the counter on every request. (To be clear, I'm not pushing back against DistributedCache; I'm just struggling to see how it would help us.)
It's a great question. DistributedCache could let us cheat a bit, but I think it would result in some potential over-sampling. We could increment a counter asynchronously on each request, then when deciding whether to sample or not we could grab our waiting in-memory replica. Something like:
This is a very imperfect solution: it could over-sample, especially under extreme load, and it could under-sample if the background thread is backlogged. That said, it avoids having a constant cost associated with every request. Given the whole cost here is quite small, I say just commit as is. The only slight change might be to use precompiled Lua like message_bus does; you may get an ever-so-slight perf improvement, though I think all the cost is probably in the round trip. Anyway, feel free to merge this in once you are comfortable; I am good to go here.
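One way to sketch the async-increment idea described above (the original code example was not preserved in this thread; this is a hypothetical reconstruction with invented names, and a plain Hash stands in for DistributedCache/Redis):

```ruby
# Hypothetical sketch: requests bump a local pending counter and decide
# using a cached copy of the shared total; a background thread would
# periodically flush pending increments to shared storage (DistributedCache
# backed by Redis in the real proposal -- stubbed here with a Hash).
class AsyncSampler
  def initialize(every_n, shared_store)
    @every_n = every_n
    @store = shared_store        # stands in for DistributedCache
    @pending = 0                 # increments not yet flushed
    @cached_total = @store.fetch(:count, 0)
    @mutex = Mutex.new
  end

  def take_snapshot?
    @mutex.synchronize do
      @pending += 1
      # Decide using the possibly stale cached total -- this staleness is
      # where over/under-sampling can creep in under load.
      ((@cached_total + @pending) % @every_n).zero?
    end
  end

  # In a real implementation this would run on a background thread.
  def flush
    @mutex.synchronize do
      @store[:count] = @store.fetch(:count, 0) + @pending
      @pending = 0
      @cached_total = @store[:count]
    end
  end
end
```

The request path never touches the network; only `flush` does, which is the "extremely fast reads, slightly slower writes" trade-off mentioned above.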
Alright, I'm merging this in now. Thanks for your review, Sam! ❤️
The purpose of this feature is to help find performance issues that are triggered by rare conditions in your application. It works by profiling one request out of every N requests your application receives and storing the performance metrics that Mini Profiler collected, a.k.a. a "snapshot" of that request. You can then access a page that lists all snapshots grouped by route/path and sorted from worst to best.
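Based on the configuration names that appear in the benchmark script earlier in this thread, enabling the feature could look roughly like this (a sketch, not authoritative documentation):

```ruby
# Sketch of enabling snapshot sampling, using the config setters shown in
# the benchmark script above: profile roughly 1 in every 100 requests and
# persist snapshots in Redis so they can be browsed later.
Rack::MiniProfiler.config.storage = Rack::MiniProfiler::RedisStore
Rack::MiniProfiler.config.snapshot_every_n_requests = 100
```

Setting the period to 0 (the default in the benchmark script when `SNAPSHOTS_PERIOD` is unset) leaves sampling disabled.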
Screenshots:
This is the overview page where you can see a list of snapshot groups with the worst time of each group:
(Note: the screenshot is from my local install of Discourse. The `/javascripts/*` requests would be excluded from sampling; they're shown in my screenshot because I turned off something in my local install.)
Clicking on a group takes you to this page that displays all the snapshots of that group:
Finally, clicking on a snapshot ID takes you to this page that shows all the SQL queries and steps of the profiled request: