Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance loss when upgrading from Dalli 2.7.10 to 3.2.2 #930

Open
matthewzinke opened this issue Sep 8, 2022 · 5 comments
Open

Performance loss when upgrading from Dalli 2.7.10 to 3.2.2 #930

matthewzinke opened this issue Sep 8, 2022 · 5 comments

Comments

@matthewzinke
Copy link

We attempted to upgrade our Dalli gem from 2.7.10 to 3.2.2 recently, and noticed a performance loss associated with the upgrade. Out memcached latency went up by just under 10% and our sidekiq processes started spending 2-3x as much time on memcached. We suspect this has something to do with the change from send_multiget to pipelined_get.

We are wondering if anyone has also seen performance loss associated with the upgrade from 2 to 3, and/or if there are any configuration recommendations to preserve performance characteristics.

@jasonpenny
Copy link

send_multiget looks like it was removed in https://github.com/petergoldstein/dalli/pull/852/files?diff=unified&w=1#diff-a326ee7acf56937d92cdeeb4b920547a89162170aab2ad32391200ad1b4c3638L457

was this removed just to simplify the code to always use PipelinedGetter @petergoldstein ?

@petergoldstein
Copy link
Owner

@jasonpenny The send_multiget method was simply renamed to pipelined_get in that commit to get some naming consistency. The latter method is still there, and is now invoked by the PipelinedGetter.

It is certainly possible that there's a performance loss here (possibly in the ResponseBuffer), but I'd need to get some more detail to start debugging.

@jasonpenny
Copy link

ah, I see.

I see a slowdown with a benchmark using ActiveSupport::Cache::MemCacheStore:

root@d6ba62960da5:/# DALLI_VERSION=2.7.10 ruby bench.rb
                                            user     system      total        real
2.7.10 read_multi                       8.122310   2.830021  10.952331 ( 12.855987)
root@d6ba62960da5:/# DALLI_VERSION=2.7.10 ruby bench.rb
                                            user     system      total        real
2.7.10 read_multi                       8.037037   2.876596  10.913633 ( 12.790800)

root@d6ba62960da5:/# DALLI_VERSION=3.2.2 ruby bench.rb
                                            user     system      total        real
3.2.2 read_multi                        9.198624   2.890378  12.089002 ( 13.673892)
root@d6ba62960da5:/# DALLI_VERSION=3.2.2 ruby bench.rb
                                            user     system      total        real
3.2.2 read_multi                        9.372887   2.795796  12.168683 ( 13.765607)
benchmark setup in a docker container
docker run -it --rm "ruby:2.7.3" bash

apt-get update -qq
apt-get install -y sudo
useradd test

git clone --depth 1 --branch main https://github.com/petergoldstein/dalli dalli

MEMCACHED_VERSION=1.6.10 bash dalli/scripts/install_memcached.sh

echo "
# frozen_string_literal: true
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'activesupport', '6.1.6.1'
  gem 'dalli', ENV['DALLI_VERSION']
end

require 'benchmark'
require 'active_support/all'

Benchmark.bm(37) do |x|
  x.report(%Q{#{ENV['DALLI_VERSION']} read_multi}) do
    c = ActiveSupport::Cache::MemCacheStore.new(['127.0.0.1:23417'])

    keys = ['One', 'Two' * 8, 'Three' * 40]
    keys.each { c.write(_1, rand.to_s * _1.length) }

    100_000.times do
      c.read_multi(*keys)
    end
  end
end
" > bench.rb

memcached -p 23417 -u test &

DALLI_VERSION=2.7.10 ruby bench.rb
DALLI_VERSION=3.2.2 ruby bench.rb

@jasonpenny
Copy link

I am no expert on profiling ruby, but I tried a few tools and did not find any obvious hotspots.

Something I did notice was that the callgraph of dalli v3 is larger, and I wonder if calling extra methods/more objects would be the cause of slowness here since we're calling it so often.

capture callgraph
# frozen_string_literal: true

require 'fileutils'
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'activesupport', '6.1.6.1'
  gem 'stackprof'
  gem 'dalli', path: ENV['DALLI_PATH']
  gem 'byebug'
  gem 'ruby-prof'
end

puts "Dalli version: #{Dalli::VERSION}"

dc = Dalli::Client.new('127.0.0.1:11211')
dc.flush_all
keys = ['One', 'Two' * 8, 'Three' * 40]
keys.each { dc.set(_1, rand.to_s * _1.length) }
dc = nil

RubyProf.start
c = Dalli::Client.new('127.0.0.1:11211', socket_timeout: 5)
values = nil
10.times do
  values = c.get_multi(*keys)
end
results = RubyProf.stop
puts values

output_dir = "./#{ENV['DALLI_PATH']}_calltree" 
FileUtils.mkdir_p(output_dir)
RubyProf::CallTreePrinter.new(results).print({ path: output_dir })

Dalli 2

dalli2

Dalli 3

dalli3

callgrind profiles: https://gist.github.com/jasonpenny/fed81267c8e0c987e63839a135f9db4a

@taiyyu
Copy link

taiyyu commented Feb 28, 2023

We also saw performance degradation. We had to disable compress to get performance matching 2.7.x while using :mem_cache_store in ruby 3.1.3, rails 6.1.4, and dalli 3.2.x.

It would be nice to re-enable compress and still be able to upgrade to 3.2.x in the future. Have not been able to root cause exactly why, but hopefully, this gives a little clue towards a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants