Increase readpartial() from 1KB to 16KB #880

Merged
1 commit merged into redis:master on Nov 6, 2019

Conversation

@schanjr commented Nov 6, 2019

I noticed the redis-rb gem reads 1KB per readpartial() call. This is very small compared to Mongrel, Unicorn, Puma, Passenger, and Net::HTTP, all of which use 16KB as their read length.

I wanted to see what your feedback is on this.
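
To illustrate the pattern in question, here is a rough sketch of the kind of chunked read loop being discussed (not the gem's actual code; the constant names and buffer handling are illustrative):

require "socket"

# Illustrative constants only -- not redis-rb's actual source.
OLD_CHUNK_SIZE = 1 * 1024   # previous read length
NEW_CHUNK_SIZE = 16 * 1024  # read length proposed in this PR

# Fill a buffer until it holds at least one complete protocol line.
# A larger chunk means fewer readpartial calls (and fewer syscalls) for
# large replies, at the cost of a bigger temporary string per read.
def fill_buffer(sock, buffer, chunk_size = NEW_CHUNK_SIZE)
  buffer << sock.readpartial(chunk_size) until buffer.include?("\r\n")
  buffer
end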

@byroot (Collaborator) commented Nov 6, 2019

in comparison to something like Mongrel, Unicorn, Puma, Passenger, and Net::HTTP,

All of which deal with HTTP requests / responses, so I'm not sure it's a fair comparison.

However, digging into hiredis shows that some experimentation was done there and that 16k does indeed yield better performance: redis/hiredis@f7f022e

So 👍

@byroot merged commit c7b69ba into redis:master on Nov 6, 2019
@liaden commented Jul 1, 2020

Our production sidekiq instance saw memory usage grow drastically with this change:
[Screenshot: Sidekiq memory usage graph, 2020-07-01 11:11 AM]

We downgraded on 5/20, and memory usage went back to the 8-10GB range, whereas it was at 11.5-14GB on 4.1.4. Would it be reasonable to expose this value as a configurable option with a default of 16KB? I can open a PR if it is.

@byroot (Collaborator) commented Jul 1, 2020

I'm not against making it configurable, but what concerns me more is how it could so significantly increase your memory usage.

This is just a buffer, it's supposed to be consumed quickly, it shouldn't hold memory for that long.

@liaden commented Jul 1, 2020

One of my thoughts is that pulling off 16.5kb of data is going to over-allocate 15.5kb of memory, since the second 16KB read only holds 0.5kb of data (and that waste is accumulating over time for some reason), hence the memory increase. If you have any ideas on where I can investigate to see what is happening, that would help short-circuit things on my end a bit.
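
(One way to check this directly would be to measure what readpartial actually returns for a short read; a rough sketch below, with a throwaway local socket standing in for Redis -- the sizes are arbitrary:)

require "objspace"
require "socket"

# A tiny local server that sends far less than the 16KB read length.
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
peer = server.accept
peer.write("x" * 500) # only 500 bytes available

chunk = client.readpartial(16 * 1024)

# Compare bytes actually read vs. memory the returned string retains.
puts "bytesize: #{chunk.bytesize}, memsize: #{ObjectSpace.memsize_of(chunk)}"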

@casperisfine

(answering from another computer)

which is accumulating over time for some reason

Well, if it's accumulating, the bug would be that it accumulates at all. I don't see how 1 chunk of 16K would be "accumulating" faster than 4 blocks of 4K.

I dumped the buffer on my machine to see:

setup:

Redis.new.set("foo", "bar" * 10_000)

After reading from the socket, @buffer is:

{"address":"0x7faf5603eb68", "type":"STRING", "class":"0x7faf520bb5b8", "bytesize":16384, "capacity":24575, "value":"$30000\r\nbarbarb........barbarbar", "encoding":"UTF-8", "memsize":24616, "flags":{"wb_protected":true}}

But just after it gets slice!'d:

{"address":"0x7faf5603eb68", "type":"STRING", "class":"0x7faf520bb5b8", "shared":true, "encoding":"UTF-8", "references":["0x7faf56037070"], "memsize":40, "flags":{"wb_protected":true}}

We see that Ruby is smart and simply points to a shared empty string. We can assume the allocated string buffer was freed (I'll double check, but I highly doubt such an obvious bug wouldn't have been reported and fixed yet).
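
(For reference, a dump like the ones above can be produced with the objspace extension; a minimal sketch, where the way the buffer is reached is an assumption and may differ between redis-rb versions:)

require "objspace"
require "redis"

redis = Redis.new
redis.set("foo", "bar" * 10_000)
redis.get("foo")

# Assumption for illustration: with the Ruby driver, the read buffer lives
# on the socket object as @buffer.
sock = redis._client.connection.instance_variable_get(:@sock)
buffer = sock && sock.instance_variable_get(:@buffer)

# ObjectSpace.dump emits the same JSON shown above (bytesize, capacity,
# memsize, shared flag, ...).
puts ObjectSpace.dump(buffer) if buffer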

So here's a bunch of theories:

1 - Connection leaks

Maybe somehow you are leaking Connection::Ruby instances? I've seen reports and suspicions of such issues, but never described well enough for me to be able to reproduce and fix them.

A way to verify this would be to get a shell into a ruby process using rbtrace, and then check ObjectSpace.each_object(Redis::Connection::Ruby).size to see if it's too high. Or, if rbtrace is complicated, you can write a sidekiq middleware to log that number somewhere.
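
For example, a minimal sketch of such a middleware (the class name and logging style are arbitrary choices, and this assumes the Ruby connection driver is in use):

require "sidekiq"
require "redis"

# Logs the number of live Redis connection driver objects after each job,
# so a leak shows up as a steadily growing count in the logs.
class RedisConnectionCounter
  def call(_worker, _job, _queue)
    yield
  ensure
    count = ObjectSpace.each_object(Redis::Connection::Ruby).count
    Sidekiq.logger.info("Redis::Connection::Ruby instances: #{count}")
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add RedisConnectionCounter
  end
end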

If you can prove this is happening, then I'd be extremely interested in debugging this forward with you.

2 - Memory fragmentation

If you are using the default memory allocator, untuned, and sidekiq with many threads, then you might very well experience a lot of memory fragmentation.

  • What version of Ruby?
  • What allocator / what settings?
  • Have you ever tried jemalloc / tcmalloc? They can help a lot with fragmentation. Alternatively, configuring MALLOC_ARENA_MAX=2 can help a lot as well.

Other questions

  • These shark fins are quite concerning. What is the graph showing exactly? I presume RSS?
  • If you deploy a sidekiq worker to only process tons of empty jobs, does it leak the same way?
  • Same thing but with a simple job that does a simple get/set in Redis?
  • Are you seeing similar growth with your web processes?

@liaden commented Jul 1, 2020

Thanks for the thorough answer. I missed today's release, so it will be tomorrow before I can try out rbtrace specifically in production.

Answers

  1. Ruby 2.7.1 with the standard allocator.
  2. Attempted to switch to jemalloc via LD_PRELOAD, as well as using the fullstaq.io container, then ran a heavy Apache Bench run with 5 concurrent requests against 5 puma processes and did not see any memory reduction, which is sad since I've previously used jemalloc to great effect.
  3. MALLOC_ARENA_MAX=2 was also tried by another engineer, with no improvement.
  4. Yes, RSS.
  5. Will try out a bit this afternoon.
  6. Ditto.
  7. Yes, and maybe the same proportional growth:

[Screenshot: web process memory usage graph, 2020-07-01 12:45 PM]

@byroot (Collaborator) commented Jul 1, 2020

That last graph is much less concerning. The RSS seems to stabilize around 1GB, which makes much more sense and is quite typical. The average app has a bunch of code and data that isn't properly eager-loaded, so there is some memory growth on the first few requests. Plus, it takes some time to hit a request that allocates quite a lot, and after that the memory isn't reclaimed.

If anything, this graph makes the previous one much more concerning. Your sidekiq processes are using several times more memory than your web processes, which is not normal (unless they use many more threads), and they don't seem to stabilize. You either have a nasty leak in your sidekiq processes, or excessive fragmentation.

@liaden commented Aug 12, 2020

@byroot I just wanted to follow up that the big issue was a memory leak in sidekiq/sidekiq#4652

I think the above configuration change made it explode for us due to a bunch of Ruby internal objects that look like:

-[ RECORD 1 ]-----+------------------------------------
id                | 576368
time              | ¤
type              | IMEMO
node_type         | ¤
root              | ¤
address           | 0x556f575b0c90
value             | ¤
klass             | 0x556f506f0a58
name              | ¤
struct            | ¤
file              | /usr/local/lib/ruby/2.7.0/socket.rb
line              | 452
method            | __read_nonblock
generation        | 57
size              | ¤
length            | ¤
memsize           | 72
bytesize          | ¤
capacity          | ¤
ivars             | ¤
fd                | ¤
encoding          | ¤
default_address   | ¤
freezed           | ¤
fstring           | ¤
embedded          | ¤
shared            | ¤
flag_wb_protected | t
flag_old          | ¤
flag_long_lived   | ¤
flag_marking      | ¤
flag_marked       | ¤

In total there were 183 of those IMEMO objects:

mem_analysis=# select count(*), o.file from space_objects o join space_objects k on o.klass = k.address where o.generation = 57 and k.name is null group by o.file;
 count |                                      file
-------+---------------------------------------------------------------------------------
     9 | /usr/local/bundle/gems/connection_pool-2.2.2/lib/connection_pool/timed_stack.rb
     7 | /usr/local/bundle/gems/sidekiq-6.0.7/lib/sidekiq/util.rb
   183 | /usr/local/lib/ruby/2.7.0/socket.rb
     1 | eval
(4 rows)

I don't know if any of the above proves useful for things related solely to redis-rb, but I wanted to share just in case. Anyway, thanks a huge bunch for the initial pointers!
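
(For anyone wanting to reproduce this kind of analysis, a rough sketch of producing the heap dump these records came from -- the output path is arbitrary, and loading the dump into Postgres as above needs a separate tool:)

require "objspace"

# Track allocation sites so the dump records file/line/generation, which is
# what makes groupings like "socket.rb:452, generation 57" possible.
ObjectSpace.trace_object_allocations_start

# ... run the workload to inspect (e.g. a batch of Sidekiq jobs) ...

GC.start

# Dump every live object, one JSON document per line.
File.open("/tmp/heap.dump", "w") do |io|
  ObjectSpace.dump_all(output: io)
end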

@casperisfine

Thanks for sharing, glad you figured out your issue.
