Reduce allocations/syscalls for readline #796
Conversation
Yikes, that is a lot of allocations indeed. I'm working this morning on a careful review and some feedback. I look forward to working through and improving this with you, thanks!
Thanks again for the detailed report and first pass at fixes. I've provided some detailed questions and feedback as I worked on more fully wrapping my head around the changes. I think we are headed in a good direction, but might have a little more work to do to cleanup and limit the changes to what is strictly needed for the fix. I look forward to learning more about the changes and continuing to talk through it with you. Thanks!
buffer
result = String.new
block = @read_buffer
@read_buffer = String.new
I believe @read_buffer should already be set up from initialize. Is there a particular reason to overwrite that here?
We're just moving @read_buffer to block, so that we can treat it the same way as future reads. Maybe this would be best expressed like:

block, @read_buffer = @read_buffer, String.new
Got it. Yeah, I think I like the inline version you suggest.
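The suggested parallel-assignment swap can be sketched in isolation (variable names follow the diff; the surrounding class is omitted):

```ruby
# Move the buffered bytes into `block` without copying them; only one
# new (empty) String is allocated for the replacement buffer.
read_buffer = String.new("partial\nleftover")

block, read_buffer = read_buffer, String.new

# `block` now owns the previously buffered data; `read_buffer` is empty
# and ready for future reads.
```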
if idx.nil?
  result << block
else
  add_to_read_buffer(block.slice!(idx+1, block.length))
It looks like this is the only place where this method call is used. Is there a particular reason to have it separate, vs just having the related code inline here?
We could inline it! I just preferred to name it for clarity.
Sure. I think I lean toward inline to avoid the overhead of method calls. I know it's small, but want to keep inner loop streamlined where we can.
@@ -172,20 +187,21 @@ def connect
    end
  end

  def add_to_read_buffer(str)
    @read_buffer << str
    @eof = false
Since @eof is initialized as false, I would expect it to still be false here. Is there a case where it would have been set to something other than false, where it would then need to be changed back here? Otherwise, it may be better not to do this (to avoid complexity and/or confusion in the future if we don't really need it).
Yeah, we're trying to preserve the invariant: @eof is true only if there is no buffered data which we could return.
For example, imagine this series of events, assuming chunk_size = 1 MB:
- Socket is initialized, @eof = false
- readline is called
- readline calls read_nonblock, which reads 800K and gets EOF. Now @eof is true, and read_nonblock returns an 800K block of data
- readline finds the first newline in the block, and it's only 100 bytes in
- readline calls add_to_read_buffer to put the remaining 799.9K back in the buffer
- Now @eof has to be false, since a call to read_nonblock could return successfully
Got it. This relates to the new treatment of the buffer as well, I think it's all coming together in my head.
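A minimal sketch of that invariant, with a hypothetical class standing in for the socket (names are illustrative, not Excon's actual API):

```ruby
# @eof must only be true when no buffered data remains to hand back.
class BufferedReader
  attr_reader :eof

  def initialize
    @read_buffer = String.new
    @eof = false
  end

  # Called when read_nonblock raises EOFError on the underlying socket.
  def mark_eof
    @eof = true
  end

  # Pushing leftover bytes back means a future read can still return
  # data, so the EOF flag has to be cleared again.
  def add_to_read_buffer(str)
    @read_buffer << str
    @eof = false
  end
end

reader = BufferedReader.new
reader.mark_eof                        # socket hit EOF mid-readline
reader.add_to_read_buffer("leftover")  # put unconsumed bytes back
reader.eof  # => false again, matching the invariant
```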
def initialize(data = {})
  @data = data
  @nonblock = data[:nonblock]
  @port ||= @data[:port] || 80
  @read_buffer = String.new
  @eof = false
  @backend_eof = false
Just to confirm I'm understanding this. It looks like the distinction this introduces is that @backend_eof is set when the socket raises that it is done, whereas @eof is set when that has occurred AND the @read_buffer is also empty. Is that correct? I wanted to make sure, since this distinction didn't exist previously, or at least not in this way.
Yes, that's right. We need some way to distinguish "we've fully read the underlying socket" from "we have no more data to return to the client"
Got it, thanks for the clarification.
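The distinction between the two flags can be sketched as follows (illustrative class, not the actual implementation):

```ruby
class TwoFlagSocket
  def initialize(buffered)
    @read_buffer = String.new(buffered)
    @backend_eof = true  # assume the OS socket already raised EOF
  end

  # backend_eof: the underlying socket has been fully read.
  # eof?: additionally, nothing buffered is left to return.
  def eof?
    @backend_eof && @read_buffer.empty?
  end

  def drain
    @read_buffer.slice!(0, @read_buffer.length)
  end
end

s = TwoFlagSocket.new("still buffered")
s.eof?   # => false: the backend is done, but data remains for the client
s.drain
s.eof?   # => true: now there is truly nothing more to return
```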
  end
rescue OpenSSL::SSL::SSLError => error
  if error.message == 'read would block'
    select_with_timeout(@socket, :read) && retry
    if @read_buffer.empty?
Could you help me understand why this guard was introduced here? I can see that we guard in part of the rest of the error handling, but I don't want to add it here if we don't need it.
I believe this was a mistake in a previous PR; the guard was added in the wrong place. Blocking reads don't touch @read_buffer, so they shouldn't care about it. Non-blocking reads, on the other hand, use the buffer, and if they have some data to return, they should never select (i.e., block).
K. Yeah, if we could swap the guard back to where it was, that would be great. My memory is sometimes fuzzy on these things, but I'm pretty sure the guards were how they were for a reason. So it's good not to change them by accident.
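A sketch of the intended guard placement (names assumed; symbols stand in for real I/O): only the non-blocking path consults the buffer, since buffered data should be returned immediately rather than blocking in select.

```ruby
# Non-blocking read path: on "read would block", only wait in select
# if there is nothing buffered to hand back right now.
def handle_would_block(read_buffer)
  if read_buffer.empty?
    :select_and_retry   # nothing to return, so waiting is safe
  else
    :return_buffered    # never block while data is available
  end
end

handle_would_block(String.new)  # => :select_and_retry
handle_would_block("data")      # => :return_buffered
```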
if @read_buffer.empty?
  select_with_timeout(@socket, :read) && retry
end
select_with_timeout(@socket, :read) && retry
Ah, I see now that the guard moved from here to above (rather than being duplicated). Sorry for the confusion. Still, would like to better understand why that changed. Thanks.
@read_buffer << @socket.read_nonblock(@data[:chunk_size])
end
while !@backend_eof && (!max_length || @read_buffer.length < max_length)
  @read_buffer << @socket.read_nonblock(@data[:chunk_size])
I'm not sure how much it matters, but this looks like it could read a good deal more than max_length into the buffer in some cases. Is that desired? I suppose the worst case would be if the buffer initially contained one byte less than max_length. In the old implementation it would just get one more byte and end up with max_length, whereas with this change it would end up with 2 * max_length - 1, I think? Does that make sense?
I was trying to wrap my head around the new backend_eof stuff. I guess this change introduces the need for that as well, if I'm not mistaken? Prior to this change, if there is a max_length it should restrict the buffer to that amount, so below, when it reads out max_length, we should be confident that the buffer was emptied. Without that, it's possible that there is more than max_length while this is set, causing material to be left over in the buffer after a read, which this then allows to be read on subsequent calls. Does that match with your understanding/expectations?
I'd like to better understand why we would want to do it this way instead, as it seems a bit counterintuitive to me (and so makes me worry it might introduce problems down the road or be harder to maintain). Does that make sense?
Yeah, previously we basically emptied the buffer on ~every call, which made the buffer not super useful! And caused lots and lots of extra syscalls when we're doing small reads.
Now, we try to read larger chunks of data, basically amortizing the "real work" (syscalls) of calls to read_nonblock. I agree this implementation can be a bit confusing, and would love to think through ways to make it simpler, maybe by abstracting out some of the buffer operations.
But I think fundamentally, we always should be batching reads this way.
Yeah, I think that makes sense. Agreed that this seems better in terms of optimization, though it does feel a little harder to follow. This nitty gritty level stuff often seems inherently hard to follow (to me at least). I don't think that should preclude us from these kind of improvements, just that we may want to consider how to keep iterating and refining to make it easier to understand if possible.
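The worst-case overshoot discussed above can be demonstrated with a small standalone sketch (a String stands in for the socket; chunk_size and max_length are chosen equal to make the arithmetic visible):

```ruby
chunk_size  = 8
max_length  = 8
read_buffer = String.new("1234567")  # one byte short of max_length
backend     = "a" * 32               # pretend socket contents

# Batching loop: keep reading whole chunks until max_length is buffered.
# Starting one byte short, a single chunk-sized read leaves the buffer
# at 2 * max_length - 1 bytes.
while !backend.empty? && read_buffer.length < max_length
  read_buffer << backend.slice!(0, chunk_size)
end

result = read_buffer.slice!(0, max_length)
# result.length      => 8  (what the caller asked for)
# read_buffer.length => 7  (overshoot, served on subsequent calls)
```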
@@ -208,6 +224,8 @@ def read_nonblock(max_length)
  # read until EOFError, so return everything
  @read_buffer.slice!(0, @read_buffer.length)
I know this was already here, but should we simply return the read_buffer itself here? Slicing its entire contents is an allocation otherwise, which doesn't seem ideal (though it could protect the buffer from mutation, I suppose).
Yeah, this is just a shortcut for
ret, @read_buffer = @read_buffer, String.new
return ret
I'm not sure I find it clear either!
Yeah, similar to the comment above, in some ways there are two challenges. One is fixing some of the performance and memory characteristics, and one is trying to make things clearer and easier to understand. It's not the worst thing to understand, but it definitely takes some time for me whenever I revisit to recall how it all fits together. It may be worth spending some time on making it clearer and easier to follow, but I don't want to block these changes on that. I think it will be easier to address clarity as a separate effort than to try and mix the two together.
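For reference, the two approaches side by side in a standalone sketch: slice! copies the contents into a new String, while the swap reuses the existing object.

```ruby
buffer_a = String.new("all remaining data")
ret_a = buffer_a.slice!(0, buffer_a.length)  # allocates a copy
# buffer_a is now empty; ret_a holds the data

buffer_b = String.new("all remaining data")
ret_b, buffer_b = buffer_b, String.new       # no copy of the contents
# buffer_b is a fresh empty String; ret_b is the original object
```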
end
end

Shindo.tests('socket') do
From what you describe, I imagine a number of these tests may have been failing at the outset of this. Could you describe what you were seeing prior to the changes you made to help me have context? Thanks!
Will take a look at this soon.
Thanks. I just wanted to have a sense of what kind of improvement you've seen and also what to look out for in terms of potential regressions in the future.
Thanks for the detailed review! Happy to chat more in comments, but also to take it to IRC/zoom/whatever if you'd find that helpful, let me know.
Thanks for continuing to talk through it. I'm not too worried about the clarity as part of this, as it's complicated and I think we are both on the same page. But it may be worth some work on that as a followup to help us remember and maintain it in the future.
If I recall, I think these are probably the main remaining things that I'd like us to figure out before bringing this in:
- Could you revert the guard changes that were inadvertently changed?
- Could you also squash some of the commits down to help us keep the history a little cleaner and set us up better in case we should need to revert or anything? I don't think we will need to, but I always want to keep that path as clear as possible when making low level changes, just in case.
- Could you describe what the behavior of the tests would have been and/or would be in the case of regressions, just so I can understand a bit better?
Thanks!
Force-pushed from bef44b6 to 7ac3568
Extracted the guard change to #797
Looks good. Thanks again for such detailed reporting and for taking the time to talk and work through it with me. It's great to see this marked improvement in the memory usage (and I would guess it should help with performance as well). I'll get this and the guard fixes in and see about a release shortly.
Both changesets have been released as part of v0.94.0. Thanks again for the help and I hope this resolves the allocation challenges that started us down this path.
We're reading a byte at a time for readline. In our tests of a large service at Stripe, this is causing 40% of total allocations!
This PR:
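For context, the byte-at-a-time pattern described above can be sketched like this (illustrative, using StringIO in place of a socket):

```ruby
require "stringio"

# Reading one byte per call: every iteration costs a read call and
# allocates a fresh one-byte String.
def readline_byte_at_a_time(io)
  line = String.new
  while (byte = io.read(1))
    line << byte
    break if byte == "\n"
  end
  line
end

io = StringIO.new("hello\nworld\n")
readline_byte_at_a_time(io)  # => "hello\n" after six separate reads
```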