Reduce allocations/syscalls for readline #796
Conversation
Yikes, that is a lot of allocations indeed. I'm working this morning on a careful review and some feedback. I look forward to working through and improving this with you, thanks!
Thanks again for the detailed report and first pass at fixes. I've provided some detailed questions and feedback as I worked on more fully wrapping my head around the changes. I think we are headed in a good direction, but might have a little more work to do to cleanup and limit the changes to what is strictly needed for the fix. I look forward to learning more about the changes and continuing to talk through it with you. Thanks!
buffer
result = String.new
block = @read_buffer
@read_buffer = String.new
I believe @read_buffer should already be set up from initialize. Is there a particular reason to overwrite that here?
We're just moving @read_buffer to block, so that we can treat it the same way as future reads. Maybe this would be best expressed like:

block, @read_buffer = @read_buffer, String.new
Got it. Yeah, I think I like the inline version you suggest.
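The suggested parallel-assignment swap can be sketched in isolation (variable names follow the diff; the surrounding class is omitted):

```ruby
# Move the buffered bytes into `block` without copying them; only one
# new (empty) String is allocated for the replacement buffer.
read_buffer = String.new("partial\nleftover")

block, read_buffer = read_buffer, String.new

# `block` now owns the previously buffered data; `read_buffer` is empty
# and ready for future reads.
```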
if idx.nil?
  result << block
else
  add_to_read_buffer(block.slice!(idx+1, block.length))
It looks like this is the only place where this method call is used. Is there a particular reason to have it separate, vs just having the related code inline here?
We could inline it! I just preferred to name it for clarity.
Sure. I think I lean toward inline to avoid the overhead of method calls. I know it's small, but want to keep inner loop streamlined where we can.
@@ -172,20 +187,21 @@ def connect
    end
  end

  def add_to_read_buffer(str)
    @read_buffer << str
    @eof = false
Since @eof is initialized as false, I would expect it to still be false here. Is there a case where it would have been set to something other than false, where it would then need to be changed back here? Otherwise, it may be better not to do this (to avoid complexity and/or confusion in the future if we don't really need it).
Yeah, we're trying to preserve the invariant: @eof is true only if there is no buffered data which we could return.
For example, imagine this series of events, assuming chunk_size = 1 MB:
- Socket is initialized, @eof = false
- readline is called
- readline calls read_nonblock, which reads 800K and gets EOF. Now @eof is true, and read_nonblock returns an 800K block of data
- readline finds the first newline in the block, and it's only 100 bytes in
- readline calls add_to_read_buffer to put the remaining 799.9K back in the buffer
- Now @eof has to be false, since a call to read_nonblock could return successfully
Got it. This relates to the new treatment of the buffer as well, I think it's all coming together in my head.
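A minimal sketch of that invariant, with a hypothetical class standing in for the socket (names are illustrative, not Excon's actual API):

```ruby
# @eof must only be true when no buffered data remains to hand back.
class BufferedReader
  attr_reader :eof

  def initialize
    @read_buffer = String.new
    @eof = false
  end

  # Called when read_nonblock raises EOFError on the underlying socket.
  def mark_eof
    @eof = true
  end

  # Pushing leftover bytes back means a future read can still return
  # data, so the EOF flag has to be cleared again.
  def add_to_read_buffer(str)
    @read_buffer << str
    @eof = false
  end
end

reader = BufferedReader.new
reader.mark_eof                        # socket hit EOF mid-readline
reader.add_to_read_buffer("leftover")  # put unconsumed bytes back
reader.eof  # => false again, matching the invariant
```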
def initialize(data = {})
  @data = data
  @nonblock = data[:nonblock]
  @port ||= @data[:port] || 80
  @read_buffer = String.new
  @eof = false
  @backend_eof = false
Just to confirm I'm understanding this. It looks like the distinction this introduces is that @backend_eof is set when the socket raises that it is done, whereas @eof is set when that has occurred AND the @read_buffer is also empty. Is that correct? I wanted to make sure, since this distinction didn't exist previously, or at least not in this way.
Yes, that's right. We need some way to distinguish "we've fully read the underlying socket" from "we have no more data to return to the client"
Got it, thanks for the clarification.
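The distinction between the two flags can be sketched as follows (illustrative class, not the actual implementation):

```ruby
class TwoFlagSocket
  def initialize(buffered)
    @read_buffer = String.new(buffered)
    @backend_eof = true  # assume the OS socket already raised EOF
  end

  # backend_eof: the underlying socket has been fully read.
  # eof?: additionally, nothing buffered is left to return.
  def eof?
    @backend_eof && @read_buffer.empty?
  end

  def drain
    @read_buffer.slice!(0, @read_buffer.length)
  end
end

s = TwoFlagSocket.new("still buffered")
s.eof?   # => false: the backend is done, but data remains for the client
s.drain
s.eof?   # => true: now there is truly nothing more to return
```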
  end
rescue OpenSSL::SSL::SSLError => error
  if error.message == 'read would block'
    select_with_timeout(@socket, :read) && retry
    if @read_buffer.empty?
Could you help me understand why this guard was introduced here? I can see that we guard in part of the rest of the error handling, but I don't want to add it here if we don't need it.
I believe this was a mistake in a previous PR; the guard was added in the wrong place. Blocking reads don't touch @read_buffer, so they shouldn't care about it. Non-blocking reads, on the other hand, use the buffer, and if they have some data to return, they should never select (i.e., block).
K. Yeah, if we could swap the guard back to where it was, that would be great. My memory is sometimes fuzzy on these things, but I'm pretty sure the guards were how they were for a reason. So it's good not to change them by accident.
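A sketch of the intended guard placement (names assumed; symbols stand in for real I/O): only the non-blocking path consults the buffer, since buffered data should be returned immediately rather than blocking in select.

```ruby
# Non-blocking read path: on "read would block", only wait in select
# if there is nothing buffered to hand back right now.
def handle_would_block(read_buffer)
  if read_buffer.empty?
    :select_and_retry   # nothing to return, so waiting is safe
  else
    :return_buffered    # never block while data is available
  end
end

handle_would_block(String.new)  # => :select_and_retry
handle_would_block("data")      # => :return_buffered
```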
if @read_buffer.empty?
  select_with_timeout(@socket, :read) && retry
end
select_with_timeout(@socket, :read) && retry
Ah, I see now that the guard moved from here to above (rather than being duplicated). Sorry for the confusion. Still, would like to better understand why that changed. Thanks.
@read_buffer << @socket.read_nonblock(@data[:chunk_size])
end
while !@backend_eof && (!max_length || @read_buffer.length < max_length)
  @read_buffer << @socket.read_nonblock(@data[:chunk_size])
I'm not sure how much it matters, but this looks like it could read a good deal more than max_length into the buffer in some cases. Is that desired? I suppose the worst case would be if the buffer initially contained one byte less than max_length. In the old implementation it would just get one more byte and end up with max_length, whereas with this change it would end up with 2 * max_length - 1, I think? Does that make sense?
I was trying to wrap my head around the new backend_eof stuff. I guess this change introduces the need for that as well, if I'm not mistaken? Prior to this change, if there is a max_length it should restrict the buffer to that amount, so below, when it reads out max_length, we should be confident that the buffer was emptied. Without that, it's possible that there is more than max_length while this is set, causing material to be left over in the buffer after a read, which this then allows to be read on subsequent calls. Does that match with your understanding/expectations?
I'd like to better understand why we would want to do it this way instead, as it seems a bit counterintuitive to me (and so makes me worry it might introduce problems down the road or be harder to maintain). Does that make sense?
Yeah, previously we basically emptied the buffer on ~every call, which made the buffer not super useful! And caused lots and lots of extra syscalls when we're doing small reads.
Now, we try to read larger chunks of data, basically amortizing the "real work" (syscalls) of calls to read_nonblock. I agree this implementation can be a bit confusing, and would love to think through ways to make it simpler, maybe by abstracting out some of the buffer operations.
But I think fundamentally, we always should be batching reads this way.
Yeah, I think that makes sense. Agreed that this seems better in terms of optimization, though it does feel a little harder to follow. This nitty gritty level stuff often seems inherently hard to follow (to me at least). I don't think that should preclude us from these kind of improvements, just that we may want to consider how to keep iterating and refining to make it easier to understand if possible.
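The worst-case overshoot discussed above can be demonstrated with a small standalone sketch (a String stands in for the socket; chunk_size and max_length are chosen equal to make the arithmetic visible):

```ruby
chunk_size  = 8
max_length  = 8
read_buffer = String.new("1234567")  # one byte short of max_length
backend     = "a" * 32               # pretend socket contents

# Batching loop: keep reading whole chunks until max_length is buffered.
# Starting one byte short, a single chunk-sized read leaves the buffer
# at 2 * max_length - 1 bytes.
while !backend.empty? && read_buffer.length < max_length
  read_buffer << backend.slice!(0, chunk_size)
end

result = read_buffer.slice!(0, max_length)
# result.length      => 8  (what the caller asked for)
# read_buffer.length => 7  (overshoot, served on subsequent calls)
```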
@@ -208,6 +224,8 @@ def read_nonblock(max_length)
  # read until EOFError, so return everything
  @read_buffer.slice!(0, @read_buffer.length)
I know this was already here, but should we simply return the read_buffer itself here? Slicing its entire contents is an allocation otherwise, which doesn't seem ideal (though it could protect the buffer from mutation, I suppose).
Yeah, this is just a shortcut for
ret, @read_buffer = @read_buffer, String.new
return ret
I'm not sure I find it clear either!
Yeah, similar to the comment above, in some ways there are two challenges. One is fixing some of the performance and memory characteristics, and one is trying to make things clearer and easier to understand. It's not the worst thing to understand, but it definitely takes some time for me whenever I revisit to recall how it all fits together. It may be worth spending some time on making it clearer and easier to follow, but I don't want to block these changes on that. I think it will be easier to address clarity as a separate effort than to try and mix the two together.
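For reference, the two approaches side by side in a standalone sketch: slice! copies the contents into a new String, while the swap reuses the existing object.

```ruby
buffer_a = String.new("all remaining data")
ret_a = buffer_a.slice!(0, buffer_a.length)  # allocates a copy
# buffer_a is now empty; ret_a holds the data

buffer_b = String.new("all remaining data")
ret_b, buffer_b = buffer_b, String.new       # no copy of the contents
# buffer_b is a fresh empty String; ret_b is the original object
```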
end
end

Shindo.tests('socket') do
From what you describe, I imagine a number of these tests may have been failing at the outset of this. Could you describe what you were seeing prior to the changes you made to help me have context? Thanks!
Will take a look at this soon.
Thanks. I just wanted to have a sense of what kind of improvement you've seen and also what to look out for in terms of potential regressions in the future.
Thanks for the detailed review! Happy to chat more in comments, but also to take it to IRC/zoom/whatever if you'd find that helpful, let me know.
Thanks for continuing to talk through it. I'm not too worried about the clarity as part of this, as it's complicated and I think we are both on the same page. But it may be worth some work on that as a followup to help us remember and maintain it in the future.
If I recall, I think these are probably the main remaining things that I'd like us to figure out before bringing this in:
- Could you revert the guard changes that were inadvertently changed?
- Could you also squash some of the commits down to help us keep the history a little cleaner and set us up better in case we should need to revert or anything? I don't think we will need to, but I always want to keep that path as clear as possible when making low level changes, just in case.
- Could you describe what the behavior of the tests would have been and/or would be in the case of regressions, just so I can understand a bit better?
Thanks!
Force-pushed from bef44b6 to 7ac3568
Extracted the guard change to #797
Looks good. Thanks again for such detailed reporting and for taking the time to talk and work through it with me. It's great to see this marked improvement in the memory usage (and I would guess it should help with performance as well). I'll get this and the guard fixes in and see about a release shortly.
Both changesets have been released as part of v0.94.0. Thanks again for the help and I hope this resolves the allocation challenges that started us down this path.
We're reading a byte at a time for readline. In our tests of a large service at Stripe, this is causing 40% of total allocations!
This PR:
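For context, the byte-at-a-time pattern described above can be sketched like this (illustrative, using StringIO in place of a socket):

```ruby
require "stringio"

# Reading one byte per call: every iteration costs a read call and
# allocates a fresh one-byte String.
def readline_byte_at_a_time(io)
  line = String.new
  while (byte = io.read(1))
    line << byte
    break if byte == "\n"
  end
  line
end

io = StringIO.new("hello\nworld\n")
readline_byte_at_a_time(io)  # => "hello\n" after six separate reads
```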