Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Net::ReadTimeout errors in net-pop by handling multi-byte terminators getting split up in readuntil #30

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

astapinski
Copy link

@astapinski astapinski commented May 13, 2024

Hi there. We recently did a Ruby update in our system. Part of that was using newer net-pop and net-protocol versions. After updating, we started seeing Net::ReadTimeout errors about once every day or two when reading POP messages and sometimes when reading the list of messages. net-pop would tend to get stuck on a message but would sometimes also succeed on later attempts.

The issue appears to be in optimizations made in readuntil for an offset. It appears the offset is used to avoid searching for a terminator at the beginning of the relevant buffer start. The optimization here sets offset to rbuf.bytesize whereas before the offset was always the starting @rbuf_offset. However, this doesn't always work when a multiple byte terminator is used like net-pop's "\r\n" as a read could end up with the read ending midway through a multiple byte terminator and then either:

  • Two chunks are read as one, so each_message_chunk fails to find the exact string ".\r\n" as two chunks are together.
  • Potentially the ending ".\r\n" could be read partially as well which would result in an extra read when no more data is coming.

I verified the first one in production by modifying each_message_chunk like so:

    def each_message_chunk
      LOG 'reading message...'
      LOG_off()
      read_bytes = 0
      while (line = readuntil("\r\n")) != ".\r\n"
        warn "NP LINE: #{line}"
        read_bytes += line.size
        yield line.delete_prefix('.')
      end
      warn "NP EACH_MESSAGE_CHUNK END"
      LOG_on()
      LOG "read message (#{read_bytes} bytes)"
    end

and verified this was the issue as the ending terminator was chunked together with the previous line:

NP LINE: 4E8=3D349" style=3D"display: none; width: 1px; height: 1px;">
NP LINE: </body>
NP LINE: </html>
NP LINE:  =20
NP LINE: 
NP LINE: --52f7naosv66--
.

The fix I came up with was to set the offset where it does not go too far into the buffer where the terminator could start. We have been running our fork with the potential fix for two weeks in production and have had no more Net::ReadTimeout issues in net-pop.

I looked and it appears everything else using readuntil uses single byte terminators like "\n" which wouldn't have been affected by this issue.

Relevant stacktrace details:

Apr 27 04:51:19Z APP app/single_threaded_worker.1 Net::ReadTimeout: Net::ReadTimeout with #<Socket:fd 18>
 Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:229:in `rbuf_fill'
 Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:199:in `readuntil'
 Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:377:in `each_message_chunk'
 Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-pop-0.1.2/lib/net/pop.rb:958:in `block in retr'
 Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-pop-0.1.2/lib/net/pop.rb:1016:in `critical'
 Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-pop-0.1.2/lib/net/pop.rb:956:in `retr'
 Apr 27 04:51:19Z APP app/single_threaded_worker.1 /app/vendor/bundle/ruby/3.1.0/gems/net-pop-0.1.2/lib/net/pop.rb:810:in `pop'

If a multi-byte line terminator is used (e.g. "\r\n") and a read attempt
into the buffer gets only part of the multi-byte terminator, then this
logic would get two lines together.

More importantly, if this terminator was at the very end of a POP3
message (i.e. end of message last line is ".\r\n") then another read
would be attempted for data which would never come. This would result in
a Net::ReadTimeout.
Mainly add a + 1 to the offset as the offset was 1 character below the
optimal offset if all but 1 byte of the terminator was present in @rbuf.
offset = @rbuf.bytesize
new_offset = @rbuf.bytesize - terminator.bytesize + 1
# Only assign the offset if it will advance.
# Otherwise an empty @rbuf could result in a negative offset.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix here was to just set the offset in a way that it should only be set far enough ahead that a multiple byte terminator would still be read completely on the next read. For the comment, I'm not honestly sure if an empty buffer could be an issue but I played it safe and checked that new_offset is greater than offset before setting it.

@@ -27,6 +27,26 @@ def test_each_crlf_line
end
end

def test_each_message_chunk_boundary
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will fail if one uses the old logic for readuntil as two chunks will be read together.

Just a cleanup that I forgot to do.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant