Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RedisClient::ProtocolError: Unknown sigil type: "\r" #190

Closed
erikkessler1 opened this issue Apr 15, 2024 · 15 comments · Fixed by #193
Closed

RedisClient::ProtocolError: Unknown sigil type: "\r" #190

erikkessler1 opened this issue Apr 15, 2024 · 15 comments · Fixed by #193

Comments

@erikkessler1
Copy link

I have a pubsub connection that will will intermittently hit a RedisClient::ProtocolError:

RedisClient::ProtocolError: Unknown sigil type: "\r"

This seems to have started with just recently with version 0.22.0.

From patching the rescue RedisClient::RESP3::UnknownType in RedisClient::RubyConnection#read to print the current @io object and waiting for the error to happen, it looks like the @io object is the following state:

#<RedisClient::RubyConnection::BufferedIO:0x00007f0f842ed700
 @blocking_reads=false,
 @buffer=">3\r\n$7\r\nmessage\r\n$19\r\nnotify:entity_event\r\n$0\r\n\r\n",
 @chunk_size=4096,
 @offset=7,
 @read_timeout=5,
 @write_timeout=5>

I haven't been able to reproduce the error deterministically, unfortunately, but this is roughly what I've used to reproduce it in a production application:

pubsub.call(:subscribe, *channels)
p pubsub.next_event(1) while true
# wait for error
@byroot
Copy link
Member

byroot commented Apr 15, 2024

Thanks for the report, this is very likely caused by the recent changes in redis-client, I'll look into it tomorrow.

@casperisfine
Copy link
Collaborator

I forgot to ask if you could share the full backtrace.

@casperisfine
Copy link
Collaborator

I'd also need to know if you connect with a UNIX socket, regular TCP, or SSL.

casperisfine pushed a commit that referenced this issue Apr 16, 2024
Tentative fix for #190

I wasn't able to reproduce, not to really figure out the root cause.

`Unknown sigil type: "\r"` suggest we corrupted the offset after
calling `gets_integer`. When we read `\r` we assume `\n` is next
but don't check if we read it yet, and just increment the offset,
and the offset can potentially be past the buffer.

I don't know how it happens exactly, but somehow we sometimes don't
handle this case properly.

The fact that it was reported with a `pubsub` use case suggest it
might be after a read timeout, but can't be certain.
@casperisfine
Copy link
Collaborator

@erikkessler1 I tried the repro by crafting specific responses but couldn't figure it out. I still made a tentative fix in #191, I'd appreciate if you could try it in your app and report back whether it fixes the issue.

@erikkessler1
Copy link
Author

Unfortunately, it looks like it still happens with that fix. The connection is SSL and here is the backtrace:

gems/redis-client-0.22.0/lib/redis_client/ruby_connection/resp3.rb:141:in `parse'
gems/redis-client-0.22.0/lib/redis_client/ruby_connection/resp3.rb:50:in `load'
gems/redis-client-0.22.0/lib/redis_client/ruby_connection.rb:98:in `block in read'
gems/redis-client-0.22.0/lib/redis_client/ruby_connection/buffered_io.rb:114:in `with_timeout'
gems/redis-client-0.22.0/lib/redis_client/ruby_connection.rb:94:in `read`
gems/redis-client-0.22.0/lib/redis_client.rb:513:in `next_event'
<application-code>

This is the error with the patch and state of @io after the error:

#<RedisClient::RESP3::UnknownType: Unknown sigil type: "t">

#<RedisClient::RubyConnection::BufferedIO:0x00007f8a7e9abef0
 @blocking_reads=false,
 @buffer=">3\r\n$7\r\nmessage\r\n$25\r\nnotify:email_notification\r\n$0\r\n\r\n>3\r\n$7\r\nmessage\r\n$25\r\nnotify:email_notification\r\n$0\r\n\r\n",
 @chunk_size=4096,
 @io=#<OpenSSL::SSL::SSLSocket:0x00007f8a75405a68>,
 @offset=99,
 @read_timeout=5,
 @write_timeout=5>

@byroot
Copy link
Member

byroot commented Apr 16, 2024

Thank you. SSL socket changes a lot of things. Can you share your Ruby and openssl gem version, as well as OpenSSL::OPENSSL_LIBRARY_VERSION?

@casperisfine
Copy link
Collaborator

I think I got a repro using SSL:

/Users/byroot/src/github.com/redis-rb/redis-client/lib/redis_client/ruby_connection.rb:101:in `rescue in read': Unknown sigil type: "\n" (rediss://127.0.0.1:26379) (RedisClient::ProtocolError)
	from /Users/byroot/src/github.com/redis-rb/redis-client/lib/redis_client/ruby_connection.rb:94:in `read'
	from /Users/byroot/src/github.com/redis-rb/redis-client/lib/redis_client.rb:513:in `next_event'
	from test/sub.rb:29:in `block (2 levels) in <main>'
	from <internal:numeric>:237:in `times'
	from test/sub.rb:28:in `block in <main>'
	from <internal:kernel>:187:in `loop'
	from test/sub.rb:24:in `<main>'

@erikkessler1
Copy link
Author

Here are my OpenSSL versions:

OpenSSL::VERSION
#=> "3.2.0"
OpenSSL::OPENSSL_LIBRARY_VERSION
#=> "OpenSSL 1.1.1w  11 Sep 2023"

@casperisfine
Copy link
Collaborator

Alright, I figured the root cause.

With a regular socket read_nonblock(10, buffer, exception: false) doesn't clear the buffer if there's nothing to read.

But with SSLSocket, the buffer is always cleared. that's what cause the issue.

@erikkessler1
Copy link
Author

Awesome, great find! 🎉

casperisfine pushed a commit that referenced this issue Apr 16, 2024
Fix: #190

```ruby
buff = "blah".b
p io.read_nonblock(10, buffer, exception: false) # :wait_readable
p buff
```

The above code when using a regular Ruby IO socket leaves the
buffer intact, because it only replaces the content if it actually
read something.

However when using a `SSLSocket`, the buffer is always cleared
regardless of whether something was read or not.

This difference could cause the offset to be corrupted by pointing
forward.
@casperisfine
Copy link
Collaborator

@erikkessler1 could you try #193 ? I just want to make sure I fixed the same bug you are experiencing (we never know).

@erikkessler1
Copy link
Author

After running things with that patch for 15 minutes I haven't seen the error again (usually would have seen it every few minutes), so I'd say that was the issue. Thank you for the quick fix!

@byroot
Copy link
Member

byroot commented Apr 16, 2024

Welcome.

I need to fix a few things on CI and will try to write a test for this (tricky) and I'll cut a release soon.

casperisfine pushed a commit that referenced this issue Apr 16, 2024
Fix: #190

```ruby
buff = "blah".b
p io.read_nonblock(10, buffer, exception: false) # :wait_readable
p buff
```

The above code when using a regular Ruby IO socket leaves the
buffer intact, because it only replaces the content if it actually
read something.

However when using a `SSLSocket`, the buffer is always cleared
regardless of whether something was read or not.

This difference could cause the offset to be corrupted by pointing
forward.
@casperisfine
Copy link
Collaborator

0.22.1 is out.

@casperisfine
Copy link
Collaborator

Upstream PR: ruby/openssl#739

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants