Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_write_to_socket is changing input, which breaks write #962

Closed
SamSaffron opened this issue Nov 18, 2020 · 22 comments
Closed

_write_to_socket is changing input, which breaks write #962

SamSaffron opened this issue Nov 18, 2020 · 22 comments

Comments

@SamSaffron
Copy link
Contributor

SamSaffron commented Nov 18, 2020

This code is amending the input of the data

data.slice!(0, bytes_written)

And so is this code:

data = data.byteslice(count..-1)

Simplest fix here is simply to stop the slice! call in _write_to_socket

This is very urgent @byroot latest release is broken. Ideally we should have some sort of test to catch this.

End result is that if you are lucky, partial things are shipped to redis, if you are unlucky stuff explodes on:

NoMethodError: undefined method `bytesize' for nil:NilClass
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/connection/ruby.rb:103:in `block in write'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/connection/ruby.rb:102:in `loop'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/connection/ruby.rb:102:in `write'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/connection/ruby.rb:394:in `write'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:284:in `block in write'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:263:in `io'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:282:in `write'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:240:in `block (3 levels) in process'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:234:in `each'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:234:in `block (2 levels) in process'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:384:in `ensure_connected'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:233:in `block in process'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:320:in `logging'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:232:in `process'
  /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/redis-4.2.3/lib/redis/client.rb:126:in `call'
@byroot
Copy link
Collaborator

byroot commented Nov 18, 2020

Looks like a duplicate of #961, but with better info. I was planning to investigate this first thing in the morning.

But if you got a repro, I can try to squeeze a release fix tonight, but it's really late, not sure I'm sharp enough.

@SamSaffron
Copy link
Contributor Author

SamSaffron commented Nov 18, 2020

yikes, lets not super duper rush this, I will work on a PR today so you have something easy peasy to merge first thing in your morning!

It will take me a few hours to set this up... a failing test kind of relies on side effects and the speed of write_nonblock ...

@byroot
Copy link
Collaborator

byroot commented Nov 18, 2020

More generally I see how the mutation is a big mistake, there should be a .dup in there. But I don't however understand how we end up with a nil.

@SamSaffron
Copy link
Contributor Author

SamSaffron commented Nov 18, 2020

I totally get this "".byteslice(1..-1) returns nil. data ends up shorter than expected in the caller.

@byroot
Copy link
Collaborator

byroot commented Nov 18, 2020

Oh I think I get it now:

  • We start to write
  • get a partial write so we mutate the string
  • Somehow we get a connection error, or a reconnection.
  • The retry is done with a string too short which leads to that out of bound slice.

That's going to be tricky to reproduce, and worse to test, we'd need something like Toxiproxy.

@SamSaffron
Copy link
Contributor Author

Yeah, fixing is kind of easy, testing super hard... but I am willing to give it a bit of a shot... we could cheat with mocking but I really hate that.

Also... ideally we can amend this pipeline to not make string copies ... but it may be tricky.

@SamSaffron
Copy link
Contributor Author

SamSaffron commented Nov 19, 2020

There is also the discrepancy between slice! and byteslice in play, there may be an edge case there.

@byroot
Copy link
Collaborator

byroot commented Nov 19, 2020

between slice! and byteslice in play,

IIRC slice! is on string with BINARY encoding, so it should be safe. But maybe I messed that too.

@byroot
Copy link
Collaborator

byroot commented Nov 19, 2020

But maybe I messed that too.

Seems like I did...

We need to copy that buffer in write(), with data = data.b.

@byroot
Copy link
Collaborator

byroot commented Nov 19, 2020

it's fine if you cast the string to not be ASCII-8BIT aka BINARY:

>> "\x03\u3042\xff".b.slice(0,3).bytesize
=> 3

I think that's our fix. I'm almost tempted to release it now, but without a repro it's kind of risky.

@SamSaffron
Copy link
Contributor Author

SamSaffron commented Nov 19, 2020

I wonder if there is any way to get around with less copying, kind of wish there a write non block that takes offsets

I think under some conditions partial string copies can be really cheap ?

@byroot
Copy link
Collaborator

byroot commented Nov 19, 2020

I think under some conditions partial string copies can be really cheap ?

Not as far as I know. The only real optimization in strings is duping a frozen string, that creates a "shared string":

>> puts ObjectSpace.dump(s.dup)
{"address":"0x7fc8d3a70410", "type":"STRING", "class":"0x7fc8d40c7598", "shared":true, "encoding":"UTF-8", "references":["0x7fc8d4180728"], "memsize":40, "flags":{"wb_protected":true}}

But a slice is always a fully new string:

>> puts ObjectSpace.dump(s[0..40])
{"address":"0x7fc8d3abdc10", "type":"STRING", "class":"0x7fc8d40c7598", "bytesize":41, "value":"foobarbazfoobarbazfoobarbazfoobarbazfooba", "encoding":"UTF-8", "memsize":82, "flags":{"wb_protected":true}}

Anyway, right now we mostly want a quick fix, we can see if there's some more performance to squeeze. But AFAIK using a mutable binary string as a buffer is the fastest method.

@SamSaffron
Copy link
Contributor Author

sure I support this, I can give a shot at a repro here. But I agree from everything I can tell about this that a simple data = data.b at the start of write_to_socket should sort this out.

byroot added a commit that referenced this issue Nov 19, 2020
@byroot
Copy link
Collaborator

byroot commented Nov 19, 2020

I just pushed cbcb700, I'm very tempted to release it. How worse can it be?

@SamSaffron
Copy link
Contributor Author

I support releasing it, I am confident it will improve things.

@byroot
Copy link
Collaborator

byroot commented Nov 19, 2020

Done. I pushed a 4.2.4 with the fix. I'll try to write a proper test case tomorrow. I'll leave this issue open for now. I'll wait for confirmation that this actually fixed the issue.

@SamSaffron
Copy link
Contributor Author

I made this PR that adds a test and moves stuff around a tiny bit

#963

Up to you if you want to merge or rework it.

@casperisfine
Copy link

So I wrote the following benchmark:

require 'benchmark/ips'

class FakeIO
  def write_nonblock(buffer, exception: false)
    raise 'wtf' if buffer.empty?
    [buffer.bytesize, 65_527].min # popular socket buffer size
  end
end

class CurrentBufferedWriter
  def initialize(io)
    @io = io
  end

  def _write_to_socket(data)
    total_bytes_written = 0
    loop do
      case bytes_written = @io.write_nonblock(data, exception: false)
      when :wait_readable
        unless wait_readable(@write_timeout)
          raise Redis::TimeoutError
        end
      when :wait_writable
        unless wait_writable(@write_timeout)
          raise Redis::TimeoutError
        end
      when nil
        raise Errno::ECONNRESET
      when Integer
        total_bytes_written += bytes_written
        if bytes_written < data.bytesize
          data.slice!(0, bytes_written)
        else
          return total_bytes_written
        end
      end
    end
  end

  def write(data)
    data = data.b
    length = data.bytesize
    total_count = 0
    loop do
      count = _write_to_socket(data)

      total_count += count
      return total_count if total_count >= length

      data = data.byteslice(count..-1)
    end
  end
end

class MutatingBufferedWriter
  def initialize(io)
    @io = io
  end

  def write(data)
    buffer = data.b
    length = data.bytesize
    total_bytes_written = 0
    loop do
      case bytes_written = @io.write_nonblock(buffer, exception: false)
      when :wait_readable
        unless wait_readable(@write_timeout)
          raise Redis::TimeoutError
        end
      when :wait_writable
        unless wait_writable(@write_timeout)
          raise Redis::TimeoutError
        end
      when nil
        raise Errno::ECONNRESET
      when Integer
        total_bytes_written += bytes_written
        return total_bytes_written if total_bytes_written >= length
        buffer.slice!(0, bytes_written)
      end
    end
  end
end


class CopyingBufferedWriter
  def initialize(io)
    @io = io
  end

  def write(data)
    length = data.bytesize
    total_bytes_written = 0
    loop do
      case bytes_written = @io.write_nonblock(data, exception: false)
      when :wait_readable
        unless wait_readable(@write_timeout)
          raise Redis::TimeoutError
        end
      when :wait_writable
        unless wait_writable(@write_timeout)
          raise Redis::TimeoutError
        end
      when nil
        raise Errno::ECONNRESET
      when Integer
        total_bytes_written += bytes_written
        return total_bytes_written if total_bytes_written >= length
        data = data.byteslice(bytes_written..-1)
      end
    end
  end
end

current = CurrentBufferedWriter.new(FakeIO.new)
mutating = MutatingBufferedWriter.new(FakeIO.new)
copying = CopyingBufferedWriter.new(FakeIO.new)

PAYLOAD = ("\u3042" * 32_000).freeze

Benchmark.ips do |x|
  x.config(:time => 10, :warmup => 2)

  x.report('current') { current.write(PAYLOAD) }
  x.report('mutating') { mutating.write(PAYLOAD) }
  x.report('copying') { copying.write(PAYLOAD) }

  x.compare!
end

On 2.7.1 it get:

Warming up --------------------------------------
             current     1.335k i/100ms
            mutating     1.384k i/100ms
             copying   108.792k i/100ms
Calculating -------------------------------------
             current     14.313k (± 6.5%) i/s -    142.845k in  10.023401s
            mutating     14.307k (± 6.4%) i/s -    142.552k in  10.006148s
             copying      1.101M (± 1.2%) i/s -     11.097M in  10.082546s

Comparison:
             copying:  1100764.6 i/s
             current:    14313.2 i/s - 76.91x  (± 0.00) slower
            mutating:    14306.6 i/s - 76.94x  (± 0.00) slower

The difference is so big that I can hardly believe it.

@casperisfine
Copy link

Holy cow, turns out byteslize can create shared strings:

>> PAYLOAD = ("\u3042" * 32_000)
>> puts ObjectSpace.dump(PAYLOAD.byteslice(4096..-1))
{"address":"0x7fcb760f84c8", "type":"STRING", "class":"0x7fcb728b7578", "broken":true, "shared":true, "encoding":"UTF-8", "references":["0x7fcb7297ea10"], "memsize":40, "flags":{"wb_protected":true}}

Ok, let's go with that.

@SamSaffron
Copy link
Contributor Author

@casperisfine wow the copying is so cool, I did not know about byteslice being so efficient!

@ioquatix
Copy link
Contributor

This is an incredibly tricky area of Ruby I/O and has some rough edge cases.

There is an issue discussing introducing byteslice! and I think ultimately you need an internal buffer to do it efficiently, so that if truncation occurs you copy the remainder to a buffer and then flush that. It depends on your use case.

https://bugs.ruby-lang.org/issues/13626

Maybe we should introduce optional timeouts for read/write and other blocking operations.

@byroot
Copy link
Collaborator

byroot commented Nov 20, 2020

I think we can close this now. Thanks all!

@byroot byroot closed this as completed Nov 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants