Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in BufferedIoOutputStream #263

Closed
tomaswolf opened this issue Nov 2, 2022 · 0 comments · Fixed by #264
Closed

Race condition in BufferedIoOutputStream #263

tomaswolf opened this issue Nov 2, 2022 · 0 comments · Fixed by #264
Assignees
Labels
bug An issue describing a bug in the code
Milestone

Comments

@tomaswolf
Copy link
Member

tomaswolf commented Nov 2, 2022

SFTP tests are flaky. Turns out that they are running into

23:13:59.123 | ERROR | MinaProcessor-25 | o.a.s.c.c.BufferedIoOutputStream | org.apache.sshd.common.channel.BufferedIoOutputStream             219 | finishWrite(sftp-out@0)[ChannelAsyncOutputStream[ChannelSession[id=0, recipient=0]-ServerSessionImpl[testTransferIntegrityWithBufferLargerThanPacket[REKEY_BLOCK_SIZE 65,536]@/127.0.0.1:59555]] cmd=SSH_MSG_CHANNEL_DATA] - pending byte counts underflow (-64525) after 18070131 bytes
23:13:59.123 | ERROR | m-42976-thread-1 | o.a.s.s.s.SftpSubsystem          | org.apache.sshd.common.util.logging.LoggingUtils                  693 | run(ServerSessionImpl[testTransferIntegrityWithBufferLargerThanPacket[REKEY_BLOCK_SIZE 65,536]@/127.0.0.1:59555]) SshChannelBufferedOutputException caught in SFTP subsystem after 863 buffers: Pending byte counts underflow
23:23:59.498 | INFO  | )-timer-thread-1 | o.a.s.s.s.ServerSessionImpl      | org.apache.sshd.common.session.helpers.SessionHelper             1183 | Disconnecting(ServerSessionImpl[testTransferIntegrityWithBufferLargerThanPacket[REKEY_BLOCK_SIZE 65,536]@/127.0.0.1:59555]): SSH2_DISCONNECT_PROTOCOL_ERROR - Detected IdleTimeout after 600375/600000 ms.

Finished org.apache.sshd.sftp.client.SftpTransferTest:testTransferIntegrityWithBufferLargerThanPacket[REKEY_BLOCK_SIZE 65,536] in 610192 ms

Let's ignore the fact that test hangs for 10 minutes until that timeout expires; that a different minor problem. The real problem is the "pending byte counts underflow", which indicates that the BufferedIoOutputStream somehow managed to write more data than was passed to it.

This is a race condition between BufferedIoOutputStream.startWriting() and BufferedIoOutputStream.finishWrite(). The following sequence of events is possible when a write that was in progress just finished and the user of the stream just initiates a new write:

  1. Thread 1: startWriting gets the head of the write queue -- this is still the future/buffer that was just written
  2. Thread 2: finishWrite removes that write from the write queue
  3. Thread 2: finishWrite clears the "current" write
  4. Thread 1: startWriting sets the "current" write and happily writes the future/buffer again.

Combined with the fact that the underlying ChannelAsyncOutputStream may in some cases copy the buffer and actually write that copy, but doesn't consume the original buffer in that case, this leads to the same data being written again, and when that second write finishes, this byte count gets subtracted again from the number of total bytes to be written. So the count is added once but subtracted twice, and eventually this "pending byte counts underflow" is hit.

This race condition in BufferedIoOutputStream needs to be fixed. Optionally ChannelAsyncOutputStream should consume the original buffer if it makes a copy; though I'm undecided on that: if it did, it might have hidden this bug (because then the second write of that buffer would have been a zero-byte write).

(The problem is not specific to the MINA transport back-end; it also occurs with NIO2 from time to time. CI builds are green because the race condition appears to be hit infrequently, and on a re-run the test usually succeeds.)

@tomaswolf tomaswolf added the bug An issue describing a bug in the code label Nov 2, 2022
@tomaswolf tomaswolf self-assigned this Nov 2, 2022
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Nov 2, 2022
Don't try to write a future that's already done in startWrite(). Just
skip it and try the next one, if any.

Fixes apache#263.

Bug: apache#263
@tomaswolf tomaswolf added this to the 2.9.2 milestone Nov 2, 2022
tomaswolf added a commit that referenced this issue Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An issue describing a bug in the code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant