Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The SFTP replica type tries to close a handle twice, provokes an error from Twisted SFTP server #268

Closed
exarkun opened this issue Jan 21, 2022 · 3 comments · Fixed by #345
Assignees
Labels
bug Something isn't working
Milestone

Comments

@exarkun
Copy link

exarkun commented Jan 21, 2022

I tried pointing litestream at a Twisted SFTP server. It started off ok but it always triggers an unhandled exception in Twisted after uploading some data. This is probably a bug in Twisted but the behavior of litestream that triggers it seems like it is probably also wrong.

I instrumented the server to report what SFTP actions were being taken and got this:

INIT    data=b'\x00\x00\x00\x03'
OPENDIR data=b'\x00\x00\x00\x01\x00\x00\x000zkapauthz/generations/974ebb6cb1473185/snapshots'
READDIR data=b'\x00\x00\x00\x02\x00\x00\x00\r8782928208674'
READDIR data=b'\x00\x00\x00\x03\x00\x00\x00\r8782928208674'
CLOSE   data=b'\x00\x00\x00\x04\x00\x00\x00\r8782928208674'
OPENDIR data=b'\x00\x00\x00\x05\x00\x00\x000zkapauthz/generations/974ebb6cb1473185/snapshots'
READDIR data=b'\x00\x00\x00\x06\x00\x00\x00\r8782927425859'
READDIR data=b'\x00\x00\x00\x07\x00\x00\x00\r8782927425859'
CLOSE   data=b'\x00\x00\x00\x08\x00\x00\x00\r8782927425859'
OPENDIR data=b'\x00\x00\x00\t\x00\x00\x00*zkapauthz/generations/974ebb6cb1473185/wal'
STAT    data=b'\x00\x00\x00\n\x00\x00\x00*zkapauthz/generations/974ebb6cb1473185/wal'
STAT    data=b'\x00\x00\x00\x0b\x00\x00\x00&zkapauthz/generations/974ebb6cb1473185'
MKDIR   data=b'\x00\x00\x00\x0c\x00\x00\x00*zkapauthz/generations/974ebb6cb1473185/wal\x00\x00\x00\x00'
OPEN    data=b'\x00\x00\x00\r\x00\x00\x00Dzkapauthz/generations/974ebb6cb1473185/wal/00000001_00000000.wal.lz4\x00\x00\x00\x1a\x00\x00\x00\x00'
WRITE   data=b'\x00\x00\x00\x0e\x00\x00\x00\x0f140526888035664\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07\x04"M\x18dp\xb9'
WRITE   data=b'\x00\x00\x00\x0f\x00\x00\x00\x0f140526888035664\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00\x04d\x00\x00\x00'
WRITE   data=b'\x00\x00\x00\x10\x00\x00\x00\x0f140526888035664\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00d\xf4\x197\x7f\x06\x8...
WRITE   data=b'\x00\x00\x00\x11\x00\x00\x00\x0f140526888035664\x00\x00\x00\x00\x00\x00\x00o\x00\x00\x00\x08\x00\x00\x00\x00\x92...
CLOSE   data=b'\x00\x00\x00\x12\x00\x00\x00\x0f140526888035664'
closing: b'\x00\x00\x00\x12' b'140526888035664'
CLOSE   data=b'\x00\x00\x00\x13\x00\x00\x00\x0f140526888035664'
closing: b'\x00\x00\x00\x13' b'140526888035664'

The data for a CLOSE operation is a 4 byte request id and then a length-prefixed string (4 byte prefix).

So litestream is sending a CLOSE with request id \x00\x00\x00\x12 and a handle of 140526888035664'.
Then it sends a CLOSE with a request id \x00\x00\x00\x13 and a handle of 140526888035664'. ie, a second close of the same handle.

According to the sftp draft spec CLOSE immediately invalidates a handle so the 2nd attempt seems like it could only be an error.

@benbjohnson benbjohnson self-assigned this Jan 23, 2022
@benbjohnson benbjohnson added the bug Something isn't working label Jan 23, 2022
@benbjohnson benbjohnson added this to the v0.4.0 milestone Jan 23, 2022
@benbjohnson
Copy link
Owner

@exarkun Thanks for the deep investigation. I use a double close approach with most closable things (e.g. file descriptors) and it's typically a no-op for the second close. Looks like the SFTP client doesn't behave that way so I'll need to add a wrapper around it to ensure the second close doesn't do anything.

"Double close" may sound like a mistake but the idea is that I call a defer close immediately after opening to ensure it closes no matter what and then I perform an explicit close when my code doesn't error so I can check the error state of the close. For example, here's a defer close here and then an explicit close here. The defer would ensure the connection is closed if the copy in between failed or if there was a panic before the second close.

I tagged this as a bug and I'll get it fixed for the v0.4.0 release.

@exarkun
Copy link
Author

exarkun commented Jan 25, 2022

Thanks @benbjohnson . The reasoning behind the double close makes sense to me. It seems like a reasonable approach under the assumption that the SFTP server will not behavior out-of-spec on the second close (which Twisted's did, but in future releases should not - twisted/twisted#1685).

One thing I'm not quite sure I understand is why the subsequent behavior seemed to be that litestream became unable to make more progress replicating the database in this situation. Is this because the Close calls block until the response is received (and therefore the replication thread ... if there is such a thing ... is likewise blocked)? If so then it makes sense - since Twisted never returned a response in this case. I raise this point just to be sure that the ENOENT response litestream will get from Twisted on the 2nd close won't be a problem.

@benbjohnson
Copy link
Owner

If you send a SIGQUIT signal to litestream when you think it's stuck then it should stop the program and dump out a stack trace. We can take a look at that and see if it's blocked on the Close().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants