You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I tried pointing litestream at a Twisted SFTP server. It started off ok but it always triggers an unhandled exception in Twisted after uploading some data. This is probably a bug in Twisted but the behavior of litestream that triggers it seems like it is probably also wrong.
I instrumented the server to report what SFTP actions were being taken and got this:
INIT data=b'\x00\x00\x00\x03'
OPENDIR data=b'\x00\x00\x00\x01\x00\x00\x000zkapauthz/generations/974ebb6cb1473185/snapshots'
READDIR data=b'\x00\x00\x00\x02\x00\x00\x00\r8782928208674'
READDIR data=b'\x00\x00\x00\x03\x00\x00\x00\r8782928208674'
CLOSE data=b'\x00\x00\x00\x04\x00\x00\x00\r8782928208674'
OPENDIR data=b'\x00\x00\x00\x05\x00\x00\x000zkapauthz/generations/974ebb6cb1473185/snapshots'
READDIR data=b'\x00\x00\x00\x06\x00\x00\x00\r8782927425859'
READDIR data=b'\x00\x00\x00\x07\x00\x00\x00\r8782927425859'
CLOSE data=b'\x00\x00\x00\x08\x00\x00\x00\r8782927425859'
OPENDIR data=b'\x00\x00\x00\t\x00\x00\x00*zkapauthz/generations/974ebb6cb1473185/wal'
STAT data=b'\x00\x00\x00\n\x00\x00\x00*zkapauthz/generations/974ebb6cb1473185/wal'
STAT data=b'\x00\x00\x00\x0b\x00\x00\x00&zkapauthz/generations/974ebb6cb1473185'
MKDIR data=b'\x00\x00\x00\x0c\x00\x00\x00*zkapauthz/generations/974ebb6cb1473185/wal\x00\x00\x00\x00'
OPEN data=b'\x00\x00\x00\r\x00\x00\x00Dzkapauthz/generations/974ebb6cb1473185/wal/00000001_00000000.wal.lz4\x00\x00\x00\x1a\x00\x00\x00\x00'
WRITE data=b'\x00\x00\x00\x0e\x00\x00\x00\x0f140526888035664\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07\x04"M\x18dp\xb9'
WRITE data=b'\x00\x00\x00\x0f\x00\x00\x00\x0f140526888035664\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00\x04d\x00\x00\x00'
WRITE data=b'\x00\x00\x00\x10\x00\x00\x00\x0f140526888035664\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00d\xf4\x197\x7f\x06\x8...
WRITE data=b'\x00\x00\x00\x11\x00\x00\x00\x0f140526888035664\x00\x00\x00\x00\x00\x00\x00o\x00\x00\x00\x08\x00\x00\x00\x00\x92...
CLOSE data=b'\x00\x00\x00\x12\x00\x00\x00\x0f140526888035664'
closing: b'\x00\x00\x00\x12' b'140526888035664'
CLOSE data=b'\x00\x00\x00\x13\x00\x00\x00\x0f140526888035664'
closing: b'\x00\x00\x00\x13' b'140526888035664'
The data for a CLOSE operation is a 4 byte request id and then a length-prefixed string (4 byte prefix).
So litestream is sending a CLOSE with request id \x00\x00\x00\x12 and a handle of 140526888035664'.
Then it sends a CLOSE with a request id \x00\x00\x00\x13 and a handle of 140526888035664'. ie, a second close of the same handle.
According to the sftp draft spec CLOSE immediately invalidates a handle so the 2nd attempt seems like it could only be an error.
The text was updated successfully, but these errors were encountered:
@exarkun Thanks for the deep investigation. I use a double close approach with most closable things (e.g. file descriptors) and it's typically a no-op for the second close. Looks like the SFTP client doesn't behave that way so I'll need to add a wrapper around it to ensure the second close doesn't do anything.
"Double close" may sound like a mistake but the idea is that I call a defer close immediately after opening to ensure it closes no matter what and then I perform an explicit close when my code doesn't error so I can check the error state of the close. For example, here's a defer close here and then an explicit close here. The defer would ensure the connection is closed if the copy in between failed or if there was a panic before the second close.
I tagged this as a bug and I'll get it fixed for the v0.4.0 release.
Thanks @benbjohnson . The reasoning behind the double close makes sense to me. It seems like a reasonable approach under the assumption that the SFTP server will not behavior out-of-spec on the second close (which Twisted's did, but in future releases should not - twisted/twisted#1685).
One thing I'm not quite sure I understand is why the subsequent behavior seemed to be that litestream became unable to make more progress replicating the database in this situation. Is this because the Close calls block until the response is received (and therefore the replication thread ... if there is such a thing ... is likewise blocked)? If so then it makes sense - since Twisted never returned a response in this case. I raise this point just to be sure that the ENOENT response litestream will get from Twisted on the 2nd close won't be a problem.
If you send a SIGQUIT signal to litestream when you think it's stuck then it should stop the program and dump out a stack trace. We can take a look at that and see if it's blocked on the Close().
I tried pointing litestream at a Twisted SFTP server. It started off ok but it always triggers an unhandled exception in Twisted after uploading some data. This is probably a bug in Twisted but the behavior of litestream that triggers it seems like it is probably also wrong.
I instrumented the server to report what SFTP actions were being taken and got this:
The data for a CLOSE operation is a 4 byte request id and then a length-prefixed string (4 byte prefix).
So litestream is sending a CLOSE with request id
\x00\x00\x00\x12
and a handle of140526888035664'
.Then it sends a CLOSE with a request id
\x00\x00\x00\x13
and a handle of140526888035664'
. ie, a second close of the same handle.According to the sftp draft spec CLOSE immediately invalidates a handle so the 2nd attempt seems like it could only be an error.
The text was updated successfully, but these errors were encountered: