
Upload file hangs #502

Open
TimYi opened this issue Mar 14, 2022 · 5 comments

Comments

@TimYi

TimYi commented Mar 14, 2022

package version: github.com/pkg/sftp v1.13.4

Error: After manually cutting the network connection, the program hangs while uploading the file.

Expected: the network disconnection is detected and an error is returned in a timely manner.

Code:

targetFile, err := client.Create(targetPath)
if err != nil {
	log.Printf("can't open target file, the error is:\n%v", err)
	return
}
defer targetFile.Close()
sourceFile, err := os.Open(filePath)
if err != nil {
	log.Printf("can't open source file, the error is:\n%v", err)
	return
}
defer sourceFile.Close()
_, err = io.Copy(targetFile, sourceFile)  // hangs here
if err != nil {
	log.Printf("can't copy file to the remote server, the error is:\n%v", err)
	return
}
sourceFile.Close()

This behavior is very important for a program that automatically monitors a directory and uploads files continuously; otherwise a network interruption may cause the program to hang forever.

@drakkan
Collaborator

drakkan commented Mar 16, 2022

Hi,

I did a quick test; the network disconnection is detected only after a very long time:

start 2022-03-16 16:27:50.780043475 +0100 CET m=+1.755726143
end, n 3440640, err connection lost, elapsed 17m16.004653336s

about 17 minutes in my test.

I think this error is detected in crypto/ssh. @puellanivis, do you have any idea?

@puellanivis
Collaborator

puellanivis commented Mar 16, 2022

🤔 I’m not sure how much we can really do here. Like you say, the network disconnect logic is not something we can control. Perhaps we could build a watchdog, but the user could build a watchdog themselves and exert far better control over it.

If this is going into some sort of service, or long-lived program, you’ll probably want some sort of disconnect/reconnect logic anyways?

PS: If we do a v2 of the client API, then using contexts would make these timeout things much nicer.
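
A minimal sketch of the watchdog idea described above (an editorial illustration, not code from this thread), assuming an existing *ssh.Client (sshConn) and *sftp.Client (client); the function name and timeout value are made up for the example. The key point is that closing the SSH connection forces the Write that io.Copy is blocked in to return an error:

import (
	"fmt"
	"io"
	"time"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

// uploadWithDeadline copies src to targetPath on the SFTP server and gives up
// if the copy does not finish within timeout. Closing the SSH connection
// unblocks the write that io.Copy is stuck in.
func uploadWithDeadline(sshConn *ssh.Client, client *sftp.Client, src io.Reader, targetPath string, timeout time.Duration) error {
	targetFile, err := client.Create(targetPath)
	if err != nil {
		return err
	}
	defer targetFile.Close()

	done := make(chan error, 1)
	go func() {
		_, copyErr := io.Copy(targetFile, src)
		done <- copyErr
	}()

	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		sshConn.Close() // tear down the connection so the pending write returns
		return fmt.Errorf("upload of %s timed out after %s", targetPath, timeout)
	}
}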

@drakkan
Collaborator

drakkan commented Mar 16, 2022

I don't think it will be that easy to interrupt a hanging read/write, but I could be wrong.

@puellanivis
Collaborator

If I were running into this issue myself, I would probably implement a work-around piecemeal Copy function that performs writes in a goroutine, with the primary goroutine waiting on a select { case <-time.After(…): … ; case <-errCh: … } to get a poor man's write deadline.

I’m not sure that we could fix anything on our end… if the Write hangs, that’s kind of all we got. I suppose it might be possible to plumb a WriteDeadline through… but actually, no. There are too many abstraction layers between our Write and the network connection itself.
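
A rough sketch of that piecemeal Copy work-around (an editorial illustration, not code from this thread), assuming dst is an *sftp.File and writeTimeout is chosen by the caller; the function name is made up. Note that a write goroutine still blocked after a timeout only returns once the caller tears down the connection:

import (
	"fmt"
	"io"
	"time"

	"github.com/pkg/sftp"
)

// copyWithWriteDeadline copies src to dst one chunk at a time, performing each
// Write in a goroutine and racing it against a timer to get a poor man's
// write deadline.
func copyWithWriteDeadline(dst *sftp.File, src io.Reader, writeTimeout time.Duration) (int64, error) {
	buf := make([]byte, 32*1024)
	var written int64
	for {
		n, readErr := src.Read(buf)
		if n > 0 {
			errCh := make(chan error, 1)
			chunk := buf[:n]
			go func() {
				_, writeErr := dst.Write(chunk)
				errCh <- writeErr
			}()
			select {
			case writeErr := <-errCh:
				if writeErr != nil {
					return written, writeErr
				}
				written += int64(n)
			case <-time.After(writeTimeout):
				// The write goroutine stays blocked until the caller
				// closes the underlying connection.
				return written, fmt.Errorf("write timed out after %s", writeTimeout)
			}
		}
		if readErr == io.EOF {
			return written, nil
		}
		if readErr != nil {
			return written, readErr
		}
	}
}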

@drakkan
Collaborator

drakkan commented Feb 22, 2023

I agree, this can be easily fixed on the application side. I periodically issue a Getwd command, and if I get no response within 30 seconds I close the connection.
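
A minimal sketch of that keep-alive approach (an editorial illustration, not code from this thread), assuming an existing *ssh.Client (sshConn) and *sftp.Client (client); the 30-second values mirror the comment rather than any library default:

import (
	"time"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

// keepAlive periodically issues a Getwd request and closes the SSH connection
// if no response arrives within 30 seconds, so that any blocked upload returns
// an error instead of hanging forever.
func keepAlive(sshConn *ssh.Client, client *sftp.Client, stop <-chan struct{}) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			done := make(chan error, 1)
			go func() {
				_, err := client.Getwd()
				done <- err
			}()
			select {
			case err := <-done:
				if err != nil {
					sshConn.Close()
					return
				}
			case <-time.After(30 * time.Second):
				sshConn.Close() // no response: assume the connection is dead
				return
			}
		}
	}
}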
