Incomplete downloads #424

Open · mrwonko opened this issue Mar 29, 2021 · 34 comments

Comments

@mrwonko commented Mar 29, 2021

Since upgrading to 1.13, the library often reports EOF before a file opened with Client.Open() has been downloaded completely. For example, a 634618-byte file stopped downloading after just 65536 bytes, and a 2367599-byte file after 2031616 bytes.

As a workaround I've added a call to File.Stat().Size(), which returns the correct file size, and I compare that with the downloaded size to check for success.
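For reference, a minimal sketch of that workaround, assuming client is an established *sftp.Client; the function and variable names here are illustrative only, not taken from the issue:

    import (
        "fmt"
        "io"

        "github.com/pkg/sftp"
    )

    // downloadChecked copies the remote file to dst and verifies the byte count
    // against the size reported by Stat.
    func downloadChecked(client *sftp.Client, remotePath string, dst io.Writer) error {
        f, err := client.Open(remotePath)
        if err != nil {
            return err
        }
        defer f.Close()

        st, err := f.Stat()
        if err != nil {
            return err
        }

        n, err := io.Copy(dst, f)
        if err != nil {
            return err
        }
        if n != st.Size() {
            return fmt.Errorf("incomplete download: got %d of %d bytes", n, st.Size())
        }
        return nil
    }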

@puellanivis (Collaborator)

Queries: Is it returning an io.EOF error, or a nil error? How are you transferring the file? Client.Open() then an io.Copy?

@mrwonko (Author) commented Mar 29, 2021

Yes, Client.Open() followed by io.Copy(), which returns a nil error, implying that the reader itself returned io.EOF.

@puellanivis (Collaborator)

Hm… it's proving pretty hard to pin down what might be going wrong with WriteTo. I see one path that could result in err == nil without an io.EOF, but nothing closes any of those channels anyway, so it seems unlikely to be that, and I don't really see how we could be getting an io.EOF from anything other than an SSH_FXP_STATUS packet.

Clearly some number of trailing bytes are not being committed; this can be seen from the fact that the two mistaken file size examples are at 32768-byte boundaries…

@mrwonko (Author) commented Mar 29, 2021

the two mistaken file size examples are at 32768 byte boundaries

Sounds right; other examples like 1343488, 1638400 and 786432 also fit that pattern (41 × 32768, 50 × 32768 and 24 × 32768, respectively).

@puellanivis (Collaborator)

I'm thinking we might want to try a cherry-pick that covers the unexpected-channel-close case. I'm entirely unsure how that could ever trigger, but it might be the problem: it's the only other way we can get a nil error without an EOF, and it very clearly shouldn't be a nil error… it was only like that as a hold-over from an earlier draft.

This could also explain the observed behavior: some number of 32 KiB writes work, then we receive on a closed channel and stop all transfers.
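To illustrate the general hazard being described (this is not the library's actual code): a receive from a closed error channel yields the zero value, i.e. a nil error, which downstream code can mistake for a clean completion.

    package main

    import "fmt"

    func main() {
        errCh := make(chan error)
        close(errCh)

        // A receive on a closed channel returns immediately with the zero value.
        err, ok := <-errCh
        fmt.Println(err, ok) // prints "<nil> false"; without checking ok, nil looks like success
    }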

@drakkan (Collaborator) commented Mar 30, 2021

@puellanivis thank you. So if I want to try to replicate this issue, it should happen only on unexpected transfer errors, right?

@puellanivis (Collaborator)

@drakkan That's the thing: this really shouldn't happen on unexpected transfer errors either. Those should all percolate some non-EOF error up the line as well, right? 🤔

@drakkan (Collaborator) commented Apr 2, 2021

I tried to replicate this issue, with no luck, using something like this:

    for i := 0; i < 1000; i++ {
        start := time.Now()
        f, err := sftpClient.Open("file.zip")
        if err != nil {
            panic(err)
        }
        dst, err := os.Create("/tmp/file.zip")
        if err != nil {
            panic(err)
        }
        n, err := io.Copy(dst, f)
        fmt.Printf("it %v written %v, err %v elapsed %v\n", i, n, err, time.Since(start))
        if n != 61449068 {
            fmt.Printf("errrrrrrrrrrrrrrrrrrrrrr\n\n\n\n")
        }
        f.Close()
        dst.Close()
    }

@mrwonko, can you please confirm that #425 fixes this issue in your case?

@puellanivis I tried this over a VPN connection. To trigger the error I disconnected the VPN; in this case more than 15 minutes are needed before the connection-lost error is triggered:

it 0 written 2654208, err connection lost elapsed 17m10.208717111s

What do you think? Should we add a context or something similar? Thank you.
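For what it's worth, a caller can already bound a transfer without library support by closing the client when a context expires. This is only a sketch with illustrative names; it assumes client, f, and dst were set up elsewhere:

    import (
        "context"
        "io"
        "time"

        "github.com/pkg/sftp"
    )

    // copyWithTimeout is a hypothetical helper: it runs the copy in a goroutine
    // and closes the client if the deadline passes, which fails the outstanding
    // requests and unblocks io.Copy.
    func copyWithTimeout(client *sftp.Client, f *sftp.File, dst io.Writer, timeout time.Duration) error {
        ctx, cancel := context.WithTimeout(context.Background(), timeout)
        defer cancel()

        done := make(chan error, 1)
        go func() {
            _, err := io.Copy(dst, f)
            done <- err
        }()

        select {
        case err := <-done:
            return err
        case <-ctx.Done():
            client.Close() // tears down outstanding requests
            <-done         // wait for the copy goroutine to observe the failure
            return ctx.Err()
        }
    }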

@mrwonko (Author) commented Apr 7, 2021

I will give #425 a try next week and let you know if it helps.

@mrwonko (Author) commented Apr 12, 2021

I deployed #425 on our preproduction system a couple of hours ago, and there has already been a first partial download: 98304 bytes instead of 240430. I'll keep it there until tomorrow to see if it happens again, but then I'll probably roll back; it doesn't seem to work.

@mrwonko (Author) commented Apr 13, 2021

There have been about a dozen failed downloads at this point; #425 doesn't seem to fix our issue.

@drakkan (Collaborator) commented Apr 13, 2021

@mrwonko can you please provide a reproducer? I tried the above code with no luck.

I'm not very familiar with the client-side code, but if I have an easy way to reproduce the issue I'll try to fix it. Thank you!

@puellanivis (Collaborator)

The only thing I can think of now is the possibility that writes and reads are happening to the remote file at the same time. So, when the read tries to read at offset XYZ, it's met with an io.EOF and assumes the read is complete. Since the client can really only receive either data or an io.EOF, that setup would only ever write 32k at a time. 🤔 This still seems so weird though.

@puellanivis (Collaborator)

Going through with a fine-tooth comb again, I'm seeing numerous small issues here and there. I'll be making a PR to address them, ensuring that sendPacket never returns an io.EOF error and that readChunkAt refuses to recognize an err == io.EOF from sendPacket as well.

Proper EOFs in the SFTP library should only ever come via an SSH_FX_EOF status packet, never raw from sendPacket.
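The normalization being described boils down to something like this (illustrative only, not the library's actual code):

    import (
        "errors"
        "io"
    )

    // normalizeTransportErr: an io.EOF bubbling up from the transport means the
    // connection broke mid-request, not that the remote file ended, so it should
    // never be surfaced as a plain io.EOF.
    func normalizeTransportErr(err error) error {
        if errors.Is(err, io.EOF) {
            return io.ErrUnexpectedEOF
        }
        return err
    }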

@drakkan (Collaborator) commented Apr 13, 2021

@puellanivis thanks for your patch.

@mrwonko can you please give #429 a try?

@mrwonko (Author) commented Apr 13, 2021

I'll give that new patch a try.

I've not yet been able to reproduce this locally due to unrelated issues, so instead I'm testing using our normal downloaders on our preproduction cluster, which reliably reproduces it. That said, the snippet you posted should theoretically be sufficient for a reproduction.

@mrwonko (Author) commented Apr 14, 2021

I still get partial downloads with #429.

@puellanivis (Collaborator)

I still get partial downloads with #429.

🙃 wth?

@puellanivis (Collaborator)

What are the architectures and programs running on both sides of this? Like, clearly the client in this situation is from our library, but what kind of server is it connecting to? Is that server running off of our package as well?

@puellanivis (Collaborator)

I know this might be a long shot, but do you still get incomplete downloads with #435?

@mrwonko (Author) commented Apr 23, 2021

Our client runs in an x86_64 Docker container. All I know about the server is that it runs on Windows, and I'm pretty sure it doesn't use this library.

Since we don't run on 32-bit, I see no point in trying #435, right?

@puellanivis (Collaborator)

So, while the problem was manifest in 32-bit, there were still other possible edge conditions that could have manifested the same behavior. I'm honestly not sure, and I'm grasping at straws here.

@carsten-luckmann

As far as I understand, the server is running JSCAPE and accesses the files it serves via some network file system.

@drakkan (Collaborator) commented Apr 24, 2021

@carsten-luckmann @mrwonko we tried to reproduce the issue with no luck, so you have two choices:

  1. use v1.12 forever and/or hope that a refactoring of the library will also fix this regression
  2. provide more information: a small reproducer would be ideal, but adding some debug logging in pkg/sftp (printing, for example, the concurrency value for the non-working case) could also help us

Thank you!

@j0hnsmith

I might be seeing this problem too. Using v1.13.0, we're doing:

fp, err := sftpClient.Open("somepath")
...
data, err := ioutil.ReadAll(fp)
...

There are no errors (so I assume io.EOF was seen), but the data is short: in our case, for 74 KB and 69 KB files, the exact size read is 65024 bytes. Unfortunately the server deletes the files as soon as they're read to completion (it did delete the files, which suggests the data was read but not returned fully), so we can't try again with different branches/PRs. Given that the size is different from those reported above and io.Copy() wasn't used, I'm not sure whether it's the same problem or not.

@puellanivis (Collaborator)

😱 I think I might see what's going on in your situation, @j0hnsmith. While using concurrent reads, if we request the end-of-file read early, the server might trigger its "download is complete" functionality, delete the file from its side, and maybe even truncate it to 0 length.

This would mean any later read at any point in the file would end up with an EOF, since every offset is now beyond the length of the file, causing a premature and silent EOF at some 32k boundary short of the length of the whole file.

I had considered needing to have the writes well-ordered, but I had not anticipated a need to ensure monotonically increasing read offsets in WriteTo.
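If this theory holds, one mitigation on the caller's side would be to turn off concurrent reads so the requests are issued in order. A sketch, assuming the UseConcurrentReads client option is available in the version in use, and with a hypothetical helper name:

    import (
        "github.com/pkg/sftp"
        "golang.org/x/crypto/ssh"
    )

    // newSequentialClient builds a client with concurrent reads disabled, so read
    // requests are issued sequentially and no early end-of-file request is sent
    // to a server that deletes files once the download looks complete.
    func newSequentialClient(sshConn *ssh.Client) (*sftp.Client, error) {
        return sftp.NewClient(sshConn, sftp.UseConcurrentReads(false))
    }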

@puellanivis (Collaborator)

🤔 I'm thinking about whether I should work on a patch to get well-ordered, monotonically increasing reads into WriteTo before the other PR with the big rewrite to use the internal filexfer package.

That other PR doesn’t really touch any of the WriteTo or ReadFrom code, so I think it should be pretty safe, even if it means the other PR will need some adjustment after the fix is merged. I’ll try to get something out soon. I think there is good cause to prioritize this over the refactor right now, as it is causing serious issues.

@puellanivis (Collaborator)

#436: can people try this PR and see if it fixes the issue?

@drakkan (Collaborator) commented Jun 12, 2021

Hopefully fixed in v1.13.1.

@Coffeeri commented Jan 10, 2024

#436: can people try this PR and see if it fixes the issue?

This still seems to be the case in v1.13.6. Some files are unexpectedly and silently cut off when using io.ReadAll. However, as noted earlier, using io.Copy avoids it:

fp, err := sftpClient.Open("somepath")
...
bs, err := io.ReadAll(fp)
// vs
var buf bytes.Buffer
_, err = io.Copy(&buf, fp)
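For context, the difference matters because io.Copy prefers the source's WriteTo method when it has one, while io.ReadAll only ever calls Read. Roughly what the standard library does inside io.Copy, as a simplified sketch:

    import "io"

    // copySketch: if the source implements io.WriterTo, its WriteTo method is
    // used instead of a plain Read/Write loop. *sftp.File provides WriteTo, so
    // the two calls above exercise different code paths in the client.
    func copySketch(dst io.Writer, src io.Reader) (int64, error) {
        if wt, ok := src.(io.WriterTo); ok {
            // io.Copy takes this path for *sftp.File.
            return wt.WriteTo(dst)
        }
        // io.ReadAll never gets here: it only ever calls src.Read into a growing
        // buffer, which is the path that shows the truncation.
        buf := make([]byte, 32*1024)
        var written int64
        for {
            n, err := src.Read(buf)
            if n > 0 {
                if _, werr := dst.Write(buf[:n]); werr != nil {
                    return written, werr
                }
                written += int64(n)
            }
            if err == io.EOF {
                return written, nil
            }
            if err != nil {
                return written, err
            }
        }
    }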

@puellanivis (Collaborator)

Hm… trippy. So, io.ReadAll only uses Read from the io.Reader. And looking at the code, it seems like we’ve made the assumption that a short read means err = io.EOF, but this might coincide with a short packet that failed to read the whole way.

We'll probably want to do some sort of check like if n != l { err = io.ErrUnexpectedEOF } else { err = io.EOF } 🤔
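A compilable transcription of that check might look like this (hypothetical helper; n is the number of bytes actually returned for the chunk and want is the length requested):

    import "io"

    // normalizeChunkEOF: if the chunk came back shorter than requested, report
    // io.ErrUnexpectedEOF instead of a clean io.EOF so the caller knows the data
    // may be truncated.
    func normalizeChunkEOF(n, want int) error {
        if n != want {
            return io.ErrUnexpectedEOF
        }
        return io.EOF
    }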

@puellanivis (Collaborator)

Actually reopening this issue.

@puellanivis reopened this Jan 11, 2024

@Coffeeri commented Jan 17, 2024

And looking at the code, it seems like we’ve made the assumption that a short read means err = io.EOF, but this might coincide with a short packet that failed to read the whole way.

I tried to break on err = io.EOF; however, it never reaches that point.

The EOF "error" is generated by err = normaliseError(unmarshalStatus(packet.id, s.data)). I reach this point four times; does this even make sense? Shouldn't only one request reach the EOF?

@puellanivis (Collaborator)

If the io.EOF error is coming from line 1155, then that is a proper EOF, according to the SFTP server.
