
file uploads are very slow compared to java jsch library #472

Closed
karthikrap opened this issue Sep 29, 2021 · 8 comments

@karthikrap

Uploading a 50 MB file with the sftp module took 237 s, while Java's JSch takes only 24 s. My observation is that most of the time is spent writing data to the remote file through the stream. I tried different ways of writing to the stream (kept in comments in main.go), but none improved performance.

Attaching Java and Go samples with source code (src.zip) and binaries (bin.zip). Instructions on how to execute these binaries are documented in README.txt in bin.zip.
bin.zip
src.zip

@drakkan
Collaborator

drakkan commented Sep 29, 2021

Hi,

please try enabling ConcurrentWrites
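For context, a minimal sketch of what enabling the option looks like, assuming an already-established `*ssh.Client` and the github.com/pkg/sftp package (`UseConcurrentWrites` is the real `ClientOption`; the function and file names here are illustrative, and real code needs a live SSH connection):

```go
package main

import (
	"io"
	"log"
	"os"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

// upload copies a local file to the remote side with concurrent writes
// enabled, so multiple SSH_FXP_WRITE requests can be in flight at once
// instead of waiting for each ack before sending the next packet.
func upload(conn *ssh.Client, local, remote string) error {
	client, err := sftp.NewClient(conn, sftp.UseConcurrentWrites(true))
	if err != nil {
		return err
	}
	defer client.Close()

	src, err := os.Open(local)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := client.Create(remote)
	if err != nil {
		return err
	}
	defer dst.Close()

	// io.Copy lets the sftp package drive the transfer via its
	// io.ReaderFrom implementation, which is where the concurrency lives.
	_, err = io.Copy(dst, src)
	return err
}

func main() {
	// SSH connection setup elided; see the attached samples.
	log.Println("sketch only")
}
```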

@karthikrap
Author

karthikrap commented Sep 29, 2021

I tried the upload with "ConcurrentWrites" and the upload time is now almost the same as Java JSch. I have a few questions:

  1. With "ConcurrentWrites", upload times are not as consistent as with Java JSch. On average a 50 MB upload takes 24 s, but sometimes it takes 60 s, while Java JSch takes at most 30 s.

  2. The doc for "ConcurrentWrites" says: "A write to a later offset in a file after an error, could end up with a file length longer than what was successfully written." If I force-terminate an upload, the partially uploaded file has a few KB of corrupted data at the end. Does this mean I cannot use it to implement resumable uploads? In what cases should I not use "ConcurrentWrites"?

  3. The Java JSch library writes data sequentially, which I saw in its source code. In Go's ssh module, are reads and writes to the SSH stream slower, or why exactly is writing to the stream without "ConcurrentWrites" slower?

@drakkan
Collaborator

drakkan commented Sep 30, 2021

> I tried the upload with "ConcurrentWrites" and the upload time is now almost the same as Java JSch. I have a few questions:
>
> 1. With "ConcurrentWrites", upload times are not as consistent as with Java JSch. On average a 50 MB upload takes 24 s, but sometimes it takes 60 s, while Java JSch takes at most 30 s.

It could be related to the concurrency. For example, take a look here: try to understand which concurrency value is used in your use case; you can change this value by sending buffers of different sizes.
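As an illustration of how the amount of data handed to the client per call relates to the number of in-flight requests (the exact policy lives in the pkg/sftp source; the 32 KiB packet size and the cap of 64 outstanding requests used below are assumptions matching the library's documented defaults):

```go
package main

import "fmt"

// inFlightRequests estimates how many write requests can be outstanding for a
// buffer of the given size, assuming the transfer is split into maxPacket-sized
// chunks and capped at maxConcurrency outstanding requests.
func inFlightRequests(bufSize, maxPacket, maxConcurrency int64) int64 {
	n := bufSize / maxPacket
	if bufSize%maxPacket != 0 {
		n++
	}
	if n > maxConcurrency {
		n = maxConcurrency
	}
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	const maxPacket = 32 * 1024 // assumed default SFTP write size
	const maxConcurrency = 64   // assumed default cap on outstanding requests
	for _, buf := range []int64{32 * 1024, 256 * 1024, 50 * 1024 * 1024} {
		fmt.Printf("buffer %8d bytes -> %2d requests in flight\n",
			buf, inFlightRequests(buf, maxPacket, maxConcurrency))
	}
}
```

A small buffer effectively serializes the transfer, which is one way the inconsistent timings could arise.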

> 2. The doc for "ConcurrentWrites" says: "A write to a later offset in a file after an error, could end up with a file length longer than what was successfully written." If I force-terminate an upload, the partially uploaded file has a few KB of corrupted data at the end. Does this mean I cannot use it to implement resumable uploads? In what cases should I not use "ConcurrentWrites"?

Concurrent writes are not safe if you want to implement resumable uploads: if your application crashes and you therefore cannot truncate the file to the successfully uploaded size, a subsequent resume will produce an inconsistent file.

> 3. The Java JSch library writes data sequentially, which I saw in its source code. In Go's ssh module, are reads and writes to the SSH stream slower, or why exactly is writing to the stream without "ConcurrentWrites" slower?

Concurrent writes help on high-latency networks (on local networks there is no difference). I haven't looked at the Java JSch code, but it must be using some form of concurrency to achieve that performance.
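A rough back-of-the-envelope calculation shows why round trips dominate on high-latency links (the 32 KiB write size matches pkg/sftp's documented default; the 100 ms RTT is an arbitrary example value):

```go
package main

import "fmt"

// transferSeconds estimates the time to move fileSize bytes when each batch of
// `concurrency` packets costs one round trip: sequential writes (concurrency 1)
// pay the RTT for every packet, while concurrent writes amortize it.
func transferSeconds(fileSize, packetSize, concurrency int64, rttMs float64) float64 {
	packets := (fileSize + packetSize - 1) / packetSize
	roundTrips := (packets + concurrency - 1) / concurrency
	return float64(roundTrips) * rttMs / 1000
}

func main() {
	const size = 50 * 1024 * 1024 // the 50 MB file from the report
	const packet = 32 * 1024      // assumed SFTP write size
	fmt.Printf("sequential:   %6.1f s\n", transferSeconds(size, packet, 1, 100))
	fmt.Printf("64 in flight: %6.1f s\n", transferSeconds(size, packet, 64, 100))
}
```

With these numbers the sequential case needs 1600 round trips versus 25 when 64 requests are in flight, the same order-of-magnitude gap as the 237 s versus 24 s measurements in the report.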

@karthikrap
Author

karthikrap commented Sep 30, 2021

@drakkan Thanks for your quick response, it helped me a lot.
One thing: Java JSch uploads are sequential, and if I forcefully terminate an upload, the partially uploaded file is not corrupted.

@puellanivis
Collaborator

Your JSch is almost certainly not doing purely sequential uploads; it is doing concurrent uploads. (This is evident from the upload speed.) To get the speeds you’re seeing, it must be transferring data in parallel, ahead of the actual request.

There are some new considerations of how to process SFTP requests that could make ConcurrentWrites safer, but this would be for a future implementation of the client.

We took a very conservative approach to concurrency here, one that prioritized safety over potential issues.

@karthikrap
Author

Thanks @puellanivis. If ConcurrentWrites could avoid corrupting the file when an upload is interrupted, that would be very helpful.

@puellanivis
Collaborator

puellanivis commented Oct 4, 2021

Unfortunately, we’re not able to ensure that ConcurrentWrites would not corrupt the file in an error situation. I think a few changes are being made (sequential request ordering) that would make it safer, but I’d have to look at things afresh.

@drakkan
Collaborator

drakkan commented Feb 1, 2022

> Thanks @puellanivis, if concurrentWrites does not corrupt file when uploads are interrupted it will be helpful.

This was fixed in #482, please reopen if it still does not work as expected
