Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop ReadFromWithConcurrency submitting out of order writes #480

Closed
wants to merge 1 commit into from

Conversation

ncw
Copy link
Contributor

@ncw ncw commented Nov 11, 2021

This is for discussion. This is causing a real problem in rclone rclone/rclone#5806 but there may be a better way of fixing.


Before this change, ReadFromWithConcurrency would very often submit writes to the server out of order due to small timing differences when lots of blocks were queued and ready to send.

This causes more work for the server as seeking about the file isn't free. Writing chunks of a file out of order can also cause fragmentation and consequently loss of performance on that file.

This patch causes the write to the channel to be handled synchronously but the acks are received asynchronously allowing for many write packets to be in flight as before.

Before this change, ReadFromWithConcurrency would very often submit
writes to the server out of order due to small timing differences when
lots of blocks were queued and ready to send.

This causes more work for the server as seeking about the file isn't
free. Writing chunks of a file out of order can also cause
fragmentation and consequently loss of performance on that file.

This patch causes the write to the channel to be handled synchronously
but the acks are received asynchronously allowing for many write
packets to be in flight as before.
@ncw
Copy link
Contributor Author

ncw commented Nov 11, 2021

I speed tested this over a long wide pipe and I couldn't see any difference in speed before and after the patch, both about 40 MiB/s falling all the way back to 900KiB/s with concurrent writes disabled as you'd expect.

@ncw
Copy link
Contributor Author

ncw commented Nov 12, 2021

Thinking about this some more

  • this idea could be used in other reading and writing parts of the library
  • can replies to sftp messages come back out of order? If not then starting concurrency go routines isn't necessary, only one is needed

@drakkan
Copy link
Collaborator

drakkan commented Nov 14, 2021

Hi,

thanks for the contribution, it seems fine to me, however this code was written by @puellanivis, at the moment she has some personal issues, please wait some time for her to have a chance to have a look to the code, thanks for your patience

@puellanivis
Copy link
Collaborator

Indeed, sequential ordering of requests is probably a good idea, and I think I was writing it into the client updates.

There are some things I think that should be addressed, but as @drakkan mentioned, I'm currently out. (In hospital) and haven't been able to really do anything yet. This also happened as I was moving into a new apartment, and now missed the appointment for connecting my internet. 😫

@ncw
Copy link
Contributor Author

ncw commented Nov 14, 2021

Thanks for taking a look and happy to make changes when you are up to it. No hurry though and look after yourself.

@puellanivis
Copy link
Collaborator

Oh, I did want to mention, I did similar work already in PR 443, it might help to review that PR, and compare to what you're doing here. That'll end up being one of the first things I do myself.

@ncw
Copy link
Contributor Author

ncw commented Nov 24, 2021

Any thoughts on this PR?

Copy link
Collaborator

@puellanivis puellanivis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had recommended reviewing PR-443 and paralleling what was done there, as it would be the first thing I would do. Doing so helped me spot a bug in the existing code, that we can fix as part of this PR.

Also, a number of little “trivial” things could have been caught ahead of review that would nevertheless would make the code inconsistent between two highly similar tasks.

@@ -1424,31 +1424,44 @@ func (f *File) Write(b []byte) (int, error) {
return n, err
}

func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) {
typ, data, err := f.c.sendPacket(ch, &sshFxpWritePacket{
func (f *File) startWriteChunkAt(ch chan result, b []byte, off int64) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function no longer provides any further functionality than just calling the f.c.dispatchRequest itself. Consider how in PR-443, I inlined the calls to dispatchRequest primarily because I also needed to get the ID used, in order to pass it into the later unmarshalStatus.

🤔 Actually, the existing code has a bug, that lead you astray here. See the later comment.

return nil
}

func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure I see the benefits of reimplementing writeChunkAt in terms of the new functions defined. Almost all of this is still already handled by sendPacket.

And, the existing writeChunkAt does not return len(b) if err != nil. This new implementation would return len(b) regardless of if there were an error or not.

This is one of the main reasons I didn’t bother refactoring readChunkAt in PR-443. Refactoring comes with risks, and it’s easy to introduce bugs accidentally. I prefer better to give a new path, and when all references to the old path are gone, then you can remove it safely. Until then, who knows what functionality people are accidentally depending on…

@@ -1628,9 +1642,11 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64
n, err := r.Read(b)
if n > 0 {
read += int64(n)
ch := make(chan result, 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use newResChanPool(concurrency) as shown in PR-443, to avoid unnecessarily allocation of new channels for every write, rather than reusing ones that have already been received on. (The chan result used by dispatchRequest is a “guaranteed one receive, and only one receive” design, which allows us pool them.)

@@ -1628,9 +1642,11 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64
n, err := r.Read(b)
if n > 0 {
read += int64(n)
ch := make(chan result, 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ch is a bit too generic of a name. There a number of channels already in scope here, and using “ch” suggests that this is The Channel—the singular one available in scope. Since this is one of many relevant channels in scope, a more specific name is desirable. Since this is the channel we will be receiving our results on, using res is a more appropriate name, same as PR-443.

if err != nil {
return 0, err
return err
}

switch typ {
case sshFxpStatus:
id, _ := unmarshalUint32(data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow. There’s a bug in this existing code. (My bad.) This id value here has to be the same as the ID field in the packet above. This is why the id is passed in PR-443 through the work channel, so that it can be properly tested for.

This mostly works because we’re never supposed to get an id mismatch here, but it’s still possible, and we should be handling it correctly if it does. (Basically, everyone has just been lucky so far.)

I noticed this bug while wondering why this PR didn’t also include the id in the channel, like PR-443 did. Answer: you probably weren’t aware of how important it was, since all you had was this buggy code I wrote. 😆


select {
case workCh <- work{b, n, off}:
case workCh <- work{b, n, off, ch}:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR-443 ordered the parameters here differently. I would like to see code parallel each other as much as possible for things like this. (i.e. having a “standard” order for these fields is good, even if the original choice is entirely arbitrary.)

Also, note as per the bug I discovered in my code, we need to be passing the id properly as well, the same as PR-443.

Finally, the field n is no unused, since there is no reference to the field after this assignment. Please remove it the same as we would any other unused variable assignment.

@@ -1601,6 +1614,7 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64
b []byte
n int
off int64
ch chan result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, ch here is too non-specific when there are more than one relevant channels in scope.

@puellanivis
Copy link
Collaborator

Functionality was merged in with #482

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants