Stop ReadFromWithConcurrency submitting out of order writes #480

ncw · 2021-11-11T20:08:49Z

This is for discussion. This is causing a real problem in rclone rclone/rclone#5806 but there may be a better way of fixing.

Before this change, ReadFromWithConcurrency would very often submit writes to the server out of order due to small timing differences when lots of blocks were queued and ready to send.

This causes more work for the server as seeking about the file isn't free. Writing chunks of a file out of order can also cause fragmentation and consequently loss of performance on that file.

This patch causes the write to the channel to be handled synchronously but the acks are received asynchronously allowing for many write packets to be in flight as before.

Before this change, ReadFromWithConcurrency would very often submit writes to the server out of order due to small timing differences when lots of blocks were queued and ready to send. This causes more work for the server as seeking about the file isn't free. Writing chunks of a file out of order can also cause fragmentation and consequently loss of performance on that file. This patch causes the write to the channel to be handled synchronously but the acks are received asynchronously allowing for many write packets to be in flight as before.

ncw · 2021-11-11T20:17:33Z

I speed tested this over a long wide pipe and I couldn't see any difference in speed before and after the patch, both about 40 MiB/s falling all the way back to 900KiB/s with concurrent writes disabled as you'd expect.

ncw · 2021-11-12T10:47:51Z

Thinking about this some more

this idea could be used in other reading and writing parts of the library
can replies to sftp messages come back out of order? If not then starting concurrency go routines isn't necessary, only one is needed

drakkan · 2021-11-14T15:33:26Z

Hi,

thanks for the contribution, it seems fine to me, however this code was written by @puellanivis, at the moment she has some personal issues, please wait some time for her to have a chance to have a look to the code, thanks for your patience

puellanivis · 2021-11-14T20:36:14Z

Indeed, sequential ordering of requests is probably a good idea, and I think I was writing it into the client updates.

There are some things I think that should be addressed, but as @drakkan mentioned, I'm currently out. (In hospital) and haven't been able to really do anything yet. This also happened as I was moving into a new apartment, and now missed the appointment for connecting my internet. 😫

ncw · 2021-11-14T22:50:24Z

Thanks for taking a look and happy to make changes when you are up to it. No hurry though and look after yourself.

puellanivis · 2021-11-16T11:14:03Z

Oh, I did want to mention, I did similar work already in PR 443, it might help to review that PR, and compare to what you're doing here. That'll end up being one of the first things I do myself.

ncw · 2021-11-24T14:39:26Z

Any thoughts on this PR?

puellanivis

I had recommended reviewing PR-443 and paralleling what was done there, as it would be the first thing I would do. Doing so helped me spot a bug in the existing code, that we can fix as part of this PR.

Also, a number of little “trivial” things could have been caught ahead of review that would nevertheless would make the code inconsistent between two highly similar tasks.

puellanivis · 2021-11-25T17:18:41Z

client.go

@@ -1424,31 +1424,44 @@ func (f *File) Write(b []byte) (int, error) {
 	return n, err
 }

-func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) {
-	typ, data, err := f.c.sendPacket(ch, &sshFxpWritePacket{
+func (f *File) startWriteChunkAt(ch chan result, b []byte, off int64) {


This function no longer provides any further functionality than just calling the f.c.dispatchRequest itself. Consider how in PR-443, I inlined the calls to dispatchRequest primarily because I also needed to get the ID used, in order to pass it into the later unmarshalStatus.

🤔 Actually, the existing code has a bug, that lead you astray here. See the later comment.

puellanivis · 2021-11-25T17:22:27Z

client.go

+	return nil
+}
+
+func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) {


I’m not sure I see the benefits of reimplementing writeChunkAt in terms of the new functions defined. Almost all of this is still already handled by sendPacket.

And, the existing writeChunkAt does not return len(b) if err != nil. This new implementation would return len(b) regardless of if there were an error or not.

This is one of the main reasons I didn’t bother refactoring readChunkAt in PR-443. Refactoring comes with risks, and it’s easy to introduce bugs accidentally. I prefer better to give a new path, and when all references to the old path are gone, then you can remove it safely. Until then, who knows what functionality people are accidentally depending on…

puellanivis · 2021-11-25T17:25:43Z

client.go

@@ -1628,9 +1642,11 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64
 			n, err := r.Read(b)
 			if n > 0 {
 				read += int64(n)
+				ch := make(chan result, 1)


Please use newResChanPool(concurrency) as shown in PR-443, to avoid unnecessarily allocation of new channels for every write, rather than reusing ones that have already been received on. (The chan result used by dispatchRequest is a “guaranteed one receive, and only one receive” design, which allows us pool them.)

puellanivis · 2021-11-25T17:26:47Z

client.go

@@ -1628,9 +1642,11 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64
 			n, err := r.Read(b)
 			if n > 0 {
 				read += int64(n)
+				ch := make(chan result, 1)


ch is a bit too generic of a name. There a number of channels already in scope here, and using “ch” suggests that this is The Channel—the singular one available in scope. Since this is one of many relevant channels in scope, a more specific name is desirable. Since this is the channel we will be receiving our results on, using res is a more appropriate name, same as PR-443.

puellanivis · 2021-11-25T17:30:55Z

client.go

 	if err != nil {
-		return 0, err
+		return err
 	}

 	switch typ {
 	case sshFxpStatus:
 		id, _ := unmarshalUint32(data)


Oh wow. There’s a bug in this existing code. (My bad.) This id value here has to be the same as the ID field in the packet above. This is why the id is passed in PR-443 through the work channel, so that it can be properly tested for.

This mostly works because we’re never supposed to get an id mismatch here, but it’s still possible, and we should be handling it correctly if it does. (Basically, everyone has just been lucky so far.)

I noticed this bug while wondering why this PR didn’t also include the id in the channel, like PR-443 did. Answer: you probably weren’t aware of how important it was, since all you had was this buggy code I wrote. 😆

puellanivis · 2021-11-25T17:36:17Z

client.go


 				select {
-				case workCh <- work{b, n, off}:
+				case workCh <- work{b, n, off, ch}:


PR-443 ordered the parameters here differently. I would like to see code parallel each other as much as possible for things like this. (i.e. having a “standard” order for these fields is good, even if the original choice is entirely arbitrary.)

Also, note as per the bug I discovered in my code, we need to be passing the id properly as well, the same as PR-443.

Finally, the field n is no unused, since there is no reference to the field after this assignment. Please remove it the same as we would any other unused variable assignment.

puellanivis · 2021-11-25T17:45:37Z

client.go

@@ -1601,6 +1614,7 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64
 		b   []byte
 		n   int
 		off int64
+		ch  chan result


As mentioned, ch here is too non-specific when there are more than one relevant channels in scope.

puellanivis · 2021-12-15T11:23:38Z

Functionality was merged in with #482

puellanivis reviewed Nov 25, 2021

View reviewed changes

puellanivis mentioned this pull request Dec 5, 2021

Sequentially issue write requests, process results concurrently #482

Merged

puellanivis closed this Dec 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop ReadFromWithConcurrency submitting out of order writes #480

Stop ReadFromWithConcurrency submitting out of order writes #480

ncw commented Nov 11, 2021 •

edited

ncw commented Nov 11, 2021

ncw commented Nov 12, 2021

drakkan commented Nov 14, 2021

puellanivis commented Nov 14, 2021

ncw commented Nov 14, 2021

puellanivis commented Nov 16, 2021

ncw commented Nov 24, 2021

puellanivis left a comment

puellanivis Nov 25, 2021

puellanivis Nov 25, 2021

puellanivis Nov 25, 2021

puellanivis Nov 25, 2021

puellanivis Nov 25, 2021

puellanivis Nov 25, 2021

puellanivis Nov 25, 2021

puellanivis commented Dec 15, 2021

Stop ReadFromWithConcurrency submitting out of order writes #480

Stop ReadFromWithConcurrency submitting out of order writes #480

Conversation

ncw commented Nov 11, 2021 • edited

ncw commented Nov 11, 2021

ncw commented Nov 12, 2021

drakkan commented Nov 14, 2021

puellanivis commented Nov 14, 2021

ncw commented Nov 14, 2021

puellanivis commented Nov 16, 2021

ncw commented Nov 24, 2021

puellanivis left a comment

Choose a reason for hiding this comment

puellanivis Nov 25, 2021

Choose a reason for hiding this comment

puellanivis Nov 25, 2021

Choose a reason for hiding this comment

puellanivis Nov 25, 2021

Choose a reason for hiding this comment

puellanivis Nov 25, 2021

Choose a reason for hiding this comment

puellanivis Nov 25, 2021

Choose a reason for hiding this comment

puellanivis Nov 25, 2021

Choose a reason for hiding this comment

puellanivis Nov 25, 2021

Choose a reason for hiding this comment

puellanivis commented Dec 15, 2021

ncw commented Nov 11, 2021 •

edited