
[18.03 backport] Fix update out of sequence #2902

Merged


thaJeztah
Member

@thaJeztah thaJeztah commented Sep 25, 2019

relates to

Backports of

# https://github.com/docker/swarmkit/pull/2762 Increased wait time on test utils WaitForCluster and WatchTaskCreate
git cherry-pick -s -S -x 5f167cab731ee75bc6cf3888fd2a4a5b5194f924

# second commit from https://github.com/docker/swarmkit/pull/2771 Allow using Configs as CredentialSpecs
git cherry-pick -s -S -x be26111c4a48c44fac04c17c69fd2504aea6db91

# https://github.com/docker/swarmkit/pull/2808 Fix flaky tests
git cherry-pick -s -S -x 06a356671bc11e4fd5d754f257f9c5f93ec5c563

# https://github.com/docker/swarmkit/pull/2870 Fix update out of sequence
git cherry-pick -s -S -x d68ac46e3b11d7384472677d210bb0ce941284dc

Minor conflict in imports when cherry-picking #2808 (06a3566), because a change it depends on is not in this branch (backporting that change would require updating the gRPC version that's used; see #2827 (comment)).

diff --cc manager/orchestrator/replicated/update_test.go
index ccc084b8,53d6da72..00000000
--- a/manager/orchestrator/replicated/update_test.go
+++ b/manager/orchestrator/replicated/update_test.go
@@@ -1,7 -1,8 +1,12 @@@
  package replicated
  
  import (
++<<<<<<< HEAD
 +      "sync/atomic"
++=======
+       "context"
+       "sync"
++>>>>>>> 06a35667... Fix flaky tests
        "testing"
        "time"

olljanat and others added 3 commits September 25, 2019 12:08
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
(cherry picked from commit 5f167ca)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Drew Erny <drew.erny@docker.com>
(cherry picked from commit be26111)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It is likely that a large portion of test flakiness, especially in CI,
comes from the fact that swarmkit components under test are started in
goroutines, but those goroutines never have an opportunity to run. This
adds code ensuring those goroutines are scheduled and run, which should
hopefully solve many inexplicably flaky tests.

Additionally, increased test timeouts, to hopefully cover a few more
flaky cases.

Finally, removed direct use of the atomic package, in favor of less
efficient but higher-level mutexes.

Signed-off-by: Drew Erny <drew.erny@docker.com>
(cherry picked from commit 06a3566)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Member Author

ok, this needs some work; possibly related to gRPC versions

#!/bin/bash -eo pipefail
make check binaries checkprotos
🐳 check
🐳 bin/swarmd
# github.com/docker/swarmkit/agent
agent/session.go:67:3: undefined: grpc.WithDefaultCallOptions
agent/session.go:67:31: undefined: grpc.MaxCallRecvMsgSize
# github.com/docker/swarmkit/manager
manager/manager.go:720:5: undefined: grpc.WithDefaultCallOptions
manager/manager.go:720:33: undefined: grpc.MaxCallRecvMsgSize
make: *** [bin/swarmd] Error 2
Exited with code 2

@thaJeztah
Member Author

reverted #2869 to see if that works; I'll rebase if it goes green, and we can punt on that change

@codecov

codecov bot commented Sep 25, 2019

Codecov Report

Merging #2902 into bump_v18.03 will decrease coverage by 0.07%.
The diff coverage is 0%.

@@               Coverage Diff               @@
##           bump_v18.03    #2902      +/-   ##
===============================================
- Coverage        61.63%   61.56%   -0.08%     
===============================================
  Files              134      134              
  Lines            21805    21809       +4     
===============================================
- Hits             13439    13426      -13     
- Misses            6918     6935      +17     
  Partials          1448     1448

@thaJeztah
Member Author

OK, that works; let me remove that backport for now

A simple but old error has recently become evident. Due to the fact that
we read an object and then write it back across the boundaries of a
transaction, it is possible for the task object to have changed in
between transactions. This would cause the attempt to write out the old
task to suffer an "Update out of sequence" error.

This fix simply reads the latest version of the task back out within the
boundary of a transaction to avoid the race.

Signed-off-by: Drew Erny <drew.erny@docker.com>
(cherry picked from commit d68ac46)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah force-pushed the 18.03_backport_fix_update_out_of_sequence branch from 70e8774 to 5fca4d7 September 25, 2019 13:20
@thaJeztah thaJeztah changed the title [18.03 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets [18.03 backport] Fix update out of sequence Sep 25, 2019
@thaJeztah
Member Author

ping @dperny @kolyshkin PTAL

@dperny dperny merged commit 91bc626 into moby:bump_v18.03 Oct 2, 2019
@thaJeztah thaJeztah deleted the 18.03_backport_fix_update_out_of_sequence branch October 2, 2019 21:16

3 participants