Swarm: Missing heartbeat-tick and election-tick configurable flags on docker swarm init and join #24941

Open
abronan opened this issue Jul 22, 2016 · 8 comments
Labels
area/swarm, kind/enhancement, priority/P2

Comments

@abronan
Contributor

abronan commented Jul 22, 2016

docker master

The --heartbeat-tick and --election-tick flags, which configure the Raft heartbeat and election timeouts, are missing from the command line.

As of now, we bootstrap Managers using the default values (3s for election and 1s for heartbeats).
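To make the defaults above concrete, here is a minimal Go sketch of how tick counts could map to wall-clock timeouts. The `RaftTicks` type, its methods, and the one-second tick interval are assumptions made for illustration; they are not swarmkit's actual types. The validation rule (election tick greater than heartbeat tick) mirrors the constraint commonly enforced by Raft libraries.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RaftTicks is a hypothetical holder for the two settings discussed in this issue.
type RaftTicks struct {
	TickInterval  time.Duration // assumed length of a single raft tick
	ElectionTick  int           // ticks without leader contact before an election starts
	HeartbeatTick int           // ticks between leader heartbeats
}

// DefaultRaftTicks returns values matching the defaults quoted in the issue
// (3s election, 1s heartbeat), assuming a one-second tick.
func DefaultRaftTicks() RaftTicks {
	return RaftTicks{TickInterval: time.Second, ElectionTick: 3, HeartbeatTick: 1}
}

// ElectionTimeout converts the election tick count into a duration.
func (r RaftTicks) ElectionTimeout() time.Duration {
	return time.Duration(r.ElectionTick) * r.TickInterval
}

// HeartbeatInterval converts the heartbeat tick count into a duration.
func (r RaftTicks) HeartbeatInterval() time.Duration {
	return time.Duration(r.HeartbeatTick) * r.TickInterval
}

// Validate rejects combinations that would let followers time out
// before a heartbeat can arrive.
func (r RaftTicks) Validate() error {
	if r.HeartbeatTick <= 0 {
		return errors.New("heartbeat tick must be greater than 0")
	}
	if r.ElectionTick <= r.HeartbeatTick {
		return errors.New("election tick must be greater than heartbeat tick")
	}
	return nil
}

func main() {
	d := DefaultRaftTicks()
	fmt.Println(d.ElectionTimeout(), d.HeartbeatInterval(), d.Validate())
}
```

Any custom values accepted from the command line would presumably need the same kind of check before the Raft node is bootstrapped.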

We should include the flags to allow custom values.

@abronan abronan added this to the 1.12.0 milestone Jul 22, 2016
@abronan abronan self-assigned this Jul 22, 2016
@abronan
Contributor Author

abronan commented Jul 22, 2016

This one is a bit tricky because of the Manager/Worker role decomposition:

First, we can't use dynamic Spec updates for that, because the election and heartbeat ticks are static fields given to the Manager when bootstrapping the local Raft node. Modifying those values through the Spec would mean restarting Raft on the fly, which we agreed was risky once multiple members form a Raft cluster.

We can add the flags to docker swarm init, so far so good.

However, we'd also have to add those flags to docker swarm join, which will feel inconsistent and out of place given the promotion workflow (you would have to specify Raft flags even though the node might stay a worker forever).

Not sure what the best way to proceed is on this one.

We can put the flags on docker swarm init and use the values from the Spec on the Worker prior to promoting it to a Manager. However, that is not very user friendly: one side has the flags while the other doesn't.
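A rough Go sketch of that last idea, under stated assumptions: the flags are parsed once at init time, stored in a Spec-like struct, and a worker copies them only when it is promoted. `RaftSpec`, `Node`, `parseInitFlags`, and `Promote` are hypothetical names for illustration, and the flags themselves are the ones proposed in this issue, not existing CLI options.

```go
package main

import (
	"flag"
	"fmt"
)

// RaftSpec is a hypothetical slice of the cluster Spec carrying the raft settings.
type RaftSpec struct {
	ElectionTick  int
	HeartbeatTick int
}

// parseInitFlags mimics an init-style command line exposing the two proposed flags.
// The defaults follow the values quoted in the issue.
func parseInitFlags(args []string) (RaftSpec, error) {
	fs := flag.NewFlagSet("swarm-init", flag.ContinueOnError)
	election := fs.Int("election-tick", 3, "ticks without a leader before an election")
	heartbeat := fs.Int("heartbeat-tick", 1, "ticks between leader heartbeats")
	if err := fs.Parse(args); err != nil {
		return RaftSpec{}, err
	}
	return RaftSpec{ElectionTick: *election, HeartbeatTick: *heartbeat}, nil
}

// Node is a hypothetical swarm member. Raft stays nil while the node is a worker,
// so join needs no raft flags at all.
type Node struct {
	Manager bool
	Raft    *RaftSpec
}

// Promote turns a worker into a manager, taking the raft settings from the Spec.
func (n *Node) Promote(spec RaftSpec) {
	cfg := spec // copy so later Spec changes do not alter a running node
	n.Manager = true
	n.Raft = &cfg
}

func main() {
	spec, err := parseInitFlags([]string{"--election-tick=10", "--heartbeat-tick=2"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	worker := &Node{}
	worker.Promote(spec)
	fmt.Printf("manager=%v raft=%+v\n", worker.Manager, *worker.Raft)
}
```

The asymmetry described above shows up directly: only the init path has a flag parser, while the join path relies entirely on what the Spec already holds.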

WDYT?

/cc @aluzzardi @aaronlehmann @tonistiigi

@thaJeztah
Member

Is this related to #24557?

@thaJeztah
Member

I'm making this a higher priority; while testing some scenarios with RC4 and current master, I consistently ended up in a situation where all containers landed on a single node. I briefly discussed this with @cpuguy83, and it looks like Swarmkit acts too quickly when recovery takes too long (as mentioned, for me it was consistently reproducible). Possibly, setting a longer duration can prevent this situation.

I put my steps in a gist, but I can open a separate issue if you think that makes more sense:
https://gist.github.com/thaJeztah/1599b09b41e696837235f2509df5ade1

Making this a higher priority, but open to suggestions (perhaps I'm doing it wrong 😄)

@thaJeztah thaJeztah added the priority/P2 label Jul 25, 2016
@thaJeztah
Member

ping @tiborvass @vieux FYI ^^

@cpuguy83
Member

@thaJeztah Or for some reason the new leader didn't have proper node state when it took leadership.

@abronan
Contributor Author

abronan commented Jul 25, 2016

@thaJeztah No, it's not related to #24557 (I would say that it's complementary). I'm not sure the issue you describe is related either: electionTick and heartbeatTick only concern Raft and the cluster state. Here I think it is a problem with the dispatcher. We had a known issue where killing the Leader caused containers to land on the same node after the restart (I could actually reproduce it). It was supposedly fixed in swarmkit a while ago, but we should open a new issue for it if it is reproducible using the RC or master.

@aluzzardi
Member

The problem with those flags being at swarm init time is they cannot be changed - ever - unless you nuke the cluster.

There are a few options:

  • don't support them for now until we have dynamic raft restart
  • put them on the spec but require an engine restart
  • put them as engine options
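As one way to picture the second option, here is a small Go sketch in which the Spec can change at any time but the running Raft settings only change on restart. `Engine`, `RaftSettings`, and every method here are hypothetical and exist only to illustrate the trade-off; nothing below reflects the engine's real structure.

```go
package main

import "fmt"

// RaftSettings is a hypothetical pair of raft tick values.
type RaftSettings struct {
	ElectionTick  int
	HeartbeatTick int
}

// Engine is a hypothetical daemon that keeps the desired values (from the Spec)
// separate from the values the raft node was actually started with.
type Engine struct {
	spec   RaftSettings // desired values, may be updated at any time
	active RaftSettings // values in effect, fixed until the next restart
}

// NewEngine starts an engine whose active settings equal the initial Spec.
func NewEngine(initial RaftSettings) *Engine {
	return &Engine{spec: initial, active: initial}
}

// UpdateSpec records new desired values without touching the running raft node.
func (e *Engine) UpdateSpec(s RaftSettings) { e.spec = s }

// RestartRequired reports whether the Spec and the running node have diverged.
func (e *Engine) RestartRequired() bool { return e.spec != e.active }

// Restart simulates an engine restart, which is the only point where
// the Spec values are applied.
func (e *Engine) Restart() { e.active = e.spec }

// Active returns the settings currently in effect.
func (e *Engine) Active() RaftSettings { return e.active }

func main() {
	e := NewEngine(RaftSettings{ElectionTick: 3, HeartbeatTick: 1})
	e.UpdateSpec(RaftSettings{ElectionTick: 10, HeartbeatTick: 1})
	fmt.Println("restart required:", e.RestartRequired(), "active:", e.Active())
	e.Restart()
	fmt.Println("restart required:", e.RestartRequired(), "active:", e.Active())
}
```

This avoids restarting Raft on the fly, at the cost of a window in which the Spec and the running node disagree.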

@thaJeztah
Member

opened #25017

@thaJeztah thaJeztah modified the milestones: 1.12.1, 1.12.0 Jul 28, 2016
@tiborvass tiborvass modified the milestones: 1.12.1, 1.12.2 Aug 30, 2016
@aluzzardi aluzzardi added the kind/enhancement label Sep 14, 2016
@aluzzardi aluzzardi removed this from the 1.12.2 milestone Sep 14, 2016
5 participants