-
Notifications
You must be signed in to change notification settings - Fork 18.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for maximum replicas per node #37940
Conversation
build fails:
|
Sure it will until swarmkit PR have been merged that why this is marked as WIP now. Test automation set on Moby is quite complex and I'm not able run all these tests on my dev env so that why I prefer to have PR open here which can run those tests for me. Next steps are:
After these build and all these tests (should) pass. |
68f5387
to
8b32d49
Compare
Codecov Report
@@ Coverage Diff @@
## master #37940 +/- ##
=========================================
Coverage ? 36.6%
=========================================
Files ? 608
Lines ? 44990
Branches ? 0
=========================================
Hits ? 16467
Misses ? 26245
Partials ? 2278 |
22d5ad2
to
4c6a698
Compare
4c6a698
to
7584627
Compare
Swarmkit PR was merged so I included swarmkit bump to this. Builds should pass as they was earlier when I tested to include swarmkit changes from my repo. @cpuguy83 can you review ? EDIT: Swarmkit bump looks to be also on #38056 so I will drop it from this when it is merged. EDIT 2: #38056 was merged. Failing test on experimental looks to be flaky ( #32673 ). |
de44ba4
to
f422e25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some minor comments, but I want to give this a spin (haven't had time for that yet).
One thing I'd like to test is the behavior if the --max-replicas-per-node
value is updated for an existing service (e.g. change from max 3 replicas to max 2; are the existing tasks properly killed and/or re-scheduled in that case?)
Currently it looks working on way that if user only updates --max-replicas-per-node (or actually it is If we need change that logic then it would mean one more change to swarmkit. Anyway, I will wait that #38003 is merged and implement these API versions checks on meanwhile. |
f422e25
to
261b3d6
Compare
261b3d6
to
169e690
Compare
Temporarily included commit from #38003 will cleanup after it is merged. |
169e690
to
8ac1832
Compare
@thaJeztah rebased on master and included changes you requested. Can you review again? |
f24d569
to
298228b
Compare
7bca461
to
92a4d55
Compare
@cpuguy83 @thaJeztah PTAL, now this contains only simple API test without logic test as it is on swarmkit side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor nits inline (no critical issues, but would be nice to have them in the PR)
I was finally able to give this a spin (see below for some playing with the new option); one thing I noticed (also, see below) that reducing the max-replicas value does not terminate existing tasks. I wondered if this was by design, or an oversight? Not a show-stopper; just curious 🤗
Here's some things I tried:
Create a service with 2 replicas, but maxreplicas 1
curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/create" \
-H "Content-Type: application/json" \
-d '{"EndpointSpec":{"Mode":"vip"},"Labels":{},"Mode":{"Replicated":{"Replicas":2}},"Name":"test","TaskTemplate":{"ContainerSpec":{"DNSConfig":{},"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da"},"ForceUpdate":0,"Placement":{"MaxReplicas":1,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"Resources":{"Limits":{},"Reservations":{}}}}'
{"ID":"eln7vce863tedwd5q72kjaxkg"}
Verify the MaxReplicas
configuration is stored:
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test?insertDefaults=true" | jq .Spec.TaskTemplate.Placement.MaxReplicas
1
And check that no more than 1 replica is running (on a single-node swarm);
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
qy2qc9p1yder test replicated 1/2 nginx:alpine
Check the task, and confirm it's unable to start due to the MaxReplicas
restriction:
docker service ps test --no-trunc --format 'table {{.Name}}\t{{.Error}}'
NAME ERROR
test.1
test.2 "no suitable node (max replicas per node limit exceed)"
Updated MaxReplicas
to 2;
Get the service definition
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test"
{"ID":"cqnbsmr9foblwucehzu9ekrpb","Version":{"Index":87},"CreatedAt":"2018-12-23T22:23:22.415002133Z","UpdatedAt":"2018-12-23T22:23:22.415002133Z","Spec":{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":1,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":0,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}},"Endpoint":{"Spec":{}}}
Update the service to set MaxReplicas to 2
curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/test/update?version=87" \
-H "Content-Type: application/json" \
-d '{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":2,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":0,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}}'
Verify the service was updated with the new value (2
);
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test" | jq .Spec.TaskTemplate.Placement.MaxReplicas
2
And confirm that the service was now able to deploy 2 replicas:
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
cqnbsmr9fobl test replicated 2/2 nginx:alpine
Now update the service again, and set MaxReplicas
back to 1
curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/test/update?version=172" \
-H "Content-Type: application/json" \
-d '{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":1,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":0,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}}'
Verify the service was updated with the new value (1
)
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test" | jq .Spec.TaskTemplate.Placement.MaxReplicas
1
Note that when reducing the maximum number of replicas, existing tasks are not terminated;
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
cqnbsmr9fobl test replicated 2/2 nginx:alpine
Another thing that needs to be taken into account is that scaling/updating a service with an older version of the CLI will strip the MaxReplicas
. That's the expected behavior, but perhaps something we should mention somewhere in the documentation :thinking_face:
docker service scale test=3
test scaled to 3
After scaling, the MaxReplicas
property is reset (because the CLI does not know the new property);
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test?insertDefaults=true" | jq .Spec.TaskTemplate.Placement.MaxReplicas
null
By design as we don't want kill running containers when it is not absolutely necessary and user have option to use --force (not sure what it is on API side) and then all tasks will be re-scheduled and situation fixed. |
92a4d55
to
6e86627
Compare
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
6e86627
to
153171e
Compare
@thaJeztah updated based on review. Now hopefully ready to be merged? |
Gotcha; makes sense. We'll have to document that, and explain the behavior (i.e., if they want to force a reschedule, use the Trying that scenario: Create a service with curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/create" \
-H "Content-Type: application/json" \
-d '{"EndpointSpec":{"Mode":"vip"},"Labels":{},"Mode":{"Replicated":{"Replicas":2}},"Name":"test","TaskTemplate":{"ContainerSpec":{"DNSConfig":{},"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da"},"ForceUpdate":0,"Placement":{"MaxReplicas":2,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"Resources":{"Limits":{},"Reservations":{}}}}'
{"ID":"kdu0l3uxhk6p52xa719581h0h"} Verify the curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test" | jq .Spec.TaskTemplate.Placement
2 Confirm the service has two tasks running: docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
rmg5iwvxl158 test replicated 2/2 nginx:alpine Update Get the service definition curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test" {"ID":"kdu0l3uxhk6p52xa719581h0h","Version":{"Index":27},"CreatedAt":"2018-12-24T09:37:12.633108733Z","UpdatedAt":"2018-12-24T09:37:12.633108733Z","Spec":{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":2,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":0,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}},"Endpoint":{"Spec":{}}} Update the service, set curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/test/update?version=27" \
-H "Content-Type: application/json" \
-d '{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":1,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":1,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}}'
{"Warnings":null} Confirm that the service now has 1 task running: docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
kdu0l3uxhk6p test replicated 1/2 nginx:alpine And that this is due to the new docker service ps test --no-trunc --format 'table {{.Name}}\t{{.Error}}'
NAME ERROR
test.1
test.2 "no suitable node (max replicas per node limit exceed)"
\_ test.2 Looking good 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!!!
@olljanat was the failure only a flaky test? If so, I'm happy to ignore PowerPC and merge |
@thaJeztah yes. Those which I disabled on #38423 looks causing issues. |
Alright let's get this one merged 👍 |
Hi! I greatly love this commit. With what docker & swarm-kit version could it be actually used at production |
It's not released yet; will be in the next docker release (19.03) (but you should be able to give it a spin with the nightly builds https://download.docker.com/linux/ubuntu/dists/bionic/pool/nightly/amd64/ |
- What I did
Added support for maximum replicas per node
Second step to be able solve #26259
- How I did it
Addes support for Placement parameter MaxReplicas
TODO: