Per-interface sysctls #47686

robmry · 2024-04-05T16:35:32Z

- What I did

Closes Per-interface sysctls #47639

Until now it's been possible to set per-interface sysctls using, for example, --sysctl net.ipv6.conf.eth0.accept_ra=2. But the index in the interface name is allocated serially, and the numbering in a container with more than one interface may change when the container is restarted. The change that made it possible to connect a container to more than one network at creation time increased the ambiguity.

This change adds the label com.docker.network.endpoint.sysctls to the DriverOpts in EndpointSettings. This option is explicitly associated with the interface.

Settings in --sysctl for eth0 are migrated to EndpointSettings.DriverOpts.

Because using --sysctl with any interface apart from eth0 would have unpredictable results, it is now an error to use any other interface name in the top-level --sysctl option. The error message includes a hint on how to use the new per-interface setting.

The per-endpoint sysctl name is a shortened form of the sysctl name, intended to limit settings to 'net.*', and to eliminate the need to identify the interface by name. For example:
net.ipv6.conf.eth0.accept_ra=2
becomes:
ipv6.conf.accept_ra=2

The value of DriverOpts["com.docker.network.endpoint.sysctls"] is a comma-separated list of these short-form sysctls.
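To make the migration concrete, here is a minimal sketch of how top-level settings could be turned into the short-form DriverOpts value. It is not the code in this PR: migrateSysctls is a hypothetical helper, and skipping the "all" and "default" names is an assumption (only "default" is shown staying in HostConfig.Sysctls in the inspect output).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// driverOptSysctls is the label named in the PR description.
const driverOptSysctls = "com.docker.network.endpoint.sysctls"

// migrateSysctls is a hypothetical helper, not the PR's implementation.
// It removes "net.X.Y.eth0.Z" entries from the top-level sysctls map and
// returns them as a comma-separated list of short-form "X.Y.Z=V" settings.
// Any other interface name is rejected, as the description says.
func migrateSysctls(sysctls map[string]string) (string, error) {
	// Sort the keys so that the result is deterministic.
	keys := make([]string, 0, len(sysctls))
	for k := range sysctls {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var short []string
	for _, k := range keys {
		// Per-interface settings look like net.X.Y.<ifname>.Z.
		spl := strings.SplitN(k, ".", 5)
		if len(spl) != 5 || spl[0] != "net" {
			continue
		}
		// Assumption: "all" and "default" are not interface names, so
		// those settings stay in the top-level map.
		if spl[3] == "all" || spl[3] == "default" {
			continue
		}
		if spl[3] != "eth0" {
			return "", fmt.Errorf("unable to determine network endpoint for sysctl %s, use driver option '%s' to set per-interface sysctls", k, driverOptSysctls)
		}
		short = append(short, fmt.Sprintf("%s.%s.%s=%s", spl[1], spl[2], spl[4], sysctls[k]))
		delete(sysctls, k)
	}
	return strings.Join(short, ","), nil
}
```

With net.ipv6.conf.eth0.accept_ra=2 and net.ipv6.conf.eth0.forwarding=1 in the map, this sketch returns ipv6.conf.accept_ra=2,ipv6.conf.forwarding=1 and leaves any net.ipv6.conf.default.* entries where they were.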

Settings from --sysctl are applied by the runtime lib during task creation. So, task creation fails if the endpoint does not exist. Applying per-endpoint settings during interface configuration means the endpoint can be created later, which paves the way for removal of the SetKey OCI prestart hook.

Unlike other DriverOpts, the sysctl label itself is not driver-specific, but each driver has a chance to check settings/values and raise an error if a setting would cause it a problem - no such checks have been added in this initial version. As a future extension, if required, it would be possible for the driver to echo back valid/extended/modified settings to libnetwork for it to apply to the interface. (At that point, the syntax for the options could become driver specific to allow, for example, a driver to create more than one interface.)

Related changes are needed in the CLI, to make it possible to set the new DriverOpts value ... it's not possible to set them using the existing advanced --network syntax, because the list of sysctl values includes = and , characters (so, can't be distinguished from separators in the --network syntax)... docker/cli#4994

- How I did it

migrate per-endpoint sysctls from the top-level into per-interface DriverOpts
pass those per-interface sysctls to the osl.Network, along with other config values for the interface
apply those sysctls during interface configuration (which currently still happens during the SetKey prestart callback, but needn't)

- How to verify it

New unit and integration tests.

And ...

Migration of one or two top level --sysctl settings ...

# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth0.accept_ra=2 alpine
WARNING: Migrated net.ipv6.conf.eth0.accept_ra to DriverOpts{"com.docker.network.endpoint.sysctls":"ipv6.conf.accept_ra=2"}.
/ #

# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth0.accept_ra=2 --sysctl=net.ipv6.conf.eth0.forwarding=1 alpine
WARNING: Migrated net.ipv6.conf.eth0.accept_ra,net.ipv6.conf.eth0.forwarding to DriverOpts{"com.docker.network.endpoint.sysctls":"ipv6.conf.accept_ra=2,ipv6.conf.forwarding=1"}.
/ #

Inspect output ...

# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth0.accept_ra=2 --sysctl=net.ipv6.conf.eth0.forwarding=1 --sysctl=net.ipv6.conf.default.disable_ipv6=0 alpine
...

# docker inspect c1
[
    {
        ...
        "HostConfig": {
            ...
            "Sysctls": {
                "net.ipv6.conf.default.disable_ipv6": "0"
            },
        ...
        "NetworkSettings": {
             ...
            "Networks": {
                "mynet": {
                ...
                    "DriverOpts": {
                        "com.docker.network.endpoint.sysctls": "ipv6.conf.accept_ra=2,ipv6.conf.forwarding=1"
                    },
                ...
]

No migration for eth1 ...

# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth1.accept_ra=2 alpine
docker: Error response from daemon: unable to determine network endpoint for sysctl net.ipv6.conf.eth1.accept_ra, use driver option 'com.docker.network.endpoint.sysctls' to set per-interface sysctls.
See 'docker run --help'.

Attempt to set a per-interface sysctl for mpls ...

# docker run --rm -ti --name c1 --network mynet --sysctl=net.mpls.conf.eth0.input=1 alpine
docker: Error response from daemon: invalid config for network mynet: invalid endpoint settings:
unrecognised network interface sysctl 'mpls.conf.input=1'; represent 'net.X.Y.ethN.Z=V' as 'X.Y.Z=V', 'X' must be 'ipv4' or 'ipv6'.
See 'docker run --help'.
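The check behind that error could look something like the following sketch. It only encodes the rule stated in the message ('X.Y.Z=V', with X being ipv4 or ipv6); validateEndpointSysctl is a hypothetical name, and the real validation in this PR may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// validateEndpointSysctl is a hypothetical check for one short-form setting.
// It accepts "X.Y.Z=V" where X is "ipv4" or "ipv6", and rejects anything else
// with an error modelled on the message shown above.
func validateEndpointSysctl(setting string) error {
	name, _, hasValue := strings.Cut(setting, "=")
	parts := strings.Split(name, ".")
	valid := hasValue && len(parts) == 3 && (parts[0] == "ipv4" || parts[0] == "ipv6")
	// None of X, Y or Z may be empty.
	for _, p := range parts {
		if p == "" {
			valid = false
		}
	}
	if !valid {
		return fmt.Errorf("unrecognised network interface sysctl '%s'; represent 'net.X.Y.ethN.Z=V' as 'X.Y.Z=V', 'X' must be 'ipv4' or 'ipv6'", setting)
	}
	return nil
}
```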

Nonexistent sysctl (very verbose error message at the moment) ...

# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth0.foo=1 alpine
WARNING: Migrated net.ipv6.conf.eth0.foo to DriverOpts{"com.docker.network.endpoint.sysctls":"ipv6.conf.foo=1"}.
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: failed to add interface veth00d9732 to sandbox: /proc/sys/net/ipv6/conf/eth0/foo is not a sysctl file: unknown.

But a lot of that waffle will go away once the prestart hook is removed and settings are applied after task creation. It'll be more like ...

# docker run --rm -ti --name c1 --network mynet --sysctl=net.ipv6.conf.eth0.foo=1 alpine
WARNING: Migrated net.ipv6.conf.eth0.foo to DriverOpts{"com.docker.network.endpoint.sysctls":"ipv6.conf.foo=1"}.
docker: Error response from daemon: unable to write to '/proc/sys/net/ipv6/conf/eth0/foo' (derived from 'ipv6.conf.foo', use format X.Y.Z to map to net.X.Y.eth0.Z): /proc/sys/net/ipv6/conf/eth0/foo is not a sysctl file: unknown.
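The mapping mentioned in that message (X.Y.Z to net.X.Y.eth0.Z) can be sketched as below. sysctlPath is a hypothetical helper that only builds the /proc/sys path; actually writing the value to that file during interface configuration is left out.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// sysctlPath is a hypothetical helper. It maps a short-form sysctl name
// "X.Y.Z" and an interface name to the file /proc/sys/net/X/Y/<ifName>/Z.
func sysctlPath(shortName, ifName string) (string, error) {
	parts := strings.Split(shortName, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("sysctl %q is not in the form X.Y.Z", shortName)
	}
	return path.Join("/proc/sys/net", parts[0], parts[1], ifName, parts[2]), nil
}
```

For ipv6.conf.foo on eth0 this gives /proc/sys/net/ipv6/conf/eth0/foo, the path named in the error above.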

- Description for the changelog

Allow sysctls to be set per-interface during container creation and network connection.

akerouanton · 2024-04-15T14:52:20Z

api/server/router/container/container_routes.go

+			// Only try to migrate settings for "eth0", anything else would always
+			// have behaved unpredictably.
+			if spl[3] != "eth0" {
+				return "", fmt.Errorf(`unable to determine network endpoint for sysctl %s, use '--network=name=%s,sysctl=%s' or compose 'driver_opts: "%s":"%s"`,


I think we shouldn't put CLI-specific or Compose-specific remediation steps here -- the API could be called by other tools where those steps won't make any sense.

OTOH CLI error messages sometimes look cryptic to users not familiar with our API. I think we don't have a plan for 'augmenting' CLI error messages with remediation steps. Maybe that's something we need to discuss.

Yes, it's not great, but I'm not sure how best to improve it.

I think it's quite important that we give good clues about how to specify per-interface sysctls - here, for the migration case below, and for when we refuse to migrate from the top level --sysctl in a future release.

A "for example" might help a little, since CLI and compose are probably the common cases. But, not really.

Maybe the best we can do is just delete the hints, and hope the user's able to find the right section of the docs.

I'm not sure how augmented CLI messages would work, perhaps the API would need to return some token that'd tell the client to explain how to set per-interface sysctls in its world (extended --network syntax for the CLI, or driver-opts in compose)? I'm probably missing the point (?!), but any sort of mechanism like that sounds like a big change that'd have to be out-of-scope here.

I changed the message (and updated the examples in PR description to show it) ... now it only mentions the driver-opt label - and the user will have to figure out how to use it.

But I've also updated the CLI PR docker/cli#4994 to get rid of the --network sysctl= option and document the use of [create|run] --network driver-opt=com.docker.network.endpoint.sysctls=[value] and network connect --driver-opt=com.docker.network.endpoint.sysctls= to set multiple sysctls for an endpoint.

akerouanton · 2024-04-15T14:59:46Z

api/server/router/container/container_routes.go

+		return "", nil
+	}
+
+	// TODO(robmry) - refuse to do the migration, generate an error if API > some-future-version.


Next API version should be fine.

This change should land in release 27.0 (along with re-removing the SetKey hook that requires it). So, we'd want to deprecate per-interface sysctls in --sysctl in 27.0, and remove the auto-migration in 28.0.

The current API version is 1.45, and it will be in the upcoming release 26.1. But, it might change in 27.0? In that case, if we make this code check for API version >1.45, we'll have accidentally removed the auto-migration in release 27.0.

So, it's probably best to raise a new issue to say a version check needs to be added, and mark it for milestone 28.0?

akerouanton · 2024-04-15T15:00:10Z

api/server/router/container/container_routes.go

+	// TODO(robmry) - refuse to do the migration, generate an error if API > some-future-version.
+
+	newDriverOpt := strings.Join(netIfSysctls, ",")
+	warning := fmt.Sprintf(`Migrated %s to DriverOpts{"%s":"%s"}. (Use "--network=name=%s,sysctl=%s", or compose "driver_opts".)`,


Same as for the error above.

api/server/router/container/container_routes_test.go

Signed-off-by: Rob Murray <rob.murray@docker.com>


robmry self-assigned this Apr 5, 2024

robmry added the kind/enhancement, area/networking, impact/changelog, and impact/documentation labels Apr 5, 2024

robmry mentioned this pull request Apr 8, 2024

Document CLI support for per interface sysctls docker/cli#4994

Open

robmry removed the impact/documentation label Apr 8, 2024

robmry force-pushed the 47639_per-interface-sysctls branch from 79f4458 to 745a4c4 Compare April 8, 2024 09:31

robmry marked this pull request as ready for review April 8, 2024 10:12

robmry requested review from akerouanton and corhere April 8, 2024 10:12

robmry mentioned this pull request Apr 15, 2024

Docker 26 breaks kata networking kata-containers/kata-containers#9340

Open

akerouanton reviewed Apr 15, 2024

robmry force-pushed the 47639_per-interface-sysctls branch from 745a4c4 to 5d0ab3f Compare April 18, 2024 13:45

robmry force-pushed the 47639_per-interface-sysctls branch from 5d0ab3f to 5d70d23 Compare April 29, 2024 17:38

robmry force-pushed the 47639_per-interface-sysctls branch from 5d70d23 to 2edd300 Compare May 8, 2024 15:58

robmry added 3 commits May 8, 2024 17:05

Factor out selection of endpoint for config migration

12b4fc1

Signed-off-by: Rob Murray <rob.murray@docker.com>

Move EndpointSettings.DriverOpts from op-state to config

6cbeb3f

Signed-off-by: Rob Murray <rob.murray@docker.com>

robmry force-pushed the 47639_per-interface-sysctls branch from 2edd300 to 2681c58 Compare May 8, 2024 16:07

robmry added this to the 27.0.0 milestone May 9, 2024
