Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[23.0 Backport] Prevent containers from being included in List API before they are registered #44633

Merged
merged 3 commits into from Dec 14, 2022

Conversation

corhere
Copy link
Contributor

@corhere corhere commented Dec 13, 2022

- What I did

  • Resolved a bug where containers were being included in the List Containers API response before they finished being created
  • Resolved a bug where calling Container APIs with the ID of a container in the process of being created would result in a nil-pointer dereference panic

- How I did it
See commit descriptions for details.

- How to verify it
Patch the daemon to widen the race window and try to reproduce the issue, as described in #44512 (comment)

- Description for the changelog

  • Resolved an issue where the List Containers API would return containers in the process of being created

- A picture of a cute animal (not mandatory but encouraged)

GetContainer() would return (nil, nil) when looking up a container
if the container was inserted into the containersReplica ViewDB but not
the containers Store at the time of the lookup. Callers which reasonably
assume that the returned err == nil implies returned container != nil
would dereference a nil pointer and panic. Change GetContainer() so that
it always returns a container or an error.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 00157a4)
Signed-off-by: Cory Snider <csnider@mirantis.com>
(*Container).CheckpointTo() upserts a snapshot of the container to the
daemon's in-memory ViewDB and also persists the snapshot to disk. It
does not register the live container object with the daemon's container
store, however. The ViewDB and container store are used as the source of
truth for different operations, so having a container registered in one
but not the other can result in inconsistencies. In particular, the List
Containers API uses the ViewDB as its source of truth and the Container
Inspect API uses the container store.

The (*Daemon).setHostConfig() method is called fairly early in the
process of creating a container, long before the container is registered
in the daemon's container store. Due to a rogue CheckpointTo() call
inside setHostConfig(), there is a window of time where a container can
be included in a List Containers API response but "not exist" according
to the Container Inspect API and similar endpoints which operate on a
particular container. Remove the rogue call so that the caller has full
control over when the container is checkpointed and update callers to
checkpoint explicitly. No changes to (*Daemon).create() are needed as it
checkpoints the fully-created container via (*Daemon).Register().

Fixes moby#44512.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 0141c6d)
Signed-off-by: Cory Snider <csnider@mirantis.com>
(*Daemon).registerLinks() calling the WriteHostConfig() method of its
container argument is a vestigial behaviour. In the distant past,
registerLinks() would persist the container links in an SQLite database
and drop the link config from the container's persisted HostConfig. This
changed in Docker v1.10 (moby#16032) which migrated away from SQLite and
began using the link config in the container's HostConfig as the
persistent source of truth. registerLinks() no longer mutates the
HostConfig at all so persisting the HostConfig to disk falls outside of
its scope of responsibilities.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 388fe4a)
Signed-off-by: Cory Snider <csnider@mirantis.com>
@neersighted
Copy link
Member

I'm trying to recall the discussion around this PR last week -- was this a new regression in 23.0/master, or was this behavior present in 20.10? I'm just trying to reason in terms of risk bringing this into 23.0 this late, though the changes seem very straightforward and pass @cpuguy83's sniff test.

@corhere
Copy link
Contributor Author

corhere commented Dec 13, 2022

This bug has existed in some form since Docker v1.12 (2016-7) if not longer, so there is no pressing need to squeeze the fix into v23.0.0. It might be nice to merge it into the branch so it's there if we end up cutting a v23.0.1 release, otherwise I see no problem leaving it only in master/v-next.

@cpuguy83
Copy link
Member

@neersighted I don't think my sniff tests are as high quality as you may think 🤣

@cpuguy83
Copy link
Member

I think this is fine to bring in. It fixes a subtle/hard to track down bug.
It may expose some other bug (due to coincidental checkpointing to replicas and/or disk), but seems ok to let an RC try and shake that out given we've looked through the code to check and make sure it that's not happening.

@thaJeztah thaJeztah added this to the 23.0.0 milestone Dec 14, 2022
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - passes my sniff test

@thaJeztah thaJeztah merged commit ce27df7 into moby:23.0 Dec 14, 2022
@corhere corhere deleted the backport-23.0/fix-44512 branch December 14, 2022 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants