
docker volume prune removes ALL volumes when live restore is enabled and unless-stopped restart policy is used #41686

Closed
Chostakovitch opened this issue Nov 18, 2020 · 9 comments · Fixed by #44231
Labels
area/volumes kind/bug version/19.03

Comments


Chostakovitch commented Nov 18, 2020

Note: this issue has been edited to include an additional restart of the Docker daemon, so that it really reproduces the issue from scratch.

Description

When live restore is enabled, stopping and restarting the daemon and then running docker volume prune removes all volumes mounted in containers that use the unless-stopped restart policy.

This is very dangerous, as volumes used by running containers simply disappear without warning.

Steps to reproduce the issue:

Here is a minimal working example, starting from a fresh Debian 10.

  1. Install Docker from the Docker repository.
  2. Add yourself to the docker group.
  3. Start the Docker daemon a first time:
$ sudo systemctl start docker
  4. Edit /etc/docker/daemon.json with the following content to enable live restore:
{
	"live-restore": true
}
  5. Restart the Docker daemon so that live restore takes effect:
$ sudo systemctl restart docker
  6. Create a local volume (pica):
$ docker volume create pica
  7. Create a dummy container with the unless-stopped policy and mount the volume:
$ docker run -d --volume pica:/pica --name pica --restart unless-stopped busybox sleep 10000
  8. Check that the container is running:
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
32749a84d3a6        busybox             "sleep 10000"       17 seconds ago      Up 16 seconds                           pica
  9. Restart the daemon. With live restore enabled, the container won't stop:
$ sudo systemctl restart docker
  10. Check that the container is indeed still running:
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
32749a84d3a6        busybox             "sleep 10000"       About a minute ago   Up 38 seconds                           pica
  11. Prune volumes:
$ docker volume prune
WARNING! This will remove all local volumes not used by at least one container.
Are you sure you want to continue? [y/N] y
Deleted Volumes:
pica

Total reclaimed space: 0B

Describe the results you received:

The pica volume is removed even though the pica container is still running.
Even after the removal, we can still execute a command in the container, showing that it really is still running:

$ docker exec -it pica sh
/ # 

Excerpt of docker inspect pica:

"Mounts": [
            {
                "Type": "volume",
                "Name": "pica",
                "Source": "/var/lib/docker/volumes/pica/_data",
                "Destination": "/pica",
                "Driver": "local",
                "Mode": "z",
                "RW": true,
                "Propagation": ""
            }
        ]

Obviously, the volume has been removed:

$ docker volume inspect pica
[]
Error: No such volume: pica

$ sudo ls /var/lib/docker/volumes/pica/_data
ls: cannot access '/var/lib/docker/volumes/pica/_data': No such file or directory

Describe the results you expected:

Volumes mounted in running containers should not be removed when running docker volume prune.

Additional information you deem important (e.g. issue happens only occasionally):

With live restore alone or unless-stopped alone, docker volume prune will not remove volumes mounted in running containers.

If a container has been manually restarted after restarting the Docker daemon, the volume won't be removed.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:02:55 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:25 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.7
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-5-cloud-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 1.904GiB
 Name: vps-5c256a04
 ID: D3BW:DYRD:D2UJ:5FAM:NOIR:CEPB:CDKT:DVBP:OFLO:WNFW:SPLK:LTCA
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Tested with Debian 10 on virtual machines.

Chostakovitch changed the title from "Pruning volume removes ALL volumes when live restore is enabled and unless-stopped restart policy is used" to "docker volume prune removes ALL volumes when live restore is enabled and unless-stopped restart policy is used" on Nov 18, 2020

ppom0 commented Nov 18, 2020

Could reproduce this bug on a fresh Debian 10.6 VM, following the specified steps.

thaJeztah (Member) commented

Thanks for reporting!

> Create a dummy container with unless-stopped policy and mount the volume:

Is it only happening with that restart policy?

My initial suspicion would be that dockerd keeps reference counts for volumes and mounts in memory, updating them when creating/starting/stopping containers. Possibly, in the "live-restore" situation, it doesn't have to recreate (or restart) that container, and thus never increments the reference counts 🤔

/cc @cpuguy83
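
To make the suspected mechanism concrete, here is a minimal Go sketch of how a purely in-memory reference count can go stale across a live-restore daemon restart. The types and functions are hypothetical illustrations of the hypothesis above, not moby's actual internals:

package main

import "fmt"

// volumeStore is a hypothetical, simplified stand-in for the daemon's
// in-memory volume bookkeeping; the real implementation differs.
type volumeStore struct {
	refs map[string]int // volume name -> number of containers using it
}

func newVolumeStore() *volumeStore {
	return &volumeStore{refs: make(map[string]int)}
}

// startContainer increments the reference count of each mounted volume.
func (s *volumeStore) startContainer(volumes ...string) {
	for _, v := range volumes {
		s.refs[v]++
	}
}

// prune removes every volume whose in-memory reference count is zero,
// without cross-checking the list of actually running containers.
func (s *volumeStore) prune() []string {
	var removed []string
	for v, n := range s.refs {
		if n == 0 {
			removed = append(removed, v)
			delete(s.refs, v)
		}
	}
	return removed
}

func main() {
	// First daemon lifetime: volume created, container started.
	s := newVolumeStore()
	s.refs["pica"] = 0
	s.startContainer("pica") // refcount for "pica" becomes 1

	// Daemon restarts with live-restore: the container keeps running,
	// but the new daemon process rebuilds its state without replaying
	// the container start, so the count is never incremented.
	s2 := newVolumeStore()
	s2.refs["pica"] = 0 // volume rediscovered on disk, count stays at zero

	fmt.Println(s2.prune()) // [pica] -- an in-use volume gets pruned
}

Under these assumptions, the restarted daemon rediscovers the volume on disk but never replays the container start, so the count sits at zero and prune removes an in-use volume.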

thaJeztah added the area/volumes, kind/bug, and version/19.03 labels on Nov 19, 2020

Chostakovitch commented Nov 19, 2020

Thanks for your answer! 😁

> Is it only happening with that restart policy?

We only use the unless-stopped policy, so I didn't initially check the others.

This issue occurs with always and unless-stopped policies, and does not occur with no and on-failure policies.

So even with live restore enabled, with a no policy, the container does not restart when dockerd restarts, and volume prune won't remove the mounted volume.

thaJeztah (Member) commented

> So even with live restore enabled, with a no policy, the container does not restart when dockerd restarts, and volume prune won't remove the mounted volume.

Hmmm "interesting" 🤔


tholu commented Jan 10, 2022

This just happened here as well, resulting in data loss. Very dangerous bug; any update on this?

chessmango commented

Can confirm this is still a problem :P Fortunately I had backups to roll back to, but it would be nice to reconcile state to avoid pruning in-use volumes.
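
To illustrate the reconciliation suggested here, a minimal Go sketch that rebuilds the in-use set from the containers that are actually running, instead of trusting cached reference counts. The types are hypothetical and this is not necessarily the approach the eventual fix takes:

package main

import "fmt"

// container is a hypothetical record of a running container and the
// named volumes it mounts.
type container struct {
	name    string
	volumes []string
}

// reconciledPrune removes only volumes that no running container mounts,
// regardless of what any cached reference count claims.
func reconciledPrune(allVolumes []string, running []container) []string {
	inUse := make(map[string]bool)
	for _, c := range running {
		for _, v := range c.volumes {
			inUse[v] = true
		}
	}
	var removed []string
	for _, v := range allVolumes {
		if !inUse[v] {
			removed = append(removed, v)
		}
	}
	return removed
}

func main() {
	volumes := []string{"pica", "orphan"}
	running := []container{{name: "pica", volumes: []string{"pica"}}}

	// Only "orphan" is pruned; "pica" survives because a live container
	// still mounts it, even after a live-restore daemon restart.
	fmt.Println(reconciledPrune(volumes, running)) // [orphan]
}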


Nassiel commented Aug 21, 2022

I can confirm that this can also happen without "live-restore": true, because it happened to me 3 days ago.

cpuguy83 (Member) commented

Here's a fix for this: #44231

Chostakovitch (Author) commented

Thanks a lot for your investigation and fix, @cpuguy83! Do you have a rough idea of when we can hope for a release?
