
cannot podman rm after reboot with fs full #13967

Open
martinetd opened this issue Apr 22, 2022 · 26 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

@martinetd

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Running podman rm (or podman ps or any other command) fails on a freshly booted system (runRoot empty) when graphRoot is full.

In my particular use case we have a filesystem dedicated to the podman graphRoot, so when it hits maximum capacity our user can no longer delete stopped images to free space.

Steps to reproduce the issue:

I've reproduced this on my laptop as follows, as root:

# truncate -s 200M /tmp/btr
# mkfs.btrfs /tmp/btr
# mount /tmp/btr /mnt/t
# /src/podman/bin/podman --runroot /run/containers.test --root /mnt/t/containers ps
# (optionally, run something at this point)
# dd if=/dev/urandom of=/mnt/t/filler bs=1M
<ENOSPC>
# for f in {1..100}; do dd if=/dev/urandom of=/mnt/t/filler.$f bs=4k count=4 status=none || break; done
<ENOSPC> (rationale: a single big file isn't enough to fill 100% of the FS)
# rm -rf /run/containers.test  # (simulate reboot)
# /src/podman/bin/podman --runroot /run/containers.test --root /mnt/t/container ps
ERRO[0000] [graphdriver] prior storage driver overlay failed: write /mnt/t/container/overlay/metacopy-check582242757/l1/.tmp-f3761660769: no space left on device 
Error: write /mnt/t/container/overlay/metacopy-check582242757/l1/.tmp-f3761660769: no space left on device
# (same result with podman rm)
# touch '/run/containers.test/overlay/metacopy()-false' '/run/containers.test/overlay/native-diff()-true'
# /src/podman/bin/podman --runroot /run/containers.test --root /mnt/t/container ps
<works>
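
(For reference, the same stopgap under podman's default root-mode paths would presumably be the following -- hedged, since the marker file names are simply copied from the workaround above and could vary with the kernel and driver:)

# touch '/run/containers/storage/overlay/metacopy()-false' '/run/containers/storage/overlay/native-diff()-true'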

Describe the results you received:

An ENOSPC error, for an operation that shouldn't require additional space.

Describe the results you expected:

An actual listing of containers, or the ability to delete some and free space.

Additional information you deem important (e.g. issue happens only occasionally):

Various checks are performed -- rightly so -- on the overlay directory, and their results are cached in /run.
I see various ways of working around this:

  • move the cache to the storage we're testing. The check is tied to a specific graphRoot, so it'd make sense to cache it there instead -- that'd make the cached result persistent, so it wouldn't go away on reboot, and allow this to work. That's probably for the best anyway -- what if someone changes their graphRoot without resetting their runRoot?
  • disable these checks for commands that shouldn't care about them (ps and rm probably won't create new overlays, so they don't need to know)
  • allow the checks to fail and treat failure as whatever result is safe for some commands (e.g. ps, rm); that's pretty hacky and probably not reliable

Output of podman version:

I've reproduced on today's main:

Client:       Podman Engine
Version:      4.0.0-dev
API Version:  4.0.0-dev
Go Version:   go1.17.8
Git Commit:   78ccd833906087d171f608d66a0384135dc80717
Built:        Fri Apr 22 13:53:53 2022
OS/Arch:      linux/amd64

Output of podman info --debug:

Shouldn't be needed; ask if you really want it.

Package info (e.g. output of rpm -q podman or apt list podman):

built from sources.

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 22, 2022
@vrothberg
Member

Thanks for reaching out, @martinetd!

Which system are you running on? Version: 4.0.0-dev caught my eye.

@giuseppe
Member

  • move the cache to the storage we're testing. The check is tied to a specific graphRoot, so it'd make sense to cache it there instead -- that'd make the cached result persistent, so it wouldn't go away on reboot, and allow this to work. That's probably for the best anyway -- what if someone changes their graphRoot without resetting their runRoot?

these tests check kernel features, so they should not be persisted, since the kernel version can change.

  • disable these checks for commands that shouldn't care about them (ps and rm probably won't create new overlays, so they don't need to know)
  • allow the checks to fail and treat failure as whatever result is safe for some commands (e.g. ps, rm); that's pretty hacky and probably not reliable

even if we disable the tests, there are still files/directories created by podman to track its internal state. We can address some of these issues now, but it will be a nightmare to maintain in the future, as we'd need to handle ENOSPC for every file operation.

@martinetd
Author

Which system are you running on? Version: 4.0.0-dev caught my eye.

I was originally running this on Alpine; the version above is from reproducing on the latest git main.

these tests check kernel features, so they should not be persisted, since the kernel version can change.

Ah, yeah, OK, that makes sense. I was thinking of filesystem features (e.g. overlay on top of tmpfs doesn't behave exactly the same as overlay on top of btrfs), but if the kernel version also matters it's less obvious.

even if we disable the tests, there are still files/directories created by podman to track its internal state. We can address some of these issues now, but it will be a nightmare to maintain in the future, as we'd need to handle ENOSPC for every file operation.

Well, that state doesn't need to cover everything -- but it'd be great if a minimal set of commands (ps, rm, rmi?) could work so space can be freed.
Since that partition is used exclusively by podman, I don't see any way of freeing space once it gets into that state.
Well, one could dig into the overlay subdirectories and delete files manually, but figuring out which files belong to dead containers' overlays and are safe to delete, and which aren't, isn't easy to explain.

@martinetd
Author

hmm, so trying a bit further (I had only tried ps earlier): after "fixing" ps by creating the test files in /run, rm also fails, because it tries to update bolt_state.db before doing the rm.

I guess it'll be difficult to get anything working reliably in these conditions... But once again, there is nothing other than podman files in this partition: there is nothing to clean except podman container data.

(This happens to be btrfs, so for the customer this happened to I've advised temporarily growing the filesystem by adding another partition, running podman commands to clean up, then removing the partition again -- but I don't feel comfortable explaining that in our user manual. At least there is no urgency at this point.)
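
(A hedged sketch of that workaround, with /dev/sdX1 standing in for a hypothetical spare partition:)

# btrfs device add /dev/sdX1 /mnt/t      (temporarily grow the filesystem)
# podman rm <container>                  (cleanup commands now have room to work)
# btrfs device remove /dev/sdX1 /mnt/t   (shrink back once space has been freed)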

So it doesn't have to be all commands, and perhaps not even normal commands. For example, I was able to run podman mount, at which point one can safely remove files in root/overlay//merged/../diff manually, so just reversing the order -- removing files, then updating bolt_state.db -- would likely work. Given there's no "podman fsck" to reap orphans, it'd probably be better in this order anyway?
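
(A hedged sketch of that manual cleanup -- the container name is hypothetical, and the exact diff path depends on the layer:)

# mnt=$(podman mount mycontainer)          (prints the overlay merged dir)
# rm -rf "$mnt/../diff/some/large/files"   (free space directly in the layer's diff dir)
# podman umount mycontainer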

image rm also didn't work (it tries to create a temporary images.json in the same directory before doing the rm work); I wouldn't want to change the order there though, as a half-removed image still being listed would be pretty bad.

Anyway, we don't really need everything to work -- just freeing enough space for normal commands to work afterwards would do. Making sure podman ps -a works, and having a hard podman rm mode that doesn't care about metadata, would be good.

(The devil in me says it'd be easier to create a dummy 1MB file somewhere, to be removed in case of such emergencies... But it'd be a shame for the millions of users who don't need it, so let's not go there.)

@rhatdan
Member

rhatdan commented May 2, 2022

@mheon WDYT?

@mheon
Member

mheon commented May 3, 2022

I don't see a way to avoid a DB write; we need to keep track of the current state of the container. The real killer is probably that the DB maintains transactional integrity by retaining the old record until the write of the new container state has succeeded. As such, when available disk space is 0, the update fails: even if the new state consumes no more space, the old state is retained by the DB until the write succeeds. And I really don't want to turn that off, because transactional safety saves us in dozens of other places, even if it hurts us here.

@martinetd
Author

I definitely don't want to throw away db integrity here: we can do exactly the same as we're doing with the db, just after removing the other files first; odds are we'll have removed something and it'll work.
If it didn't, that doesn't really matter all that much either, does it? podman ps -a will think there's a stopped container somewhere when in fact its data has already been removed; as long as we're not actually corrupting the db in almost-ENOSPC scenarios, I think it's better than the current state, where I had to tell our user to grow the fs temporarily...
And they can just retry after removing another container that did free space; it might pop a warning that the files are already missing, but it should allow cleaning up the container in the db since the data is gone.

That's not necessarily true for other actions; e.g. start should stay as it is currently, creating in the db and then creating the container on the filesystem. But I see rm as the opposite, so popping "entries" in reverse order makes sense to me.

If you don't think it does (we might have different breakage models in mind), putting it behind a specific podman rm --unsafe-full-fs might be possible?
Alternatively, if you can think of any other remotely easy way of safely freeing just a few KB so the db operation can go through, I agree that'd probably be easier than whatever I'm suggesting here. I just couldn't come up with anything I like.

(Fun fact: for e.g. image rm the error message clearly shows the write to a temporary json file failing, so that code does write-somewhere-else + rename, but for bolt_state.db it was a write directly into the db. I assume the db format has its own integrity mechanism, as it didn't seem to break anything, but that'd probably deserve testing, or confirming there are tests for it... Let's ignore that for now and assume it does.)
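
(For reference, the json pattern visible in the error messages is the classic write-then-rename; a hedged shell equivalent, with hypothetical contents:)

# cd /var/lib/containers/storage/overlay-images
# printf '%s' "$new_json" > .tmp-images.json   (the write that hits ENOSPC first)
# mv .tmp-images.json images.json              (rename is atomic within one filesystem)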

@github-actions

github-actions bot commented Jun 3, 2022

A friendly reminder that this issue had no activity for 30 days.

@martinetd
Author

I'm still interested in solving this, but I still don't have a better idea than making a few specific commands ENOSPC-safe, as I'm not comfortable automating growing the podman partition for our product (and I appreciate that this is easier said than done...).

@github-actions

github-actions bot commented Jul 4, 2022

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Jul 5, 2022

@martinetd Any movement on this?

@martinetd
Author

Right now we still tell any customer stuck on this to temporarily extend the filesystem; it's been rare enough that I haven't automated anything yet.

Honestly, I fully understand that it is hard to make commands not write anything (e.g. ps) or free up data before updating the db (at least for rm; I agree we wouldn't want to do that for image rm), so if this becomes more of a problem, reserving e.g. 1MB in a file that can easily be removed would likely be my ""solution"" at this point.

I stand by my earlier comment that it should be possible to have podman ps/kill/rm work on a full fs, but I'll be a bad, greedy user here -- I honestly don't think I can spare the time to work on this in the foreseeable future.

@github-actions

github-actions bot commented Aug 5, 2022

A friendly reminder that this issue had no activity for 30 days.


@rhatdan
Member

rhatdan commented Sep 14, 2022

@mheon If we created a dummy file of a couple of megabytes when storage is first created, and then removed it first when attempting to do a system reset, do you think that could fix this problem?

@mheon
Member

mheon commented Sep 14, 2022

Probably, assuming that whatever ate all the storage on the system doesn't eat the space we free first.

@rhatdan
Member

rhatdan commented Sep 14, 2022

Well, since the most likely cause is containers/storage itself having used up the space, we should be fine in that situation.
Now the question is: how much space do we need to allocate?

@mheon
Member

mheon commented Sep 14, 2022

I would say: make a Libpod database with 100 containers, 100 pods, 200 volumes - all pretty sizable numbers - and find out what its size is. Then double that. That should cover typical cases, I think...
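
(A hedged sketch of such a reserve -- the path and file name are hypothetical, and 2M is only the order of magnitude suggested above:)

# fallocate -l 2M /var/lib/containers/storage/.space-reserve   (created when storage is first set up)
# rm -f /var/lib/containers/storage/.space-reserve             (removed as the first step of an emergency cleanup)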

@chilikk

chilikk commented Jan 25, 2023

I am seeing a similar issue (as reported in #17198), and after following this thread I got the impression that the podman rm error on a full filesystem boils down to a database update that requires additional space. So, to try to mitigate this I mounted /var/lib/containers/storage/libpod on a separate filesystem, to avoid the database competing for space with the container storage. However, even with the database mounted separately I see the same error:

Error: removing container 1218e629099e430c8c411d6094428e0cbb7baa13ad52fd5b42b5ca00f5182c6e root filesystem: 2 errors occurred:
        * open /var/lib/containers/storage/overlay-containers/.tmp-containers.json2754569372: no space left on device
        * open /var/lib/containers/storage/overlay-layers/.tmp-layers.json3056519018: no space left on device

In fact, in both scenarios (database on the same filesystem and on a separate one) I do see the container gone from the podman ps list despite the podman rm error, which suggests (correct me if I am wrong) that the database update goes through in both cases. This might be pure luck in the first case, though, if there happens to be enough free space in an already-allocated filesystem block to perform the change. Anyway, the container disappearing from the podman ps list further complicates the cleanup, as I am no longer able to inspect the container in order to manually remove its storage. Do you have any suggestions on how to better handle this case?

@martinetd
Author

hmm, I recall podman stopping at the first error, but the log you pasted lists both json files, so it could very well have gone ahead with the bolt.db as well... I assume you're listing stopped containers too, with -a, e.g. podman ps -a? (in which case inspect should also work)

I didn't re-read the whole thread, but I'm fairly sure I described both libpod/bolt.db and the json files as metadata, and just suggested that podman rm remove the actual data (e.g. layer files) before updating all of that -- the worst that could happen if the command is interrupted is a half-deleted layer that the user wanted to delete anyway, so re-running the command, or a cleanup after reboot, would finish the job.
It's quite possible I conflated all of that as 'db' even though there are plain files mixed in; I don't think it's possible to have a separate mount for all the metadata... The next best thing you could do is keep images in another podman root used as an additionalimagestore -- since that one won't be touched, you can just nuke the read-write part if the problem happens (see the sketch below).
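
(A hedged sketch of that setup in storage.conf -- the read-only store path is hypothetical:)

[storage.options]
additionalimagestores = [ "/var/lib/containers/storage-ro" ]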

Anyway, if ps -a doesn't work, your next best bet is to go through containers.json and layers.json manually and compare with running containers and images respectively... I don't think podman has a 'gc' command yet. Exporting all images and starting over might be easiest if you don't want to deal with that.

@chilikk

chilikk commented Jan 25, 2023

I did do podman ps -a and podman inspect, sorry.

From a layman's point of view I agree with you about deleting the container and layer data before updating the metadata and database -- it sounds better to end up in an inconsistent state where the container and layer data is gone (which is expected after podman rm) but the metadata and/or database entry remains, rather than the current behaviour where the database record is deleted but the data remains, complicating the cleanup.

Another idea: judging by the filename in the error message (.tmp-containers.json2754569372), I am guessing that Podman performs an atomic update of containers.json by writing an updated copy to the .tmp-containers.json file and then replacing containers.json with it. If every update of containers.json and layers.json is performed atomically in this fashion, and assuming this operation can be serialized to avoid race conditions (likely already the case), Podman could pre-allocate a new .tmp-containers.json (and .tmp-layers.json) with a predictable name immediately after each write, instead of immediately before the next write, thereby reducing the probability of running out of space the next time the metadata needs to be updated -- at least for operations that shrink the metadata (such as rm) as opposed to growing it (in which case the pre-allocated space could be insufficient).

UPD: my previous suggestion of pre-allocating with open+seek would obviously create a sparse file without actually allocating the space, so one would probably need to actually write the corresponding number of bytes to pre-allocate, which is of course somewhat more expensive.
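
(A hedged illustration of that difference, with a hypothetical reserve file name:)

# truncate -s "$(stat -c %s containers.json)" .tmp-reserve    (sparse: no blocks actually allocated)
# fallocate -l "$(stat -c %s containers.json)" .tmp-reserve   (extents genuinely allocated up front)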

@rhatdan
Member

rhatdan commented Jan 25, 2023

Interested in opening a PR?

@baude
Member

baude commented Jan 25, 2023

@mheon @rhatdan WDYT about a system reset --unsafe instead of the file idea... this would use filesystem deletion instead of deletion via APIs?

@rhatdan
Member

rhatdan commented Jan 25, 2023

I think this should be the default in that case, since podman system reset should just clobber those directories.
The issue is that we fail early in the podman command, before we can even start the reset.
Perhaps we could ignore the failure from the database in podman system reset.
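
(i.e., a hedged sketch of what such a reset could fall back to -- default root-mode paths assumed:)

# rm -rf /var/lib/containers/storage /run/containers/storage   (plain filesystem deletion, no db involved)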

chilikk added a commit to chilikk/storage that referenced this issue Jan 25, 2023
…3967

Keeping a temporary file of at least the same size as the target file
for atomic writes helps reduce the probability of running out of space
when deleting entities from corresponding metadata in a disk full
scenario. This strategy is applied to writing containers.json,
layers.json, images.json and mountpoints.json.

Signed-off-by: Denys Knertser <denys@netze.io>
@chilikk

chilikk commented Jan 25, 2023

Interested in opening a PR?

It seems the solution I suggested belongs in another repo; I have created containers/storage#1480.

@stevenschlansker

This seems to affect podman system prune as well, making it quite tricky to resolve an out-of-space issue on a podman-machine VM:

% podman system prune
WARNING! This command removes:
	- all stopped containers
	- all networks not used by at least one container
	- all dangling images
	- all dangling build cache

Are you sure you want to continue? [y/N] y
Error: open /var/home/core/.local/share/containers/storage/overlay-images/.tmp-images.json2391088807: no space left on device
