podman cannot start containers using 36.20220906.3.2, but can with 36.20220820.3.0 #1305

Closed
ibotty opened this issue Sep 26, 2022 · 30 comments

@ibotty

ibotty commented Sep 26, 2022

Describe the bug
I am using an overlaid quadlet package to generate systemd units, but I cannot start the container, even with podman alone.

Reproduction steps
Steps to reproduce the behavior:

  1. let zincati update to 36.20220906.3.2 and reboot
  2. observe failed container starts

Actual behavior

Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de systemd[1]: Starting mariadb.service - MariaDB Container...
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de podman[2165]:
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de podman[2165]: 2022-09-26 17:41:14.709916023 +0000 UTC m=+0.216134044 container create 9e2e9a5443e5b7fc56c34b0d70b6bbdfea247b116cf4ea8bdde16ca709d3198a (image=docker.io/library/mariadb:latest, name=systemd-mariadb, health_status=, PODMAN_SYSTEMD_UNIT=mariadb.service, io.containers.autoupdate=registry)
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de podman[2165]: 2022-09-26 17:41:14.602781772 +0000 UTC m=+0.108999723 image pull  docker.io/mariadb:latest
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de mariadb[2165]: time="2022-09-26T17:41:14Z" level=error msg="Unmounting /var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/merged: invalid argument"
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de podman[2165]: 2022-09-26 17:41:14.796106456 +0000 UTC m=+0.302324404 container remove 9e2e9a5443e5b7fc56c34b0d70b6bbdfea247b116cf4ea8bdde16ca709d3198a (image=docker.io/library/mariadb:latest, name=systemd-mariadb, health_status=, PODMAN_SYSTEMD_UNIT=mariadb.service, io.containers.autoupdate=registry)
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de mariadb[2165]: Error: error mounting storage for container 9e2e9a5443e5b7fc56c34b0d70b6bbdfea247b116cf4ea8bdde16ca709d3198a: creating overlay mount to /var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/merged, mount_data="lowerdir=/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/RYHLKR5WWLQV5YJQGJSAOQ6IWF:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/diff1:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/PYAWWXMGWN6267Q33SZQ62FS7G:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/BQNBIMQ3FA5YNTORK7ECQS3RWJ:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/A4ZHWROOLIANPVE7ZEFYZEBQN4:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/5HZJNJCGLHUNZC3FA25DVEJ4KQ:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/UISWHBFQF4ZZZBX4EPYEQI2ZQP:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/NAYEVTJWISNR6KVJ2WORAP2USZ:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/E5AC5W43VF3HYRM5XASPWPNDXG:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/YKFLN7AP2SZU6GNISRPME3SERM:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/YY7B5JWHDJO4HBKB6TGG6TBROL:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/HXJHQBW6ZENAHVCFVOH2FADAGT:/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/mapped/0/l/MZXG3ISL2KBAHYWJNL3AJZ7YGE,upperdir=/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/diff,workdir=/var/lib/containers/storage/overlay/b7ff9b782e48b7275dbca447e640f3aec16d066cb40ad1990e2d1d754d7be805/work,nodev,metacopy=on,volatile,context=\"system_u:object_r:container_file_t:s0:c217,c355\"": no such file or directory
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de systemd[1]: mariadb.service: Main process exited, code=exited, status=127/n/a
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de systemd[1]: mariadb.service: Failed with result 'exit-code'.
Sep 26 17:41:14 static.138.222.12.49.clients.your-server.de systemd[1]: Failed to start mariadb.service - MariaDB Container.

dmesg shows the following (the layer id might not match):

kernel: overlayfs: failed to resolve '/var/lib/containers/storage/overlay/e8f9cc5fd07d69f331823697c3e1196bc382074e514ebd4f72ccb8ec3b2acae5/mapped/0/l/diff1': -2

Note that removing all images (podman images -q | xargs podman rmi) also does not resolve the situation.

The systemd unit file that starts mariadb is the following:

# /run/systemd/generator/mariadb.service
# Automatically generated by quadlet-generator
[Unit]
After=podman-secret@mariadb-root-password.service podman-secret@mariadb-user-password.service
Wants=podman-secret@mariadb-root-password.service podman-secret@mariadb-user-password.service
Description=MariaDB Container
RequiresMountsFor=%t/containers
SourcePath=/etc/containers/systemd/mariadb.container

[X-Container]
Image=docker.io/mariadb:latest
#PublishPort=3306:3306
User=999
#RunInit=no
Environment=MARIADB_DATABASE=wordpress
Environment=MARIADB_AUTO_UPGRADE=1
Environment=MARIADB_MYSQL_LOCALHOST_USER=1
Environment=MARIADB_MYSQL_LOCALHOST_GRANTS="RELOAD, PROCESS, LOCK TABLES, BINLOG MONITOR"
Environment=MARIADB_ROOT_PASSWORD_FILE=/run/secrets/mariadb-root-password
Environment=MARIADB_USER=wordpress
Environment=MARIADB_PASSWORD_FILE=/run/secrets/mariadb-user-password
Volume=mariadb-data:/var/lib/mysql
Volume=mariadb-socket:/var/run/mysqld
PodmanArgs=--secret mariadb-root-password --secret mariadb-user-password
Label=io.containers.autoupdate=registry

[Service]
Restart=always
Environment=PODMAN_SYSTEMD_UNIT=%n
KillMode=mixed
ExecStartPre=-rm -f %t/%N.cid
ExecStopPost=-/usr/bin/podman rm -f -i --cidfile=%t/%N.cid
ExecStopPost=-rm -f %t/%N.cid
Delegate=yes
Type=notify
NotifyAccess=all
SyslogIdentifier=%N
ExecStart=/usr/bin/podman run --name=systemd-%N --cidfile=%t/%N.cid --replace --rm -d --log-driver journald --pull=never --runtime /usr/bin/crun --cgr>

[Install]
WantedBy=multi-user.target
@dustymabe
Member

How do you overlay quadlet? Can you share your butane config?

@ibotty
Author

ibotty commented Sep 27, 2022

I overlaid quadlet manually after installation with sudo rpm-ostree install --reboot quadlet, because at that time the extensions key was not working (afaict).

An edited butane config is below. It is missing many containers and some systemd timers, but this one should be mostly self-contained.

variant: fcos
version: 1.4.0
passwd:
  users:
  - name: core
    ssh_authorized_keys:
    - [...]

systemd:
  units:
  - name: podman-auto-update.timer
    enabled: true
  - name: podman-secret@.service
    contents: |
      [Unit]
      Description=Create generate podman secret %i
      ConditionPathExists=!/var/lib/%N.stamp

      [Service]
      Type=oneshot
      RemainAfterExit=yes
      ExecStart=/bin/bash -c 'pwmake 256 | podman secret create %i -'
      ExecStart=/bin/touch /var/lib/%N.stamp

      [Install]
      WantedBy=multi-user.target
storage:
  files:
  - path: /etc/containers/systemd/mariadb.container
    mode: 0644
    contents:
      inline: |
        [Unit]
        After=podman-secret@mariadb-root-password.service podman-secret@mariadb-user-password.service
        Wants=podman-secret@mariadb-root-password.service podman-secret@mariadb-user-password.service
        Description=MariaDB Container

        [Container]
        Image=docker.io/mariadb:latest
        User=999
        Environment=MARIADB_DATABASE=wordpress
        Environment=MARIADB_AUTO_UPGRADE=1
        Environment=MARIADB_MYSQL_LOCALHOST_USER=1
        Environment=MARIADB_MYSQL_LOCALHOST_GRANTS="RELOAD, PROCESS, LOCK TABLES, BINLOG MONITOR"
        Environment=MARIADB_ROOT_PASSWORD_FILE=/run/secrets/mariadb-root-password
        Environment=MARIADB_USER=wordpress
        Environment=MARIADB_PASSWORD_FILE=/run/secrets/mariadb-user-password
        Volume=mariadb-data:/var/lib/mysql
        Volume=mariadb-socket:/var/run/mysqld
        PodmanArgs=--secret mariadb-root-password --secret mariadb-user-password
        Label=io.containers.autoupdate=registry

        [Service]
        Restart=always

        [Install]
        WantedBy=multi-user.target
  - path: /etc/sysctl.d/20-silence-audit.conf
    mode: 0644
    contents:
      inline: |
        # Raise console message logging level from DEBUG (7) to WARNING (4)
        # to hide audit messages from the interactive console
        kernel.printk=4

My rpm-ostree status is

State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Tue 2022-09-27 06:57:09 UTC)
Deployments:
* fedora:fedora/x86_64/coreos/stable
                  Version: 36.20220820.3.0 (2022-09-06T18:19:33Z)
               BaseCommit: a465c49fef185f8339d3cd5857e28386cfdc6516f68206912917c9dc3192d809
             GPGSignature: Valid signature by 53DED2CB922D8B8D9E63FD18999F7CBF38AB71F4
          LayeredPackages: quadlet

  fedora:fedora/x86_64/coreos/stable
                  Version: 36.20220906.3.2 (2022-09-22T13:44:19Z)
               BaseCommit: a8b4dda3092f20335cd3270db131f782edf5cad8b11b927283b3da2af42463e6
             GPGSignature: Valid signature by 53DED2CB922D8B8D9E63FD18999F7CBF38AB71F4
          LayeredPackages: quadlet

@mike-nguyen
Member

mike-nguyen commented Sep 29, 2022

I was able to reproduce with the butane configuration provided. One thing to note is that you must run sudo podman pull docker.io/mariadb:latest and then systemctl daemon-reload.

This doesn't seem to be FCOS specific, as I was able to reproduce it on Fedora 36. I've isolated the issue down to the kernel bump from 5.18 to 5.19. Using FCOS 36.20220820.3.0 and overriding the kernel to 5.19.6-200.fc36.x86_64, the error appears. If I roll back to the original kernel 5.18.18-200.fc36.x86_64, the error goes away. I tried a few different kernels and the error only occurs starting with 5.19; all 5.18 kernels work. It even fails on a fresh install if it has the affected kernel.
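For reference, the kernel override for this kind of bisection can be done roughly like this on FCOS (the koji URLs are only illustrative; adjust the version to whatever you want to test, and include kernel-modules-extra if it is part of your deployment):

# rpm-ostree override replace \
    https://kojipkgs.fedoraproject.org/packages/kernel/5.19.6/200.fc36/x86_64/kernel-5.19.6-200.fc36.x86_64.rpm \
    https://kojipkgs.fedoraproject.org/packages/kernel/5.19.6/200.fc36/x86_64/kernel-core-5.19.6-200.fc36.x86_64.rpm \
    https://kojipkgs.fedoraproject.org/packages/kernel/5.19.6/200.fc36/x86_64/kernel-modules-5.19.6-200.fc36.x86_64.rpm
# systemctl reboot

and to return to the kernel shipped with the deployment:

# rpm-ostree override reset kernel kernel-core kernel-modules
# systemctl reboot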

Here is the same error from Fedora 36:

[root@fedora ~]# dmesg | grep overlay
[    4.629035] overlayfs: POSIX ACLs are not yet supported with idmapped layers, mounting without ACL support.
[    4.645771] overlayfs: failed to resolve '/var/lib/containers/storage/overlay/ef2816233d22fe237657b0ba7d4730bcc4788d41109a4c30d22dde0300fff1d3/mapped/0/l/diff1': -2
[    5.065732] overlayfs: failed to resolve '/var/lib/containers/storage/overlay/77d6814268cb6cd5b1b0a9e9dec71311f202b104cb16ae2dc866bc686d7aa405/mapped/0/l/diff1': -2
[    5.632945] overlayfs: failed to resolve '/var/lib/containers/storage/overlay/7e6ddb05c2694bd4beebd8a79899f4c49d9c08fd607b1624f29d9d92bfe51a5c/mapped/0/l/diff1': -2
[    6.227341] overlayfs: failed to resolve '/var/lib/containers/storage/overlay/3ced793161b60d96b3e96f4669d43b2c8e698908bd70a3b9dbcf470ed544f02f/mapped/0/l/diff1': -2
[    6.771601] overlayfs: failed to resolve '/var/lib/containers/storage/overlay/56c39247f40147ee58c4100de66050d92a6dd97ca79c4d17602a40edaf740a96/mapped/0/l/diff1': -2
[root@fedora ~]# uname -a
Linux fedora 5.19.11-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 23 15:07:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
[root@fedora ~]# cat /etc/os-release 
NAME="Fedora Linux"
VERSION="36 (Server Edition)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Server Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Server Edition"
VARIANT_ID=server
[root@fedora ~]# rpm -q quadlet podman 
quadlet-0.2.4-1.fc36.x86_64
podman-4.2.0-2.fc36.x86_64

@mike-nguyen
Member

@mheon @alexlarsson can you take a look at this from the podman / quadlet side of things? It looks like something in the 5.19 kernel is causing issues with containers run with quadlet / podman. Kernel 5.18 works with no issues.

We are seeing overlay error messages and the containers are not starting.

[    5.684777] overlayfs: failed to resolve '/var/lib/containers/storage/overlay/54b1e2226dc84ea9fcade81b518828f0e9987952fdb1c76a8a8b0d8eaf598399/mapped/0/l/diff1': -2

Reproducer on Fedora 36 with kernel 5.19 or later:

# dnf install -y podman quadlet
# cat << EOF > /etc/systemd/system/podman-secret@.service 
[Unit]
Description=Create generate podman secret %i
ConditionPathExists=!/var/lib/%N.stamp

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c 'pwmake 256 | podman secret create %i -'
ExecStart=/bin/touch /var/lib/%N.stamp

[Install]
WantedBy=multi-user.target
EOF

# mkdir -p /etc/containers/systemd
# cat << EOF > /etc/containers/systemd/mariadb.container 
[Unit]
After=podman-secret@mariadb-root-password.service podman-secret@mariadb-user-password.service
Wants=podman-secret@mariadb-root-password.service podman-secret@mariadb-user-password.service
Description=MariaDB Container

[Container]
Image=docker.io/mariadb:latest
User=999
Environment=MARIADB_DATABASE=wordpress
Environment=MARIADB_AUTO_UPGRADE=1
Environment=MARIADB_MYSQL_LOCALHOST_USER=1
Environment=MARIADB_MYSQL_LOCALHOST_GRANTS="RELOAD, PROCESS, LOCK TABLES, BINLOG MONITOR"
Environment=MARIADB_ROOT_PASSWORD_FILE=/run/secrets/mariadb-root-password
Environment=MARIADB_USER=wordpress
Environment=MARIADB_PASSWORD_FILE=/run/secrets/mariadb-user-password
Volume=mariadb-data:/var/lib/mysql
Volume=mariadb-socket:/var/run/mysqld
PodmanArgs=--secret mariadb-root-password --secret mariadb-user-password
Label=io.containers.autoupdate=registry

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
EOF

# podman pull docker.io/mariadb:latest
# systemctl daemon-reload
# dmesg | grep overlay

@mheon

mheon commented Sep 29, 2022

Never seen this before, and 5.19 has been out for a long enough time that I suspect we would've heard about serious incompatibilities already in Podman upstream (I'm running the same kernel minor version right now without issues, albeit on Fedora not FCOS). Still, that error does seem to be coming straight out of c/storage's overlayfs code. @nalind Any thoughts?

@dustymabe
Member

In #1305 (comment) @mike-nguyen shows a reproducer on a Fedora VM (not FCOS). So if you have one of those handy you should be able to try it yourself.

@ibotty
Author

ibotty commented Sep 30, 2022

Thanks for looking into it. Note that it can be reproduced with podman alone, using a systemd unit like the one above. Quadlet merely generates these files.

@cgwalters
Member

albeit on Fedora not FCOS

Fedora CoreOS is Fedora. I will continue to fight against this terminology. Alternative terms are e.g. "dnf-based Fedora" or "traditional Fedora" or so.

@cgwalters
Member

Bigger picture, a problem we have now is that too many of our tests are highly synthetic. We have an upgrade test, but that test only validates that we can get from one version of the OS to another; it doesn't verify that containers still work. This is a big gap. (Also true of podman's upstream CI, as far as I know.)

@mheon

mheon commented Sep 30, 2022

We do have upgrade testing, but entirely within Podman versions. We do no validation in our upstream CI as to whether OS or kernel updates break us (honestly, this hasn't come up often).

@lukasmrtvy

See containers/storage#1308, maybe it's related.

@PhrozenByte

PhrozenByte commented Oct 4, 2022

See containers/storage#1308, maybe it's related.

containers/storage#1308 seems to be about running multiple containers using the same image, but I don't do that and still encounter this issue with 36.20220918.3.0 (as with 36.20220906.3.2 before). The error message is the same as in containers/storage#1308, though, and slightly different from @ibotty's:

[    9.110800] overlayfs: POSIX ACLs are not yet supported with idmapped layers, mounting without ACL support.
[    9.435100] overlayfs: failed to resolve '/var/lib/containers/storage/overlay/f2761b92efa71f4dcd11378fc58891b5fbc4b4373cac4f0eedfee2902f4c5b50/mapped/0/l/HOTNAYERIZLT6DLBLTTQHX7T2V': -13

This affects rootful containers only; rootless containers work just fine. Anyway, it still might be the same issue, just with a different trigger. Too bad we don't have the patch in FCOS yet. Just for the record, the issue definitely isn't related to quadlet: I neither use quadlet nor any other layered package. Since we already have a reproducer, I'll refrain from providing one myself.

Once again rolling back to 36.20220820.3.0 and disabling Zincati for now... 😒

@PhrozenByte

Any updates on this? Still not working with 37.20221211.3.0, even though containers/storage#1308 should be included by now?

@dustymabe
Member

@giuseppe - any ideas what could be the underlying issue here?

@giuseppe

giuseppe commented Jan 9, 2023

Could you try running mv /run/containers/storage/overlay/idmapped-lower-dir-true /run/containers/storage/overlay/idmapped-lower-dir-false and then try to run the container again? That forces podman to not use idmapped mounts.

@PhrozenByte

PhrozenByte commented Jan 10, 2023

Could you try running mv /run/containers/storage/overlay/idmapped-lower-dir-true /run/containers/storage/overlay/idmapped-lower-dir-false and then try to run the container again? That forces podman to not use idmapped mounts.

With idmapped mounts disabled all containers spin up just fine on FCOS 37.20221211.3.0 👍

Thank you @giuseppe so far 👍 This should allow me to finally re-enable upgrades (they were disabled waaaay too long...). Is there a way to make this config persistent?


Since you asked this elsewhere @giuseppe: all my containers are stored on btrfs filesystems (one subvolume per container, to be more precise). This is probably the only major difference from "regular" FCOS (at least as far as I can think of right now...); I neither use layered packages, nor do I modify anything in /etc/containers (or ~/.config/containers for rootless containers). Only rootful containers are affected; rootless containers work just fine.

Here's one of the affected Systemd services:
[Unit]
Description=Podman container 'bind'
Wants=network-online.target container-network-bind.service
After=network-online.target container-network-bind.service
RequiresMountsFor=%t/containers
RequiresMountsFor=/srv/containers/bind

[Service]
Type=notify
NotifyAccess=all
Environment=PODMAN_SYSTEMD_UNIT=%n
ExecStartPre=/bin/rm -f %t/%n.ctr-id
ExecStart=/usr/bin/podman run --cidfile=%t/%n.ctr-id --sdnotify=conmon --cgroups=no-conmon --replace -dt --name bind --label io.containers.autoupdate=registry --subuidname bind --uidmap 65536:100000007:1 --uidmap 65537:100000002:1 --subgidname bind --gidmap 65536:100000007:1 --gidmap 65537:100000002:1 --mount type=bind,src=/srv/containers/bind/config/local-zones,dst=/etc/named/local-zones,ro=true --mount type=bind,src=/srv/containers/acme/data/live/dot.example.com,dst=/etc/named/ssl/dns-over-tls,ro=true --mount type=bind,src=/srv/containers/bind/config/ssl/dhparams.pem,dst=/etc/named/ssl/dhparams.pem,ro=true --mount type=bind,src=/srv/containers/bind/data,dst=/var/named --net bind --hostname ns.example.com -p 192.0.2.1:53:53/tcp -p 192.0.2.1:53:53/udp -p 192.0.2.1:853:853/tcp -p [2001:db8::1]:53:53/udp -p [2001:db8::1]:53:53/tcp -p [2001:db8::1]:853:853/tcp ghcr.io/sgsgermany/bind:latest
ExecStop=/usr/bin/podman stop --ignore --cidfile=%t/%n.ctr-id
TimeoutStopSec=70
Restart=on-failure

[Install]
WantedBy=default.target

I've masked the hostname and the global IPv4 and IPv6 addresses.

The container's sources can be found here: https://github.com/SGSGermany/bind

My subuid scheme looks like the following:
  • Container UIDs 0-65535 are mapped to host UIDs 100700000-100765535 (using --subuidname bind, see /etc/subuid below)
  • Container UID 65536 is mapped to the host UID 100000007 (for rootless containers this would also be the user running the container, but naturally not for a rootful container; using --uidmap 65536:100000007:1)
  • Following container UIDs (65537+) are mapped to various purposeful host UIDs (65537 is mapped to 100000002 in this example; using --uidmap 65537:100000002:1)
And the corresponding entries in /etc/subuid and /etc/subgid look like the following:
bind:100700000:65536
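Putting the above together, the mapping podman is being asked to set up should look roughly like this in /proc/<pid>/uid_map notation (illustrative only, not captured from a running container; columns are container ID, host ID, count):

     0  100700000  65536
 65536  100000007      1
 65537  100000002      1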

If you need any more info, please let me know 👍

cgwalters pinned this issue Jan 10, 2023
@PhrozenByte

I'm no big fan of pinging issues, but since it's a major issue for my systems:

If fixing this will take some time, I'd like to ask again how to persist @giuseppe's workaround. I don't see any difference with idmapped mounts disabled, so making the workaround persistent is totally fine for now and would allow me to upgrade FCOS. Unfortunately I didn't find any config option, nor anything online or in the man pages. Thanks! 👍

mv /run/containers/storage/overlay/idmapped-lower-dir-true /run/containers/storage/overlay/idmapped-lower-dir-false

@giuseppe

There is no way to make it persistent. It is lost on reboots.

Maybe you could temporarily use a systemd oneshot service to create it?

@PhrozenByte

That's unfortunate, but yeah, good idea: a systemd oneshot service should do until this is fixed. Thanks @giuseppe, looking forward to an actual fix 👍 Should we create a new issue in containers/storage (or somewhere else?) to better track this, or can we somehow move this issue there?

Just for the record, and for others with the same issue, here's the Systemd service I came up with. I was having a hard time figuring out how to trigger the creation of /run/containers/storage/overlay without trying to run a container that would fail to start anyway, so I ended up simply creating the file structure myself. Thankfully Podman respects this pre-existing file structure.

[Unit]
Description=Disable idmapped overlayfs mounts of Podman containers (bugfix)
Before=container-bind.service
RequiresMountsFor=%t/containers

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'if [ -f /run/containers/storage/overlay/idmapped-lower-dir-true ]; then mv /run/containers/storage/overlay/idmapped-lower-dir-true /run/containers/storage/overlay/idmapped-lower-dir-false; else if [ ! -d /run/containers/storage/overlay ]; then mkdir -p -m 700 /run/containers/storage/overlay; fi; touch /run/containers/storage/overlay/idmapped-lower-dir-false; fi'
RemainAfterExit=true

[Install]
RequiredBy=container-bind.service

@dustymabe
Member

Should we create a new issue in containers/storage (or somewhere else?) to better track this issue?

I'm interested in what the actual problem is here and what the fix would be too. @giuseppe can you help guide us on that front?

@lukasmrtvy

@dustymabe I am trying Fedora CoreOS 37.20221211.3.0 with Podman 4.4.0-dev, but I am still getting errors similar to containers/storage#1308 (comment).

From today:

Jan 18 22:58:14 ip-10-1-94-157 podman[1084]: time="2023-01-18T22:58:14Z" level=error msg="Unmounting /var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/merged: invalid argument"
Jan 18 22:58:14 ip-10-1-94-157 podman[1084]: time="2023-01-18T22:58:14Z" level=info msg="Request Failed(Conflict): preparing container 34414fbb5c439e0db40e1a5c068acb48ab2b5c29e9050a5feaeb3db80146218a for attach: mounting storage for container 34414fbb5c439e0db40e1a5c068acb48ab2b5c29e9050a5feaeb3db80146218a: creating overlay mount to /var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/merged, mount_data=\"lowerdir=/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/WUXOKHJUOPWJWK3BCZUSDIFWN4:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/diff1:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/IVE3DK32Q3PTXFEZ5XCBLKDVLM:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/IOXXSGV6RGSQE3VSQRQ3VT5N6P:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/HIQQV3SHYSJMVVBWMBDGHZJTG7:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/R3SCWQDEKP5RZDRSZTLOG3524D:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/2NTMRE74EULTEMQRC77B4OT67N:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/VRXVHRZLZF5OEVV5Q34BN55H2H:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/RECTHXXK3O23B5Y2YRXGT6YU2D:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/GS5RHTJ6CDYWHITCH7X4GHBJPF:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/ZRZ23KK4HEDOGK3IB2IRHHRHUT:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/BEU646LC7T32NVAJXBHOFWGNY3:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/AOJG6PLJZS7INSWPTKHRSZPRRE:/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/mapped/0/l/DFPYDIA5VSW3YVJOJ3UY3R5SEZ,upperdir=/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/diff,workdir=/var/lib/containers/storage/overlay/62428e0039a780c6b0a816a5e1543033e15ba1b619a903eaa2e962845162e9f2/work,nodev,metacopy=on,context=\\\"system_u:object_r:container_file_t:s0:c564,c896\\\"\": permission denied"

@giuseppe is it related or what?

@lukasmrtvy

@dustymabe I filed an issue, containers/podman#17171; not sure if it's related though.

@ykuksenko

I am also seeing this on podman 4.4.1 / kernel 6.1.13-100.fc36.x86_64 / Fedora 36, along with the same dmesg output.

  • Affects just one container/image of many for me (running as root with --subuidname and --subgidname on btrfs)
  • The workaround is functional until the system is rebooted; then I have to reapply it.
  • Removing and re-fetching the image and container did not help.

@PhrozenByte

Affects just one container/image of many for me (running as root with --subuidname and --subgidname on btrfs)

Replacing --subuidname and --subgidname with --uidmap and --gidmap respectively didn't make a difference for me (FCOS 37.20230205.3.0 with kernel 6.1.9-200.fc37.x86_64 and podman 4.3.1). Besides running on btrfs, can you provide the image and systemd unit for reference, to spot other similarities?

@ykuksenko I've written a small Systemd service to persist the workaround, see #1305 (comment); you just have to add the container's Systemd service to the RequiredBy directive and enable it using systemctl enable.
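In other words, assuming the oneshot unit from the linked comment was saved as e.g. /etc/systemd/system/disable-idmapped-mounts.service (the file name is just an example) and Before=/RequiredBy= were adjusted to point at your container unit, it boils down to:

systemctl daemon-reload
systemctl enable disable-idmapped-mounts.service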

@ykuksenko

ykuksenko commented Feb 27, 2023

  1. I found two more workarounds!
  • Rebuilding the container image and redeploying on the newer image works. I have this in a VM so the test was trivial.
  • Export the image, delete the existing one, import it again, then run the container. (Make sure you have a backup if the data is important!) edit: Keep track of your container image tags. They are lost in this process.
IMAGE=$(podman image ls |grep speedtest |awk '{print $3}')
podman image save $IMAGE > speedtest.tar
podman image rm $IMAGE
cat speedtest.tar | podman image load
/usr/bin/podman  run --rm -ti --name=speedtest-exporter --subuidname=speedtest-exporter --subgidname=speedtest-exporter  my.internal.registry/path/speedtest-exporter:alpine_latest
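(Presumably the lost tag can simply be re-applied after the load with podman tag, or the image can be saved by name instead of by ID so the repository tag is kept in the archive; both untested here, e.g.:)
podman tag $IMAGE my.internal.registry/path/speedtest-exporter:alpine_latest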
  2. The image I use is custom but based on alpine from about 2 months ago. I can share more details if you want them.
  3. Here is my systemd service unit:

# /etc/systemd/system/container-speedtest-exporter.service
# container-speedtest-exporter.service

[Unit]
Description=Podman container-speedtest-exporter.service
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=no
TimeoutStopSec=70
ExecStartPre=/bin/rm \
        -f %t/%n.ctr-id
ExecStart=/usr/bin/podman run \
        --cidfile=%t/%n.ctr-id \
        --cgroups=no-conmon \
        --rm \
        --sdnotify=conmon \
        -d \
        --replace \
        --name=speedtest-exporter \
        --subuidname=speedtest-exporter \
        --subgidname=speedtest-exporter \
        --network=prometheus,traefik \
        -p 9469:9469/tcp \
        --memory=32m \
        --memory-swap=32m \
        --dns-search=. my.internal.registry/path/speedtest-exporter:alpine_latest
ExecStop=/usr/bin/podman stop \
        --ignore -t 10 \
        --cidfile=%t/%n.ctr-id
ExecStopPost=/usr/bin/podman rm \
        -f \
        --ignore -t 10 \
        --cidfile=%t/%n.ctr-id
Type=notify
NotifyAccess=all

[Install]
WantedBy=default.target

  4. Reducing the command to a direct, simplified podman command also fails for me:
  • /usr/bin/podman run --rm -ti --name=speedtest-exporter --subuidname=speedtest-exporter --subgidname=speedtest-exporter my.internal.registry/path/speedtest-exporter:alpine_latest
  5. Removing the uid mapping works without the original workaround:
  • /usr/bin/podman run --rm -ti --name=speedtest-exporter my.internal.registry/path/speedtest-exporter:alpine_latest
  6. The subuid and subgid files were populated by the useradd commands. afaict these do not have any overlaps or oddities.
  • both files contain the same entry speedtest-exporter:2393760:65536
  • using other subuids that work fine with other images leads to the same issue with this container
  7. I am pretty sure I have seen this issue a long time ago after doing an OS upgrade; iirc it was pre podman 4. Unfortunately I did not save my logs from then and did not file an issue. I just reinstalled my VM in ~May 2022, while dealing with other podman issues back then.

  8. I do not know when this issue started for me this time. It was broken as far back as my logs go on this system. Luckily this was an unimportant container for me.

I am not sure, but based on the export/reimport workaround it seems the issue has something to do with the on-disk image configuration. I am not sure how to check that though. I will keep this container in the broken state for a while in case there are other questions.
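(Not something from this thread, just a guess at how one might start comparing: podman image inspect dumps the image configuration and storage metadata podman keeps on disk, so diffing the broken copy against the re-imported one could narrow down what changed:)

podman image inspect $IMAGE > broken.json
# export, remove and re-import the image as above, then:
podman image inspect $IMAGE > reimported.json
diff broken.json reimported.json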

edit: fixed numbering

@ykuksenko

I upgraded another system from Fedora 34 to Fedora 36, skipping 35, and had the same issue happen there too. Only 1 of 2 containers was affected there. The dmesg output is slightly different, namely the error code at the end:
[36461.688245] overlayfs: failed to resolve '/var/lib/containers/storage/overlay/9854266a76d166df7558b5ffb7157d4bd30a7c429c7eb7d87e2aa6395b535503/mapped/1/l/WNHAEEBWFUAUWL4NGAZU6VZJNF': -13

On that DigitalOcean system:

  • The kernel was upgraded from kernel-core.x86_64 5.15.6-100.fc34 to kernel-core.x86_64 6.1.13-100.fc36
  • Podman was upgraded from podman.x86_64 3:3.4.7-1.fc34 to podman.x86_64 5:4.4.1-3.fc36
  • It is using ext4 as the base file system.
  • The custom container was based on Fedora 36.

Using the export, delete, import image approach worked. I had not noticed it before, but container image tags are lost in that process. They are regained from my registry when I restart the container. I do not have a way to roll back on this system.

@dustymabe
Member

I'm still not sure if there is something actionable in this ticket. Are the issues with podman resolved? If not, can we open new bugs against https://github.com/containers/podman?

@PhrozenByte

Not sure either... @giuseppe provided a workaround that works fine in production, but the issue persists. I've just opened a reference issue against containers/podman, see containers/podman#18435.

@PhrozenByte

Some feedback from containers/podman#18435: This issue was fixed with Podman 4.5.0, which just landed in stable FCOS. I guess we can close this now. Thanks everyone! 👍

@dustymabe
Member

Thanks for the feedback @PhrozenByte!
