privileged: true is problematic w/Debian 10, official Docker #532

Closed
nurturenature opened this issue Mar 7, 2022 · 10 comments

@nurturenature
Contributor

Current master fails on Debian 10 and Ubuntu 20.04 when privileged: true is set.
Going back to mounting the cgroup filesystem works better but still produces errors.

  volumes:
    - "/sys/fs/cgroup:/sys/fs/cgroup:ro"
  # privileged: true

Fresh install of Debian 10 and official Docker, bin/up:

  • dumped into tty login for a random node
  • display unresponsive, squirrelly
  • power reset needed

After restarting the host and starting only the control node:

Control log:

sort: cannot read: /var/jepsen/shared/nodes: No such file or directory
Welcome to Jepsen on Docker
===========================

Attaching a control node shell:

root@control:/# ls -al /var/jepsen/shared/nodes
-rw-r--r-- 1 root root 15 Mar  7 20:59 /var/jepsen/shared/nodes

root@control:/# cat /var/jepsen/shared/nodes
n1
...
root@control:/# ssh n1
The authenticity of host 'n1 (172.18.0.2)' can't be established.
ECDSA key fingerprint is SHA256:lZ0maK1f5FXBh3wsWpuwUuwNCgGWFJhaTz5fObOulJw.
Are you sure you want to continue connecting (yes/no)? 

ls -al /root
-rw-r--r-- 1 root root    0 Mar  7 20:59 nodes
drwx------ 2 root root 4096 Mar  7 20:59 .ssh

root@control:/# rm ~/.ssh/known_hosts 
root@control:/# ./init.sh 
mkdir: cannot create directory '/root/.ssh': File exists
# n1:22 SSH-2.0-OpenSSH_7.9p1 Debian-10+deb10u2
...
^C

root@control:/# ssh n1
Warning: Permanently added the ED25519 host key for IP address '172.18.0.3' to the list of known hosts.

DB node logs look OK.

Edit docker-compose.yml template:

  volumes:
    - "/sys/fs/cgroup:/sys/fs/cgroup:ro"
  # privileged: true

bin/up does bring up the environment, but with errors in the log:

[FAILED] Failed to mount FUSE Control File System.
jepsen-n3  | See 'systemctl status sys-fs-fuse-connections.mount' for details.

along with huge tables and several others.

Attaching to node shows:

root@n3:/# systemctl status sys-fs-fuse-connections.mount
● sys-fs-fuse-connections.mount - FUSE Control File System
   Loaded: loaded (/lib/systemd/system/sys-fs-fuse-connections.mount; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2022-03-07 21:17:58 UTC; 3min 5s ago
    Where: /sys/fs/fuse/connections
     What: fusectl
     Docs: https://www.kernel.org/doc/Documentation/filesystems/fuse.txt
           https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

Mar 07 21:17:58 n3 systemd[1]: sys-fs-fuse-connections.mount: Mount process exited, code=exited, status=32/n/a
Mar 07 21:17:58 n3 mount[264]: mount: /sys/fs/fuse/connections: cannot mount fusectl read-only.
Mar 07 21:17:58 n3 systemd[1]: sys-fs-fuse-connections.mount: Failed with result 'exit-code'.
Mar 07 21:17:58 n3 systemd[1]: Failed to mount FUSE Control File System.
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

The above is also true for Ubuntu 20.04.
In addition, I tried editing the boot parameters on Ubuntu to explicitly ensure hierarchical cgroup, force v2, etc.
Always similar errors.
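
When comparing hosts, it can help to record which cgroup layout each one actually uses. The following is a minimal sketch, not part of Jepsen's scripts: it assumes GNU `stat`, and the function names `cgroup_mode` and `host_cgroup_mode` are made up for illustration. It relies on `/sys/fs/cgroup` reporting the filesystem type `cgroup2fs` on a unified (v2) hierarchy and `tmpfs` on a v1 or hybrid layout.

```shell
#!/bin/sh
# Map the filesystem type reported for /sys/fs/cgroup to a cgroup mode.
# cgroup2fs => unified (v2) hierarchy; tmpfs => legacy v1 or hybrid layout.
# (Hypothetical helper name, for illustration only.)
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Report the mode of the machine this runs on (host or container).
# Requires GNU stat for the -f/-c options.
host_cgroup_mode() {
  cgroup_mode "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
}
```

Sourcing this file and running `host_cgroup_mode` on the host, and again inside a node container, shows whether the two agree.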

@aphyr
Collaborator

aphyr commented Mar 8, 2022

uguhghghghhghghghhhh why is Docker LIKE this? I thought the whole point was to be reproducible, but for the last four years I've been trapped in an endless hellscape of PRs which fix Docker on one person's system and break it on another's. 😭

@chhetripradeep, @dancmeyers, @m1l4n54v1c, since y'all have contributed to the Docker scripts on various platforms recently, any chance you can help sort this out?

@nurturenature
Contributor Author

why is Docker LIKE this?

Because the hellscape frequently does a container escape?
Because container dependencies/configs are the anti-CRDT? (occasionally consistent, often conflicted)

At the current time, it just might not be possible for Docker to support the privileges Jepsen needs in many common environments. I think it's fair to say: to fully use Jepsen, this is the host OS, etc. needed to support its capabilities.

I switched to stock Debian 10 and official Docker hoping it was the core development/supported environment for Jepsen.

What is the expected environment to develop with master?

@aphyr
Collaborator

aphyr commented Mar 8, 2022

I use Debian 10/11 with LXC, and the AWS Marketplace build for testing on physical machines. The docker build's not something I use very often (in large part because it seems like it's always breaking in new ways, haha).

@dancmeyers
Contributor

What do you mean by ‘official’ docker? Official from Debian’s repo for 10 (which is dog-old and the Docker website itself recommends against, IIRC), or the official Docker repo that you can add as a separate apt source?

I do the latter, and added the separate privileged flag because it was needed to get cgroups v2 working, but I’m mainly on MacOS, for which Docker Desktop creates a hidden Linux VM of… some flavour (Debian? Ubuntu? Arch?) and then proxies all commands into that, so there could well be some unexpected funkiness there. The docs I was following when I did it were for native Linux using systemd and cgroups v2 though.

@aphyr
Collaborator

aphyr commented Mar 8, 2022

Ah, that's a good question--if we depend on a certain minimum version of Docker, maybe we could check that version in the script and let people know if they're running an older version?

@nurturenature
Contributor Author

What do you mean by ‘official’ docker?
the official Docker repo that you can add as a separate apt source?

Yes, I added the Docker repo as an apt source, from Docker.
I used the native Linux docs for Debian on docker.com.

maybe we could check that version

Docker Engine >= 20.10

per container forums.

Forums also say Debian 11, Ubuntu 21.10, Docker 20.10, are better aligned re cgroup v2.
Good to hear macOS currently supports it.

Rereading the Jepsen docs, they do favor Debian LXC and AWS images.
Docker is just so seductive that we expect it to work.

I don't know how to resolve the cross .yml commits cycle, LWW? 😄
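
The version check suggested above could be sketched roughly as follows. This is an illustration, not what bin/up does: the helper names `version_ge` and `check_docker_version` are hypothetical, the 20.10 minimum is the figure quoted from the forums in this comment, and the comparison assumes GNU `sort -V`. A check like this only warns about old engines; it does not guarantee that a newer engine works.

```shell
#!/bin/sh
# Succeeds when version $1 is >= version $2.
# Relies on GNU sort's -V (version sort): if $2 sorts first, $1 is not older.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: warn when the Docker Engine is older than the minimum.
# The 20.10 threshold is an assumption taken from this thread.
check_docker_version() {
  min="20.10"
  have="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
  if [ -z "$have" ]; then
    echo "Could not determine the Docker Engine version" >&2
    return 1
  fi
  if ! version_ge "$have" "$min"; then
    echo "Docker Engine $have found; $min or newer is recommended" >&2
    return 1
  fi
}
```

A startup script could call `check_docker_version || exit 1` before bringing up any containers.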

@nurturenature
Contributor Author

Installed LXC and it's a wonderful environment. Very productive.
Finding Jepsen development to be more REPL'y when you're the control node.

I'll keep periodically trying Docker.

Thanks!

@nurturenature
Contributor Author

Tried again with latest Debian + Docker:

Debian 11.3
Docker 20.10.15

and it failed in a similar fashion as with earlier versions.

@aphyr
Collaborator

aphyr commented May 12, 2022

If you want to fix this, I'd be delighted.

@nurturenature
Contributor Author

Current host/container/systemd/Docker configs/behavior have evolved to the point where systemd containers, e.g.

  • FROM jgoerzen/debian-base-standard (and many others)

can only be configured with docker run and not with docker compose.

Configuring systemd containers is documented by debian-base-standard Container Invocation, Docker release notes, and misc forum posts:

  • --cgroupns=host
  • -v /sys/fs/cgroup:/sys/fs/cgroup:rw

(In addition, Debian doesn't need --privileged and has better shutdown with --stop-signal=SIGRTMIN+3.)

Current docker compose does not support cgroupns=host so cgroupv2 mounts fail. Issue, issue, issue, and systemd telling docker to fix itself.

The current workaround is to use docker run instead of docker compose.
This would mean converting Jepsen's compose orchestration to a script of individual docker commands to create/run/manage containers, networks, volumes, etc.
I tried a simple script to bring up several db nodes, a control node, etc., with minimal success. It was not very reproducible, had timing issues, etc.
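
For reference, the docker run flags listed above could be assembled per node roughly like this. It is a minimal sketch under stated assumptions, not the script that was tried: the function name `node_run_cmd` and the image and network names are placeholders, and the function only prints the command so the flags can be reviewed before anything runs.

```shell
#!/bin/sh
# Print (rather than execute) a `docker run` command for one systemd node.
# Arguments: container name, image, network. All three are placeholders
# supplied by the caller, not Jepsen's actual names.
node_run_cmd() {
  name="$1"
  image="$2"
  network="$3"
  # Flags taken from the list above: host cgroup namespace, read-write
  # cgroup mount, and the stop signal systemd treats as a shutdown request.
  printf 'docker run -d --name %s --hostname %s --network %s --cgroupns=host -v /sys/fs/cgroup:/sys/fs/cgroup:rw --stop-signal=SIGRTMIN+3 %s\n' \
    "$name" "$name" "$network" "$image"
}
```

Piping the output to `sh` would run the command; the network would have to exist first (for example via `docker network create`), and the timing problems described above would still need handling.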
