[Bug]: Ryuk does not start if run through wormhole ARM container on a Linux AMD64 host using QEMU on GHA #8138

Open
ppalaga opened this issue Jan 23, 2024 · 9 comments

@ppalaga

ppalaga commented Jan 23, 2024

Module

Core

Testcontainers version

1.19.3

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host Arch

x86_64

Docker version

Client: Docker Engine - Community
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.20.10
 Git commit:        afdd53b
 Built:             Thu Oct 26 09:07:41 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.10
  Git commit:       311b9ff
  Built:            Thu Oct 26 09:07:41 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.27
  GitCommit:        a1496014c916f9e62104b33d1bb5bd03b0858e59
 runc:
  Version:          1.1.11
  GitCommit:        v1.1.11-0-g4bccb38
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

What happened?

My aim is to run my integration tests on Arm64.
I do that by running Maven in an Arm64 container:

docker run --rm \
          -v ~/.m2/repository:/root/.m2/repository \
          -v /var/run/docker.sock:/var/run/docker.sock \
          -v $PWD:$PWD \
          -w $PWD \
          arm64v8/amazoncorretto:17-al2023 \
          ./mvnw clean test -ntp

Here is a minimal reproducer project: https://github.com/ppalaga/testcontainers-arm64

While this works as expected on my Fedora 38 with the following docker version

$ docker version 
Client: Docker Engine - Community
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:33:40 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:32:05 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.24
  GitCommit:        61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
 runc:
  Version:          1.1.9
  GitCommit:        v1.1.9-0-gccaecfc
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

it does not work on the latest Ubuntu runner of GitHub Actions with Docker version 24.0.7.

Relevant log output

2024-01-23T15:20:42.3722091Z [INFO]  T E S T S
2024-01-23T15:20:42.3730664Z [INFO] -------------------------------------------------------
2024-01-23T15:20:51.1628203Z [INFO] Running org.acme.rest.client.HelloTest
2024-01-23T15:21:54.0308946Z 2024-01-23 15:21:53,923 INFO  [org.tes.ima.PullPolicy] (pool-3-thread-1) Image pull policy will be performed by: DefaultPullPolicy()
2024-01-23T15:21:54.0325038Z 2024-01-23 15:21:53,968 INFO  [org.tes.uti.ImageNameSubstitutor] (pool-3-thread-1) Image name substitution will be performed by: DefaultImageNameSubstitutor (composite of 'ConfigurationFileImageNameSubstitutor' and 'PrefixingImageNameSubstitutor')
2024-01-23T15:21:59.3689018Z 2024-01-23 15:21:59,301 INFO  [org.tes.doc.DockerClientProviderStrategy] (pool-3-thread-1) Found Docker environment with local Unix socket (unix:///var/run/docker.sock)
2024-01-23T15:21:59.6721574Z 2024-01-23 15:21:59,584 INFO  [org.tes.DockerClientFactory] (pool-3-thread-1) Docker host IP address is 172.17.0.1
2024-01-23T15:21:59.7682860Z 2024-01-23 15:21:59,709 INFO  [org.tes.DockerClientFactory] (pool-3-thread-1) Connected to docker: 
2024-01-23T15:21:59.7684166Z   Server Version: 24.0.7
2024-01-23T15:21:59.7684912Z   API Version: 1.43
2024-01-23T15:21:59.7685524Z   Operating System: Ubuntu 22.04.3 LTS
2024-01-23T15:21:59.7766183Z   Total Memory: 15981 MB
2024-01-23T15:21:59.9694346Z 2024-01-23 15:21:59,932 INFO  [tc.tes.5.1] (pool-3-thread-1) Creating container for image: testcontainers/ryuk:0.5.1
2024-01-23T15:22:00.0707890Z 2024-01-23 15:21:59,990 INFO  [org.tes.uti.RegistryAuthLocator] (pool-3-thread-1) Failure when attempting to lookup auth config. Please ignore if you don't have images in an authenticated registry. Details: (dockerImageName: testcontainers/ryuk:0.5.1, configFile: /root/.docker/config.json, configEnv: DOCKER_AUTH_CONFIG). Falling back to docker-java default behaviour. Exception message: Status 404: No config supplied. Checked in order: /root/.docker/config.json (file not found), DOCKER_AUTH_CONFIG (not set)
2024-01-23T15:22:00.9791574Z 2024-01-23 15:22:00,872 INFO  [tc.tes.5.1] (pool-3-thread-1) Container testcontainers/ryuk:0.5.1 is starting: 2393273dd6941bc91184564f95553535e8847480a5dd55de674a043ad908c538
2024-01-23T15:23:01.2065696Z 2024-01-23 15:23:01,179 INFO  [tc.tes.5.1] (pool-3-thread-1) Container testcontainers/ryuk:0.5.1 started in PT1M1.246212324S
2024-01-23T15:23:01.3078631Z 2024-01-23 15:23:01,219 WARN  [org.tes.uti.RyukResourceReaper] (testcontainers-ryuk) Can not connect to Ryuk at 172.17.0.1:32772: java.net.ConnectException: Connection refused
2024-01-23T15:23:01.3080587Z 	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
2024-01-23T15:23:01.3082068Z 	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
2024-01-23T15:23:01.3083349Z 	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:547)
2024-01-23T15:23:01.3084725Z 	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:602)
2024-01-23T15:23:01.3086062Z 	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
2024-01-23T15:23:01.3087419Z 	at java.base/java.net.Socket.connect(Socket.java:633)
2024-01-23T15:23:01.3088701Z 	at org.testcontainers.utility.RyukResourceReaper.lambda$null$1(RyukResourceReaper.java:105)
2024-01-23T15:23:01.3090426Z 	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
2024-01-23T15:23:01.3091980Z 	at org.testcontainers.utility.RyukResourceReaper.lambda$maybeStart$2(RyukResourceReaper.java:101)
2024-01-23T15:23:01.3093675Z 	at java.base/java.lang.Thread.run(Thread.java:840)
2024-01-23T15:23:01.3094400Z 
...
2024-01-23T15:23:31.2896544Z 
2024-01-23T15:23:31.2897646Z 2024-01-23 15:23:31,277 ERROR [org.tes.uti.RyukResourceReaper] (pool-3-thread-1) Timed out waiting for Ryuk container to start. Ryuk's logs:
2024-01-23T15:23:31.2898738Z 
2024-01-23T15:23:31.4877774Z 2024-01-23 15:23:31,449 WARN  [org.tes.uti.RyukResourceReaper] (testcontainers-ryuk) Can not connect to Ryuk at 172.17.0.1:32772: java.net.ConnectException: Connection refused
2024-01-23T15:23:31.4879562Z 	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
2024-01-23T15:23:31.4880777Z 	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
2024-01-23T15:23:31.4881904Z 	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:547)
2024-01-23T15:23:31.4883102Z 	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:602)
2024-01-23T15:23:31.4884277Z 	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
2024-01-23T15:23:31.4885269Z 	at java.base/java.net.Socket.connect(Socket.java:633)
2024-01-23T15:23:31.4886435Z 	at org.testcontainers.utility.RyukResourceReaper.lambda$null$1(RyukResourceReaper.java:105)
2024-01-23T15:23:31.4888100Z 	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
2024-01-23T15:23:31.4889580Z 	at org.testcontainers.utility.RyukResourceReaper.lambda$maybeStart$2(RyukResourceReaper.java:101)
2024-01-23T15:23:31.4890770Z 	at java.base/java.lang.Thread.run(Thread.java:840)
2024-01-23T15:23:31.4891263Z 
2024-01-23T15:23:32.1129349Z [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 160.5 s <<< FAILURE! -- in org.acme.rest.client.HelloTest
2024-01-23T15:23:32.1140021Z [ERROR] org.acme.rest.client.HelloTest.hello -- Time elapsed: 0.070 s <<< ERROR!
2024-01-23T15:23:32.1141858Z java.lang.RuntimeException: java.lang.RuntimeException: Unable to start Quarkus test resource class org.acme.rest.client.HelloTestResource
2024-01-23T15:23:32.1143873Z 	at io.quarkus.test.junit.QuarkusTestExtension.throwBootFailureException(QuarkusTestExtension.java:638)
2024-01-23T15:23:32.1145799Z 	at io.quarkus.test.junit.QuarkusTestExtension.interceptTestClassConstructor(QuarkusTestExtension.java:722)
2024-01-23T15:23:32.1147422Z 	at java.base/java.util.Optional.orElseGet(Optional.java:364)
2024-01-23T15:23:32.1148562Z 	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
2024-01-23T15:23:32.1149752Z 	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
2024-01-23T15:23:32.1151306Z Caused by: java.lang.RuntimeException: Unable to start Quarkus test resource class org.acme.rest.client.HelloTestResource
2024-01-23T15:23:32.1153268Z 	at io.quarkus.test.common.TestResourceManager$TestResourceEntryRunnable.run(TestResourceManager.java:506)
2024-01-23T15:23:32.1155178Z 	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
2024-01-23T15:23:32.1156886Z 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2024-01-23T15:23:32.1158617Z 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2024-01-23T15:23:32.1159973Z 	at java.base/java.lang.Thread.run(Thread.java:840)
2024-01-23T15:23:32.1161240Z Caused by: java.lang.IllegalStateException: Could not connect to Ryuk at 172.17.0.1:32772
2024-01-23T15:23:32.1162654Z 	at org.testcontainers.utility.RyukResourceReaper.maybeStart(RyukResourceReaper.java:145)
2024-01-23T15:23:32.1164046Z 	at org.testcontainers.utility.RyukResourceReaper.init(RyukResourceReaper.java:42)
2024-01-23T15:23:32.1165368Z 	at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:232)
2024-01-23T15:23:32.1167003Z 	at org.testcontainers.DockerClientFactory$1.getDockerClient(DockerClientFactory.java:106)
2024-01-23T15:23:32.1168505Z 	at com.github.dockerjava.api.DockerClientDelegate.authConfig(DockerClientDelegate.java:109)
2024-01-23T15:23:32.1170737Z 	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:332)
2024-01-23T15:23:32.1171897Z 	at org.acme.rest.client.HelloTestResource.start(HelloTestResource.java:21)
2024-01-23T15:23:32.1173218Z 	at io.quarkus.test.common.TestResourceManager$TestResourceEntryRunnable.run(TestResourceManager.java:500)
2024-01-23T15:23:32.1174296Z 	... 4 more
2024-01-23T15:23:32.1174511Z 
2024-01-23T15:23:32.1213769Z 2024-01-23 15:23:31,706 WARN  [org.tes.uti.RyukResourceReaper] (testcontainers-ryuk) Can not connect to Ryuk at 172.17.0.1:32772: java.net.ConnectException: Connection refused
2024-01-23T15:23:32.1215374Z 	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
2024-01-23T15:23:32.1216598Z 	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
2024-01-23T15:23:32.1217585Z 	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:547)
2024-01-23T15:23:32.1218651Z 	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:602)
2024-01-23T15:23:32.1219592Z 	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
2024-01-23T15:23:32.1220435Z 	at java.base/java.net.Socket.connect(Socket.java:633)
2024-01-23T15:23:32.1221488Z 	at org.testcontainers.utility.RyukResourceReaper.lambda$null$1(RyukResourceReaper.java:105)
2024-01-23T15:23:32.1222684Z 	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
2024-01-23T15:23:32.1224062Z 	at org.testcontainers.utility.RyukResourceReaper.lambda$maybeStart$2(RyukResourceReaper.java:101)
2024-01-23T15:23:32.1225212Z 	at java.base/java.lang.Thread.run(Thread.java:840)
2024-01-23T15:23:32.1225697Z 
2024-01-23T15:23:32.1227090Z 2024-01-23 15:23:31,962 WARN  [org.tes.uti.RyukResourceReaper] (testcontainers-ryuk) Can not connect to Ryuk at 172.17.0.1:32772: java.net.ConnectException: Connection refused
2024-01-23T15:23:32.1228696Z 	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
2024-01-23T15:23:32.1229505Z 	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
2024-01-23T15:23:32.1230531Z 	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:547)
2024-01-23T15:23:32.1231700Z 	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:602)
2024-01-23T15:23:32.1232770Z 	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
2024-01-23T15:23:32.1233439Z 	at java.base/java.net.Socket.connect(Socket.java:633)
2024-01-23T15:23:32.1234115Z 	at org.testcontainers.utility.RyukResourceReaper.lambda$null$1(RyukResourceReaper.java:105)
2024-01-23T15:23:32.1234928Z 	at org.rnorth.ducttape.ratelimits.RateLimiter.doWhenReady(RateLimiter.java:27)
2024-01-23T15:23:32.1235759Z 	at org.testcontainers.utility.RyukResourceReaper.lambda$maybeStart$2(RyukResourceReaper.java:101)
2024-01-23T15:23:32.1236442Z 	at java.base/java.lang.Thread.run(Thread.java:840)
2024-01-23T15:23:32.1236730Z 
2024-01-23T15:23:32.2001449Z [INFO] 
2024-01-23T15:23:32.2005951Z [INFO] Results:
2024-01-23T15:23:32.2006803Z [INFO] 
2024-01-23T15:23:32.2012109Z [ERROR] Errors: 
2024-01-23T15:23:32.2019859Z [ERROR]   HelloTest.hello » Runtime java.lang.RuntimeException: Unable to start Quarkus test resource class org.acme.rest.client.HelloTestResource

Additional Information

Observation: when I use the x86_64 image instead (by removing the arm64v8/ prefix), it works as expected on GitHub Actions too:

docker run --rm \
          -v ~/.m2/repository:/root/.m2/repository \
          -v /var/run/docker.sock:/var/run/docker.sock \
          -v $PWD:$PWD \
          -w $PWD \
          amazoncorretto:17-al2023 \
          ./mvnw clean test -ntp

I experimented with adding various iptables rules, but they did not change anything.
It is probably not a firewall issue, given that running the tests from within the x86_64 container works.
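
To rule the firewall in or out more directly, the mapped Ryuk port could be probed from inside the Maven container while the test is waiting. This is only a minimal sketch: `port_open` is a hypothetical helper name, it assumes bash is available in the image (the `/dev/tcp` redirection is a bash feature, not POSIX sh), and the address in the usage comment is the one printed in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device, so it must run under bash.
port_open() {
  local host="$1" port="$2"
  # The subshell opens the socket on fd 3; the fd is closed when the subshell exits.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Usage with the address reported by Testcontainers in the log:
#   port_open 172.17.0.1 32772 && echo "listening" || echo "refused or unreachable"
```

A refused connection returns immediately. A silently dropped one would block until the TCP timeout, which by itself would hint at packet filtering rather than a missing listener.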

I'd be thankful for any hints.

@ppalaga ppalaga changed the title [Bug]: Docker wormhole does not work when Maven is run in an Arm container [Bug]: Docker wormhole does not work when Maven is run in an Arm container on a Linux Amd64 host on GitHubActions Jan 25, 2024
@kiview
Member

kiview commented Feb 1, 2024

Hey @ppalaga, thanks for reporting this issue and the reproducer.
At first glance, I also don't think it is networking-related; rather, the Ryuk container might not be starting correctly when the ARM image is used.
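
A sketch of how the Ryuk container could be inspected from the host while the test is still waiting. It uses only standard Docker CLI subcommands. The `ryuk_report` function name and the `DOCKER` override variable are hypothetical conveniences, and the image tag is the one from the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print image architecture, container state, exit code
# and logs for every container created from the given Ryuk image.
# DOCKER may be overridden (for example DOCKER=podman); it defaults to "docker".
ryuk_report() {
  local image="${1:-testcontainers/ryuk:0.5.1}"
  local docker="${DOCKER:-docker}"
  local id

  # OS/architecture of the locally pulled image (e.g. linux/amd64 or linux/arm64).
  "$docker" image inspect --format '{{.Os}}/{{.Architecture}}' "$image"

  # All containers, running or exited, that were created from that image.
  for id in $("$docker" ps -a -q --filter "ancestor=${image}"); do
    "$docker" inspect --format '{{.Id}} status={{.State.Status}} exit={{.State.ExitCode}}' "$id"
    "$docker" logs "$id"
  done
}
```

If the container has already been removed by the time this runs, only the image line is printed, so it would have to be called during the roughly 30 seconds in which Testcontainers retries the connection.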

I wonder if we can debug it more interactively by using a GH Codespace, since it tends to be an environment quite similar to the GHA runners.

@ppalaga
Author

ppalaga commented Feb 1, 2024

Thanks for the feedback, @kiview! It sounds like I cannot help much with the Ryuk investigation. Please let me know if there is anything else I can try.

@ppalaga
Author

ppalaga commented Feb 1, 2024

Actually, I could try skipping Ryuk 💡

@ppalaga
Author

ppalaga commented Feb 2, 2024

Indeed this change

ppalaga/testcontainers-arm64@b8f2f75

makes it pass - see https://github.com/ppalaga/testcontainers-arm64/actions/runs/7756244049/job/21153170695#step:9:1
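
For readers who do not want to open the commit, here is a sketch of what skipping Ryuk can look like with the reproducer command. It assumes Ryuk is disabled through the documented `TESTCONTAINERS_RYUK_DISABLED` environment variable; the linked commit may do it differently. The function name and the `DOCKER` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the reproducer command from the report with Ryuk disabled.
# Assumption: Ryuk is skipped via TESTCONTAINERS_RYUK_DISABLED=true;
# the linked commit may use another mechanism.
run_tests_without_ryuk() {
  local docker="${DOCKER:-docker}"   # hypothetical override, defaults to docker
  "$docker" run --rm \
    -e TESTCONTAINERS_RYUK_DISABLED=true \
    -v "$HOME/.m2/repository:/root/.m2/repository" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v "$PWD:$PWD" \
    -w "$PWD" \
    arm64v8/amazoncorretto:17-al2023 \
    ./mvnw clean test -ntp
}
```

With Ryuk disabled, nothing cleans up leftover containers if the JVM dies unexpectedly. On a throwaway CI runner that is usually acceptable, but on a developer machine stale containers may need to be removed by hand.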

Thanks for the hint @kiview!

Maybe we should rename this issue to something like "Ryuk does not start properly..."

ppalaga added a commit to ppalaga/testcontainers-arm64 that referenced this issue Feb 2, 2024
@kiview kiview changed the title [Bug]: Docker wormhole does not work when Maven is run in an Arm container on a Linux Amd64 host on GitHubActions [Bug]: Ryuk does not start if run through wormhole ARM container on a Linux AMD64 host using QEMU on GHA Feb 6, 2024
@kiview
Member

kiview commented Feb 6, 2024

Since we need QEMU to be set up, it is not that trivial to just run it in a Codespace (we would need to install QEMU in there and can't just use the GHA setup action), so I'll first try to get some more insights by running with DEBUG logging.
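
For anyone who wants to reproduce the QEMU setup outside of GHA, the following is a sketch of one common way to register the arm64 emulator on a Linux Docker host by using the `tonistiigi/binfmt` image. It is not necessarily identical to what the runner in this report had configured. The function name and the `DOCKER` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: register QEMU user-mode emulation for arm64 and smoke-test it.
# Assumption: a Linux Docker host on which privileged containers are allowed.
setup_arm64_emulation() {
  local docker="${DOCKER:-docker}"   # hypothetical override, defaults to docker
  # Register binfmt_misc handlers for arm64 binaries (requires --privileged).
  "$docker" run --privileged --rm tonistiigi/binfmt --install arm64
  # Smoke test: an emulated arm64 container should report "aarch64".
  "$docker" run --rm --platform linux/arm64 alpine uname -m
}
```

If the second command prints `aarch64`, emulated ARM containers start on that host, and the Ryuk question can then be examined separately from the QEMU installation itself.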

This could also easily be an issue with Ryuk running under QEMU's ARM emulation. Thinking about it, a good experiment would be to run this setup on ARM without QEMU, locally on an ARM MacBook with Docker Desktop (which I will do now).

Edit:
I just tested it using Docker Desktop on ARM MacBook and the example runs fine using Ryuk:

docker run --rm \
          -v ~/.m2/repository:/root/.m2/repository \
          -v /var/run/docker.sock.raw:/var/run/docker.sock \
          -v $PWD:$PWD \
          -w $PWD \
          arm64v8/amazoncorretto:17-al2023 \
          ./mvnw clean test -ntp

Note that I have to use /var/run/docker.sock.raw, but that is specific to Docker Desktop on Mac (because of the Moby VM); it is not necessary on a Linux host that has Docker installed directly.

Given this finding, I am pretty sure this is in some way QEMU related. There are ARM based runners going to be released, I think that should be the way to go.

I am inclined to close this issue since I don't really see us investing time to debug issues of running Ryuk through QEMU.

Maybe one last question that I did not get from the initial post, @ppalaga:
Was the Fedora machine you used for testing an ARM machine, or were you also using QEMU there? If the latter is the case, we can further pin the issue to the combination of QEMU (for ARM emulation) + Ubuntu 22.04 + Ryuk.

@ppalaga
Author

ppalaga commented Feb 6, 2024

> I just tested it using Docker Desktop on ARM MacBook and the example runs fine using Ryuk:

Thanks for the investigation! Newly, there are M1 runners for GH Actions. Do you happen to know whether this setup with docker desktop (or similar) will work there too? That would surely be much faster than the QEMU emulation on Linux.

> There are ARM based runners going to be released, I think that should be the way to go.

Yep, hopefully they will be available for open-source projects soon.

> Was the Fedora machine you used for testing an ARM machine, or were you also using QEMU there? If the latter is the case, we can further pin the issue to the combination of QEMU (for ARM emulation) + Ubuntu 22.04 + Ryuk.

It was Fedora on x86_64 with QEMU.
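
Since both hosts use QEMU, one thing that could be compared between the Fedora machine and the Ubuntu runner is how the arm64 handler is registered with the kernel. A minimal sketch, assuming the handler is registered under the usual name `qemu-aarch64` (this depends on how QEMU was installed); `binfmt_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a binfmt_misc registration entry.
# Assumption: the arm64 handler lives at /proc/sys/fs/binfmt_misc/qemu-aarch64.
binfmt_summary() {
  local entry="${1:-/proc/sys/fs/binfmt_misc/qemu-aarch64}"
  if [ ! -r "$entry" ]; then
    echo "no readable registration at $entry" >&2
    return 1
  fi
  # Keep only the lines that typically differ between hosts:
  # enabled/disabled state, interpreter path and registration flags.
  grep -E '^(enabled|disabled|interpreter|flags)' "$entry"
}
```

Running `binfmt_summary` on both hosts and diffing the output would show whether the interpreter binary or the registration flags differ, which might help narrow down the QEMU + Ubuntu 22.04 combination.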

@ppalaga
Author

ppalaga commented Feb 6, 2024

> > I just tested it using Docker Desktop on ARM MacBook and the example runs fine using Ryuk:
>
> Thanks for the investigation! Newly, there are M1 runners for GH Actions. Do you happen to know whether this setup with docker desktop (or similar) will work there too? That would surely be much faster than the QEMU emulation on Linux.

The answer to my own question seems to be no: https://github.com/marketplace/actions/setup-docker-on-macos#arm64-processors-m1-m2-m3-series-used-on-macos-14-images-are-unsupported

@kiview
Member

kiview commented Feb 7, 2024

Yes, it is always tricky to get Docker access on provided CI runners; in general, it is only provided on Linux runners. Is it important for your use case that the containers started by TC are ARM containers, or are you mainly interested in testing an ARM build of your app?

@ppalaga
Author

ppalaga commented Feb 7, 2024

Yes, I need to test my Java framework code on a JVM built for Linux and ARM. That's how our end users and customers will run it.
