Use health status when looking up reaper (ryuk) container #2508

Open
wants to merge 2 commits into
base: main

emetsger

What does this PR do?

Updates lookUpReaperContainer to consider the health of the ryuk container.

  • If the ryuk container advertises a health status, lookUpReaperContainer will require a healthy container before returning.
  • If the ryuk container does not advertise a health status, lookUpReaperContainer will return the container.
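
A minimal sketch of the behavior described above, not the code in this PR: the lookup is retried until the reaper reports a usable health status. The names reaperInfo, usable and lookUpHealthyReaper, the injected lookup function, and the fixed retry interval are all hypothetical; the status strings are the ones the Docker API reports.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// reaperInfo is a hypothetical, trimmed-down view of the reaper container.
type reaperInfo struct {
	ID     string
	Health string // "" when the container advertises no health status
}

var errNotHealthy = errors.New("reaper container is not healthy yet")

// usable reports whether a container with this health status may be returned:
// either it advertises no health status ("" or "none"), or it is "healthy".
func usable(health string) bool {
	return health == "" || health == "none" || health == "healthy"
}

// lookUpHealthyReaper calls lookup up to attempts times, sleeping interval
// between tries, and returns the first container whose status is usable.
func lookUpHealthyReaper(lookup func() (reaperInfo, error), attempts int, interval time.Duration) (reaperInfo, error) {
	var lastErr error = errNotHealthy
	for i := 0; i < attempts; i++ {
		info, err := lookup()
		switch {
		case err != nil:
			lastErr = err
		case usable(info.Health):
			return info, nil
		default:
			lastErr = fmt.Errorf("%w: status=%q", errNotHealthy, info.Health)
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return reaperInfo{}, lastErr
}

func main() {
	// Fake lookup: the reaper is "starting" twice, then "healthy".
	statuses := []string{"starting", "starting", "healthy"}
	calls := 0
	lookup := func() (reaperInfo, error) {
		info := reaperInfo{ID: "494993ccefba", Health: statuses[calls]}
		calls++
		return info, nil
	}

	info, err := lookUpHealthyReaper(lookup, 5, 10*time.Millisecond)
	fmt.Println(info.ID, info.Health, calls, err)
}
```

Injecting the lookup as a function keeps the sketch independent of the Docker client; in a real implementation it would wrap a container list or inspect call.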

Why is it important?

The ryuk container can be exposed by the Docker API before it is ready to accept connections, resulting in messages like:

2024/04/21 22:25:10 🔥 Reaper obtained from Docker for this test session 99998b629eea57023f49aefc9eda5406c84ce4c91d0c6005c1b8c00dcbbce0d7
2024/04/21 22:25:10 failed to start container: dial tcp 127.0.0.1:32917: connect: connection refused: Connecting to Ryuk on localhost:32917 failed: connecting to reaper failed: failed to create container

Tests will fail when they would otherwise pass.

This is especially the case in Go, where test binaries are run in parallel, one for each package. If multiple packages use testcontainers, they can obtain a reaper instance from the Docker API before the container is ready to serve connections, leading to dialing errors like connect: connection refused.

The current workaround is to run test binaries serially, e.g., go test -p 1 ./..., which is not optimal, even for modest codebases.
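
To illustrate the race described above, here is a rough sketch of probing a TCP address until it accepts connections. This is not what this PR does (the PR relies on the container's health status); waitForTCP and demo are hypothetical names, and the timeouts are arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// waitForTCP dials addr repeatedly until a connection succeeds or ctx ends.
// A port that is published but has no listener yet yields "connection refused".
func waitForTCP(ctx context.Context, addr string, interval time.Duration) error {
	var d net.Dialer
	for {
		conn, err := d.DialContext(ctx, "tcp", addr)
		if err == nil {
			return conn.Close()
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("%s not reachable: %w (last error: %v)", addr, ctx.Err(), err)
		case <-time.After(interval):
		}
	}
}

// demo probes a local port while a listener is open, then again after the
// listener has been closed, and returns both results.
func demo() (openErr, closedErr error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err, nil
	}
	addr := ln.Addr().String()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	openErr = waitForTCP(ctx, addr, 20*time.Millisecond)
	cancel()

	ln.Close() // from here on, dialing addr is refused
	ctx, cancel = context.WithTimeout(context.Background(), 200*time.Millisecond)
	closedErr = waitForTCP(ctx, addr, 20*time.Millisecond)
	cancel()

	return openErr, closedErr
}

func main() {
	openErr, closedErr := demo()
	fmt.Println("listener open:", openErr)
	fmt.Println("listener closed:", closedErr)
}
```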

Related issues

See complementary PR, which adds a HEALTHCHECK to the ryuk container: testcontainers/moby-ryuk#128

How to test this PR

To recreate the initial problem and run the workaround

Clone this test case repo and run go test -count=1 -v ./... and observe the errors.

Run go test -count=1 -p 1 -v ./... and tests should pass.

To test the fix

Clone moby-ryuk, apply pull/128, then build a new ryuk container, and tag it:

  1. docker build -f linux/Dockerfile -t local/ryuk .
  2. docker tag local/ryuk testcontainers/ryuk:0.7.0

Check out this branch/PR locally.

Clone the test case repo and update go.mod to point to this branch via a replace, e.g.:

  • replace github.com/testcontainers/testcontainers-go => /Users/esm/workspaces/testcontainers-go

Running go test -count=1 -v ./... should pass.

@emetsger emetsger requested a review from a team as a code owner April 22, 2024 02:44

netlify bot commented Apr 22, 2024

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit 1b85b7c
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/6639781d0e377b000867d647
😎 Deploy Preview https://deploy-preview-2508--testcontainers-go.netlify.app

@emetsger
Author

Hey team, just bumping this PR. Would love to hear feedback and if there's anything I can do to help move it forward 🙏

Thanks for all your efforts on Testcontainers! Great project.

@mdelapenya
Collaborator

> Clone this test case repo and run go test -count=1 -v ./... and observe the errors.

Hi @emetsger I cannot reproduce this yet:

go test -count=1  -v ./...  
=== RUN   TestA
2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker: 
  Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1)
  API Version: 1.43
  Operating System: Ubuntu 22.04.4 LTS
  Total Memory: 15779 MB
  Resolved Docker Host: tcp://127.0.0.1:54509
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084
  Test ProcessID: b81dbac4-46a1-4e91-a577-a7e47d57518e
2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f
2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine
2024/04/26 17:27:38 ✅ Container created: 0b7ae9cbdeb1
2024/04/26 17:27:38 🐳 Starting container: 0b7ae9cbdeb1
2024/04/26 17:27:39 ✅ Container started: 0b7ae9cbdeb1
2024/04/26 17:27:39 🚧 Waiting for container id 0b7ae9cbdeb1 image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x140003c6e88 Strategies:[0x140003e2c30]}
2024/04/26 17:27:41 🔔 Container is ready: 0b7ae9cbdeb1
2024/04/26 17:27:41 🐳 Terminating container: 0b7ae9cbdeb1
2024/04/26 17:27:41 🚫 Container terminated: 0b7ae9cbdeb1
--- PASS: TestA (3.65s)
PASS
ok  	tc-debug/a	4.012s
=== RUN   TestB
2024/04/26 17:27:37 github.com/testcontainers/testcontainers-go - Connected to docker: 
  Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1)
  API Version: 1.43
  Operating System: Ubuntu 22.04.4 LTS
  Total Memory: 15779 MB
  Resolved Docker Host: tcp://127.0.0.1:54509
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084
  Test ProcessID: 2a7fd6eb-d055-4e31-aaa6-00cf7cd47542
2024/04/26 17:27:37 🐳 Creating container for image testcontainers/ryuk:0.7.0
2024/04/26 17:27:38 ✅ Container created: 494993ccefba
2024/04/26 17:27:38 🐳 Starting container: 494993ccefba
2024/04/26 17:27:38 ✅ Container started: 494993ccefba
2024/04/26 17:27:38 🚧 Waiting for container id 494993ccefba image: testcontainers/ryuk:0.7.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/04/26 17:27:38 🔔 Container is ready: 494993ccefba
2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine
2024/04/26 17:27:38 ✅ Container created: 8386f359812f
2024/04/26 17:27:38 🐳 Starting container: 8386f359812f
2024/04/26 17:27:39 ✅ Container started: 8386f359812f
2024/04/26 17:27:39 🚧 Waiting for container id 8386f359812f image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x1400037ef28 Strategies:[0x1400039ebd0]}
2024/04/26 17:27:41 🔔 Container is ready: 8386f359812f
2024/04/26 17:27:41 🐳 Terminating container: 8386f359812f
2024/04/26 17:27:41 🚫 Container terminated: 8386f359812f
--- PASS: TestB (3.92s)
PASS
ok  	tc-debug/b	4.104s
=== RUN   TestC
2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker: 
  Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1)
  API Version: 1.43
  Operating System: Ubuntu 22.04.4 LTS
  Total Memory: 15779 MB
  Resolved Docker Host: tcp://127.0.0.1:54509
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084
  Test ProcessID: 0b1569fa-f57c-46af-afe3-b26dc818fbc8
2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f
2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine
2024/04/26 17:27:38 ✅ Container created: b2b8530b534d
2024/04/26 17:27:38 🐳 Starting container: b2b8530b534d
2024/04/26 17:27:38 ✅ Container started: b2b8530b534d
2024/04/26 17:27:38 🚧 Waiting for container id b2b8530b534d image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x1400037ef28 Strategies:[0x1400039ebd0]}
2024/04/26 17:27:40 🔔 Container is ready: b2b8530b534d
2024/04/26 17:27:40 🐳 Terminating container: b2b8530b534d
2024/04/26 17:27:41 🚫 Container terminated: b2b8530b534d
--- PASS: TestC (3.28s)
PASS
ok  	tc-debug/c	3.756s
=== RUN   TestD
2024/04/26 17:27:38 github.com/testcontainers/testcontainers-go - Connected to docker: 
  Server Version: 78+testcontainerscloud (via Testcontainers Desktop 1.10.1)
  API Version: 1.43
  Operating System: Ubuntu 22.04.4 LTS
  Total Memory: 15779 MB
  Resolved Docker Host: tcp://127.0.0.1:54509
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: a077cc8ef4acf0841cd4b227aeac15cf8611190b47530d735cb90d3993d5d084
  Test ProcessID: bb20c031-95bb-4791-af3b-e4b1ecf5d818
2024/04/26 17:27:38 🔥 Reaper obtained from Docker for this test session 494993ccefba558a0d64b9ddce7d8db30444e35a49bf12d9cbd8ef205c23e05f
2024/04/26 17:27:38 🐳 Creating container for image docker.io/postgres:16-alpine
2024/04/26 17:27:38 ✅ Container created: 2f51f8a61406
2024/04/26 17:27:38 🐳 Starting container: 2f51f8a61406
2024/04/26 17:27:39 ✅ Container started: 2f51f8a61406
2024/04/26 17:27:39 🚧 Waiting for container id 2f51f8a61406 image: docker.io/postgres:16-alpine. Waiting for: &{timeout:<nil> deadline:0x140003f8e58 Strategies:[0x14000458bd0]}
2024/04/26 17:27:41 🔔 Container is ready: 2f51f8a61406
2024/04/26 17:27:41 🐳 Terminating container: 2f51f8a61406
2024/04/26 17:27:41 🚫 Container terminated: 2f51f8a61406
--- PASS: TestD (3.32s)
PASS
ok  	tc-debug/d	3.933s

I've even run it with -count=10, with no errors. Is there anything you think I can do to reproduce it?

@emetsger
Author

emetsger commented Apr 27, 2024 via email

@mdelapenya
Collaborator

> What container engine/version are you on?

I'm using both Docker for Mac and Testcontainers Cloud.

@simonbos

simonbos commented May 2, 2024

I am experiencing a similar issue in (Jenkins) CI, although it's flaky: only some runs hit a connection refused error, as follows.

[2024-05-02T09:39:21.954Z] 2024/05/02 09:39:12 github.com/testcontainers/testcontainers-go - Connected to docker: 
[2024-05-02T09:39:21.954Z]   Server Version: 20.10.23
[2024-05-02T09:39:21.954Z]   API Version: 1.41
[2024-05-02T09:39:21.954Z]   Operating System: Rocky Linux 9.2 (Blue Onyx)
[2024-05-02T09:39:21.954Z]   Total Memory: 15982 MB
[2024-05-02T09:39:21.954Z]   Resolved Docker Host: unix:///var/run/docker.sock
[2024-05-02T09:39:21.954Z]   Resolved Docker Socket Path: /var/run/docker.sock
[2024-05-02T09:39:21.954Z]   Test SessionID: 9644673c1b6ea45d4962460bb5e7a2a9ba32c1f293bf4166d0e43de70c021a86
[2024-05-02T09:39:21.954Z]   Test ProcessID: 05a16b84-4e96-4b61-ac3d-29f63de4d258
[2024-05-02T09:39:21.954Z] 2024/05/02 09:39:12 🐳 Creating container for image testcontainers/ryuk:0.7.0
[2024-05-02T09:39:21.954Z] 2024/05/02 09:39:13 🔥 Reaper obtained from Docker for this test session 1179229b008b48672081fd3372999389793505d2b013d054c454a6838a6b5544
[2024-05-02T09:39:21.954Z]     <redacted>: dial tcp <redacted>:35412: connect: connection refused: Connecting to Ryuk on <redacted>:35412 failed: connecting to reaper failed: failed to create container

Note that I am unable to reproduce this on my machine (with different container engine & version).

@emetsger
Author

emetsger commented May 2, 2024 via email

@simonbos

simonbos commented May 3, 2024

> Do you know what docker engine/version is running in your CI?

It's running "Docker Engine - Community" version 20.10.23. There is some extra info in the logs above.

@emetsger
Author

emetsger commented May 5, 2024

@simonbos what docker engine/version are you running on your local machine?

@emetsger
Author

emetsger commented May 5, 2024

| Test Case | Test Result | Testcontainers Version | Container Manager | Container Engine Version | Container Engine OS | API Version | containerd version | runc version | docker-init version |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 🚫 Fail | main (539284ce) | Rancher 1.13.1 | dockerd/moby 24.0.7 (per shell, dockerd -v) | Alpine Linux v3.19.1 (per shell, /etc/alpine-release) | 1.43 (per shell, docker version) | v1.7.10 (per shell, containerd -v) | 1.1.10 (per shell, docker version) | 0.19.0 (per shell, docker version) |
| 2 | ✅ Succeeds | main (539284ce) | Docker for Mac v4.29.0 | Docker Engine/moby 26.0.0 | Docker Desktop (per test container output) | 1.45 (per release notes), 1.44 (per testcontainers output) | v1.7.13 (per release notes), 1.6.28 (per shell) | 1.1.12 (per release notes) | |
| 3 | ✅ Succeeds | main (539284ce) | | 78+testcontainerscloud (via Testcontainers Desktop 1.11.0) (per testcontainers output) | | 1.43 (per testcontainers output) | | | |
| 4 | Sporadically fails (reported, but could not reproduce; see test case #5) | | Docker Engine Community 20.10.23 (release notes) | dockerd/moby 20.10.23 (per testcontainers output) | | 1.41 (per Engine/API version matrix and testcontainers output) | v1.6.15 (per release notes) | | |
| 5 | ✅ Succeeds | main (539284ce) | Docker for Mac v4.17.0 | dockerd/moby 20.10.23 (per shell, dockerd -v) | Docker Desktop (per test container output) | 1.41 (per Engine/API version matrix and testcontainers output) | v1.6.18 (per shell, containerd -v) | 1.1.4 (per shell, runc -v) | |

(link to gist)

@emetsger
Author

emetsger commented May 5, 2024

@mdelapenya I ran the test case against a variety of container managers/docker engines (see table above)

I was able to successfully run my test case against all the container managers except for Rancher.

  • I attempted to reproduce the failure reported by @simonbos by using the same docker engine version used in their CI, but after running the test case multiple times, I never got it to fail.

@mdelapenya can you install Rancher 1.13.1 (latest at the time of this writing) and attempt to reproduce on your side?

@mdelapenya
Collaborator

Wow! @emetsger thank you so much for dedicating that amount of time to reproduce the potential bug in so many container environments. Very proud of you 🙇

I'll install rancher today and try to reproduce it. Will ping you back here

docker.go Outdated
@@ -49,6 +49,11 @@ const (
packagePath = "github.com/testcontainers/testcontainers-go"

logStoppedForOutOfSyncMessage = "Stopping log consumer: Headers out of sync"

healthStatusNone = "" // default status for a container with no healthcheck
Contributor

Hello @emetsger! Up to you, but could you use the status constants from docker/types here?

Who knows, maybe Docker will change the status names in the future :)

Author

@emetsger emetsger May 7, 2024

👍

docker/types defines types.NoHealthcheck = "none". I did not see a constant for a health check value equal to the zero-value string.

So, to be safe, I check for a zero-length string before checking the health status value:

// if a health status is present on the container, and the container is not healthy, error
if r.healthStatus != "" {
	if r.healthStatus != types.Healthy && r.healthStatus != types.NoHealthcheck {
		return fmt.Errorf("container %s is not healthy, wanted status=%s, got status=%s", resp[0].ID[:8], types.Healthy, r.healthStatus)
	}
}
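
For illustration, a self-contained version of the check above. It uses local copies of the docker/types values (types.Healthy = "healthy", types.NoHealthcheck = "none") so it runs without the Docker module; checkReaperHealth and shortID are hypothetical names, not functions in testcontainers-go.

```go
package main

import "fmt"

// Local copies of the health status values defined in docker/types.
const (
	noHealthcheck = "none"    // types.NoHealthcheck
	healthy       = "healthy" // types.Healthy
)

// checkReaperHealth returns nil when the container may be used: it has no
// health status at all (""), has no healthcheck ("none"), or is healthy.
// Any other status, such as "starting" or "unhealthy", is an error.
func checkReaperHealth(id, status string) error {
	if status == "" || status == noHealthcheck || status == healthy {
		return nil
	}
	return fmt.Errorf("container %s is not healthy, wanted status=%s, got status=%s",
		shortID(id), healthy, status)
}

// shortID truncates a container ID to 8 characters without panicking on
// IDs that are already shorter than that.
func shortID(id string) string {
	if len(id) > 8 {
		return id[:8]
	}
	return id
}

func main() {
	for _, s := range []string{"", "none", "starting", "healthy", "unhealthy"} {
		fmt.Printf("%q -> %v\n", s, checkReaperHealth("1179229b008b4867", s))
	}
}
```

Treating the empty string and "none" the same way covers both a container with no health status reported and one explicitly marked as having no healthcheck.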

Contributor

yeah, agree:) thanks!

@emetsger
Author

emetsger commented May 7, 2024

> Wow! @emetsger thank you so much for dedicating that amount of time to reproduce the potential bug in so many container environments. Very proud of you

It's the least I can do: trying to pay it forward!

@simonbos

simonbos commented May 9, 2024

Thanks for the effort @emetsger !

> what docker engine/version are you running on your local machine?

Docker Desktop for Mac v4.29.0.


4 participants