Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging backpressure #584

Open
zilm13 opened this issue Jun 30, 2022 · 4 comments
Open

Logging backpressure #584

zilm13 opened this issue Jun 30, 2022 · 4 comments

Comments

@zilm13
Copy link
Contributor

zilm13 commented Jun 30, 2022

Hello
I'm trying to run Hive Testsuite with high log level, DEBUG or TRACE and see huge performance degradation compared to INFO level. Not only the participation level drops from 0.97 to 0.90-0.93 but also client fails to reply to the requests from Testsuite to RPC API in 5 seconds timeout. As lower logging level works well with the same setup and logs for unresponsive container are stopped few seconds before the failure, I guess that the reason behind that is logging back pressure. Docker log output to stdout and stderr is blocking by default and when it unables to flush it in time, container stalls and resumes only after output is fetched (see docs). It's a highly repeatable issue for me with high log levels.
I've tried to set docker logging to "non-blocking" mode but it doesn't help. Maybe the reason is elsewhere or I'm doing something wrong by patching this:

	c, err := b.client.CreateContainer(docker.CreateContainerOptions{
		Context: ctx,
		Config: &docker.Config{
			Image: imageName,
			Env:   vars,
		},
		HostConfig: &docker.HostConfig{
			LogConfig: docker.LogConfig{
				Config: map[string]string{
					"mode":            "non-blocking",
					"max-buffer-size": "4m",
				},
			},
		},
	})

Debugging tests with higher log levels looks valuable, but with blocking logging test fails early.
Could you, please, help to make it working?

@fjl
Copy link
Collaborator

fjl commented Jul 1, 2022

Hive captures client logs by attaching to the container via Docker API. I think docker does not / cannot apply this buffering to the attached connection. I will look into it.

@fjl
Copy link
Collaborator

fjl commented Jul 1, 2022

Yes, it looks like the issue is that hive does not use the equivalent of 'docker logs' to get container logs. Instead, we attach to the container before it starts to capture all stdout/stderr.

I think we could change to grabbing the logs asynchronously for client containers using a different API.

@fjl
Copy link
Collaborator

fjl commented Mar 1, 2023

Prototype implementation is here: #735

@fjl
Copy link
Collaborator

fjl commented Mar 2, 2023

I cannot reproduce the original issue, my machine is too fast. I use this command to test:

./hive --sim eth2/testnet --client teku-bn,teku-vc,besu --sim.loglevel 5

The testnet participation is 0.97 and the PR makes no difference on the result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants