Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky optimism p2p::can_sync test #8166

Open
Rjected opened this issue May 8, 2024 · 6 comments
Open

Flaky optimism p2p::can_sync test #8166

Rjected opened this issue May 8, 2024 · 6 comments
Labels
A-op-reth Related to Optimism and op-reth C-bug An unexpected or incorrect behavior C-test A change that impacts how or what we test D-good-first-issue Nice and easy! A great choice to get started

Comments

@Rjected
Copy link
Member

Rjected commented May 8, 2024

A recent merge queue run failed in a way that indicates we have something flaky in p2p::can_sync:
https://github.com/paradigmxyz/reth/actions/runs/9007839847/job/24748566483

The important logs:

2024-05-08T20:09:21.442257Z ERROR reth_node_builder::launch::common: Failed to build global thread pool: ThreadPoolBuildError { kind: GlobalPoolAlreadyInitialized }
2024-05-08T20:09:26.441518Z ERROR reth_node_builder::launch::common: Failed to build global thread pool: ThreadPoolBuildError { kind: GlobalPoolAlreadyInitialized }
2024-05-08T20:09:44.902706Z ERROR blockchain_tree: Reverting canonical chain failed with error: OptimisticTargetRevert(88)

We might need something lazy for the thread pool, or it might have to do with launching multiple nodes? Have not debugged this in depth

The error is thrown here:

/// Configure global settings this includes:
///
/// - Raising the file descriptor limit
/// - Configuring the global rayon thread pool
pub fn configure_globals(&self) {
// Raise the fd limit of the process.
// Does not do anything on windows.
let _ = fdlimit::raise_fd_limit();
// Limit the global rayon thread pool, reserving 2 cores for the rest of the system
let _ = ThreadPoolBuilder::new()
.num_threads(
available_parallelism().map_or(25, |cpus| max(cpus.get().saturating_sub(2), 2)),
)
.build_global()
.map_err(|e| error!("Failed to build global thread pool: {:?}", e));
}

@Rjected Rjected added C-bug An unexpected or incorrect behavior C-test A change that impacts how or what we test A-op-reth Related to Optimism and op-reth labels May 8, 2024
@mattsse
Copy link
Collaborator

mattsse commented May 8, 2024

we can simply remove the error log here

@mattsse mattsse added the D-good-first-issue Nice and easy! A great choice to get started label May 8, 2024
@Rjected
Copy link
Member Author

Rjected commented May 8, 2024

we can simply remove the error log here

The test is timing out though, so we need to do something else to fix the test probably

@onbjerg
Copy link
Member

onbjerg commented May 8, 2024

This is because you can't have more than one threadpool per process, I ran into this with one of the benchmarks we hadn't kept up to date, too. Not entirely sure what the best way to circumvent this is

Edit: Actually this is a bit different, but might be a similar error. I ran into this with Tokio (the benchmark tool spawned a tokio runtime, after which the function we called did the same)

@Rjected
Copy link
Member Author

Rjected commented May 8, 2024

Actually the flake might be more related to

2024-05-08T20:09:44.902706Z ERROR blockchain_tree: Reverting canonical chain failed with error: OptimisticTargetRevert(88)

But I still need to repro

@onbjerg
Copy link
Member

onbjerg commented May 8, 2024

Yeah reading the threadpoolbuilder docs, it returns an error but does not mention anything about it being fatal or stalling, so should be ok: https://docs.rs/rayon/latest/rayon/struct.ThreadPoolBuilder.html#method.build_global

@joshieDo
Copy link
Collaborator

joshieDo commented May 9, 2024

That specific error log is expected and part of the e2e test @Rjected
so I'm thinking it's really about the long duration of the test (20s if i recall) unsure now on why its timing out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-op-reth Related to Optimism and op-reth C-bug An unexpected or incorrect behavior C-test A change that impacts how or what we test D-good-first-issue Nice and easy! A great choice to get started
Projects
Status: Todo
Development

No branches or pull requests

4 participants