ci: Flaky tests are slowing down the process of merging #7310

Open
cdecker opened this issue May 14, 2024 · 3 comments

Comments

@cdecker
Member

cdecker commented May 14, 2024

It's time to name them and shame them, as the work of the maintainer or release captain is suffering from having to restart test runs over and over again:

| Test Name | Runs | Failures | Flakiness |
|---|---|---|---|
| `test_onchain_their_unilateral_out[True]` | 143 | 57 | 28.50% |
| `test_wss_proxy` | 164 | 47 | 22.27% |
| `test_rbf_reconnect_tx_construct` | 25 | 6 | 19.35% |
| `test_penalty_htlc_tx_timeout[True]` | 120 | 25 | 17.24% |
| `test_penalty_htlc_tx_fulfill[True]` | 123 | 25 | 16.89% |
| `test_penalty_outhtlc[True]` | 141 | 25 | 15.06% |
| `test_penalty_rbf_normal[True]` | 142 | 25 | 14.97% |
| `test_penalty_inhtlc[True]` | 147 | 25 | 14.53% |
| `test_onchain_middleman_their_unilateral_in[True]` | 150 | 25 | 14.29% |
| `test_onchain_timeout[True]` | 150 | 25 | 14.29% |
| `test_onchain_middleman_simple[True]` | 152 | 25 | 14.12% |
| `test_anchorspend_using_to_remote[True]` | 142 | 22 | 13.41% |

If your test is listed here, please go and stabilize it; the maintainers will be thankful. Otherwise, we might have to disable these tests temporarily until they are more stable.
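
For readers checking the numbers: the percentages in the table appear consistent with failures / (runs + failures), which would mean the "Runs" column counts only the runs that did not fail. That reading is inferred from the figures and is not stated in the issue. A minimal sketch, where `flakiness` is a hypothetical helper name:

```python
def flakiness(runs: int, failures: int) -> float:
    """Return the failure share in percent.

    Assumption (inferred from the table, not stated in the issue):
    `runs` excludes the failing runs, so the total is runs + failures.
    """
    return 100.0 * failures / (runs + failures)


# First two rows of the table, as (name, runs, failures).
rows = [
    ("test_onchain_their_unilateral_out[True]", 143, 57),
    ("test_wss_proxy", 164, 47),
]

for name, runs, failures in rows:
    print(f"{name}: {flakiness(runs, failures):.2f}%")
```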

@daywalker90
Contributor

Also there are still tests failing because of port binding:
Failed to bind socket for 127.0.0.1:44971: Address already in use
Unfortunately, every fix I tried made it worse.

@cdecker
Member Author

cdecker commented May 15, 2024

So for the port-binding issue, we usually use the ephemeral_port_reserve package (or similar), which pre-binds a random port. We can then take control of that port by setting SO_REUSEADDR, which lets us grab it even though it is still in the TIME_WAIT state. This way the OS hands out unique ports for us.

We usually run into port conflicts when we skip the package's reserve method and just pick a random port, without giving the OS a chance to tell us that it is already in use. Another case is when too much time elapses between reservation and binding: the port drops out of TIME_WAIT and may be reassigned. So reserve shortly before use, and reserve a new port if you can't ensure that the port stays unbound only briefly.
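
The reserve-then-bind pattern described above can be sketched with the standard library alone. This is an illustration of the technique, not the project's test fixture code: `reserve_port` and `bind_reserved` are hypothetical helper names, and the sketch assumes Linux-style SO_REUSEADDR semantics.

```python
import socket
from contextlib import closing


def reserve_port(ip: str = "127.0.0.1") -> int:
    """Ask the OS for a free port and leave it in TIME_WAIT (hypothetical helper)."""
    with closing(socket.socket()) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((ip, 0))  # port 0: let the OS pick an unused port
        server.listen(1)
        addr = server.getsockname()
        # Open and close one connection so the port ends up in TIME_WAIT,
        # which keeps the OS from handing it out again for a while.
        with closing(socket.socket()) as client:
            client.connect(addr)
            conn, _ = server.accept()
            with closing(conn):
                return addr[1]


def bind_reserved(port: int, ip: str = "127.0.0.1") -> socket.socket:
    """Bind a listening socket to a previously reserved port (hypothetical helper)."""
    sock = socket.socket()
    # SO_REUSEADDR allows binding although the port is still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((ip, port))
    sock.listen(1)
    return sock


# Reserve immediately before use, so the TIME_WAIT window has not expired.
port = reserve_port()
listener = bind_reserved(port)
print(f"listening on 127.0.0.1:{listener.getsockname()[1]}")
listener.close()
```

The process that finally binds the port has to set SO_REUSEADDR itself. If it does not, the bind can fail with the same "Address already in use" error while the port is still in TIME_WAIT.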

@rustyrussell
Contributor

I think I fixed a test_penalty_htlc_tx_timeout[True] flake in #7364...
