protocols/autonat: fix flaky test #2480

elenaf9 · 2022-02-05T16:14:29Z

Follow-up on #2450 (comment):
The test test_server::test_dial_error in the AutoNAT protocol occasionally fails on the GitHub CI, which I cannot reproduce locally. I am now running the test in a loop here to check whether and how it can be fixed.


elenaf9 · 2022-02-13T14:43:58Z

Ok, so it seems that test_server::test_dial_error failed because the delays between the probes were too short. As a result, a second probe started before the first one had even resolved, which caused an unexpected event. With an increased delay the tests passed in the last 3x100 iterations. I still don't know why the one auto_probe test failed (this also happened locally a few times), but I would say that this should be solved in a different issue / PR, as it may also be an issue of the network and not of the autonat protocol.
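A minimal sketch of the overlap described above, assuming a simplified model in which probe `i` starts at `i * interval` and resolves a fixed `resolve_after` later. The function name and the model are illustrative assumptions, not code from the autonat tests:

```rust
use std::time::Duration;

/// Hypothetical model, not the real test code: probe `i` starts at `i * interval`
/// and resolves `resolve_after` later. Returns the indices of all probes that
/// start while the previous probe is still unresolved.
fn overlapping_probes(interval: Duration, resolve_after: Duration, count: u32) -> Vec<u32> {
    (1..count)
        .filter(|&i| {
            let started_at = interval * i;
            let previous_resolved_at = interval * (i - 1) + resolve_after;
            // An overlap means the new probe begins before the old one is done.
            started_at < previous_resolved_at
        })
        .collect()
}
```

In this model a probe overlaps its predecessor exactly when the interval is shorter than the time a probe needs to resolve, which is why a larger delay removes the unexpected event at the cost of a slower test.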

I am still not 100% happy with the current tests because of how the delays can make them flaky. If I increase the delays even more, the tests take even longer to pass, but delays that are too short can cause errors like the one that happened in test_server::test_dial_error.

On my first drafts for AutoNAT I used to have a method for manually triggering probes. Such a method could be useful for example, as written in the (upcoming) IPFS hole-punching blogpost, if the user would like to directly trigger multiple probes on init (instead of waiting for the scheduler to trigger them after retry_interval amount of time):

B reaches out to a subset of public nodes of its peer-to-peer network, asking each node to dial it (B) on a set of addresses that it suspects it could be reachable under.

If we add such a method again (I am happy to do a PR for that), I could also change the tests to use this method, which would make them more deterministic. Wdyt?
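A rough sketch of what such a method could look like next to the interval-based path. `ProbeScheduler`, `trigger_probe`, and the explicit `now` parameter are hypothetical names chosen for illustration; this is not the actual libp2p-autonat API:

```rust
use std::time::{Duration, Instant};

/// Hypothetical probe scheduler, not the real libp2p-autonat behaviour.
struct ProbeScheduler {
    retry_interval: Duration,
    next_probe_at: Instant,
    probes_started: u32,
}

impl ProbeScheduler {
    fn new(retry_interval: Duration, now: Instant) -> Self {
        Self { retry_interval, next_probe_at: now + retry_interval, probes_started: 0 }
    }

    /// Interval path: starts a probe only once the deadline has been reached.
    /// Returns whether a probe was started.
    fn poll(&mut self, now: Instant) -> bool {
        if now < self.next_probe_at {
            return false;
        }
        self.start_probe(now);
        true
    }

    /// Manual path (assumed name): starts a probe immediately.
    fn trigger_probe(&mut self, now: Instant) {
        self.start_probe(now);
    }

    /// Shared by both paths; also re-arms the interval timer.
    fn start_probe(&mut self, now: Instant) {
        self.probes_started += 1;
        self.next_probe_at = now + self.retry_interval;
    }
}
```

A test could then call `trigger_probe` at a known point instead of sleeping until `retry_interval` has elapsed, while the interval path stays covered by separate `poll` assertions.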


Handle the case that the server reports a DialResponse::Ok before the client received the event about the inbound connection.
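One way to make a test tolerant of that ordering, sketched with stand-in types. `ClientEvent` and `saw_both` are hypothetical and only illustrate the idea; the actual fix in test_client may look different:

```rust
use std::collections::HashSet;

/// Hypothetical stand-ins for the two client-side events whose order is not guaranteed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ClientEvent {
    InboundConnection,
    DialResponseOk,
}

/// Consumes events until both expected ones were seen, in whichever order they arrive.
/// Returns `false` if the events run out before both were observed.
fn saw_both(events: impl IntoIterator<Item = ClientEvent>) -> bool {
    let mut pending = HashSet::from([
        ClientEvent::InboundConnection,
        ClientEvent::DialResponseOk,
    ]);
    for event in events {
        pending.remove(&event);
        if pending.is_empty() {
            return true;
        }
    }
    false
}
```

The test asserts on the set of events rather than on their sequence, so either arrival order passes.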

thomaseizinger · 2022-02-13T22:24:28Z

On my first drafts for AutoNAT I used to have a method for manually triggering probes.

I get the desire not to shape the production code too much for testability if the interface is not needed otherwise. It is a trade-off, but reliable tests are a pretty good argument IMO. What I've personally also done in libp2p protocols is to add specific OutEvents that merely have a "reporting" purpose and are otherwise functionally useless, yet they make it tremendously easier to write tests where you want to assert that a certain thing happened.
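A small sketch of the reporting-event idea, assuming a simplified behaviour that queues its events. The `Behaviour` type and the `ProbeStarted` and `StatusChanged` variants are made up for illustration and are not taken from any libp2p crate:

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq, Eq)]
enum OutEvent {
    /// Functional: the outcome an application would act on.
    StatusChanged { public: bool },
    /// Reporting only (hypothetical): lets a test assert that a probe was started.
    ProbeStarted { probe_id: u32 },
}

/// Hypothetical, heavily simplified behaviour that buffers its events.
#[derive(Default)]
struct Behaviour {
    next_probe_id: u32,
    events: VecDeque<OutEvent>,
}

impl Behaviour {
    fn start_probe(&mut self) -> u32 {
        let probe_id = self.next_probe_id;
        self.next_probe_id += 1;
        // No functional effect for the user, only visibility for tests and logs.
        self.events.push_back(OutEvent::ProbeStarted { probe_id });
        probe_id
    }

    fn on_probe_result(&mut self, public: bool) {
        self.events.push_back(OutEvent::StatusChanged { public });
    }

    /// Hands out buffered events in the order they were produced.
    fn poll(&mut self) -> Option<OutEvent> {
        self.events.pop_front()
    }
}
```

A test can wait for `ProbeStarted` instead of guessing with a sleep when the probe has begun.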

The other alternative is to mock the concept of "time" in our tests instead of depending on a global. Given efforts like #2320, introducing a way for protocols to register time-based callbacks that are managed by libp2p-swarm might be an idea worth exploring. That could open the possibility to use a different clock in the test and thus precisely control when a timer fires.
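To illustrate the idea, here is a minimal sketch of an injectable clock. The `Clock` trait, `ManualClock`, and `Timer` are hypothetical names and say nothing about how libp2p-swarm would actually expose this:

```rust
use std::cell::Cell;
use std::time::Duration;

/// Hypothetical clock abstraction; time is expressed as a `Duration` since an arbitrary start.
trait Clock {
    fn now(&self) -> Duration;
}

/// Test clock that only moves when the test advances it.
#[derive(Default)]
struct ManualClock {
    elapsed: Cell<Duration>,
}

impl ManualClock {
    fn advance(&self, by: Duration) {
        self.elapsed.set(self.elapsed.get() + by);
    }
}

impl Clock for ManualClock {
    fn now(&self) -> Duration {
        self.elapsed.get()
    }
}

/// Timer that asks the injected clock instead of a global one.
struct Timer {
    deadline: Duration,
}

impl Timer {
    fn after(clock: &impl Clock, delay: Duration) -> Self {
        Self { deadline: clock.now() + delay }
    }

    fn has_fired(&self, clock: &impl Clock) -> bool {
        clock.now() >= self.deadline
    }
}
```

With such a clock a test advances time explicitly, so a timer fires exactly when the test says so and no real waiting is involved.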

mxinden · 2022-02-14T20:41:57Z

The other alternative is to mock the concept of "time" in our tests instead of depending on a global. Given efforts like #2320, introducing a way for protocols to register time-based callbacks that are managed by libp2p-swarm might be an idea worth exploring. That could open the possibility to use a different clock in the test and thus precisely control when a timer fires.

Agreed that this is worth exploring. It would bring us one step closer to NetworkBehaviour being a pure state machine.

mxinden · 2022-02-14T20:53:12Z

Such a method could be useful for example, as written in the (upcoming) IPFS hole-punching blogpost, if the user would like to directly trigger multiple probes on init (instead of waiting for the scheduler to trigger them after retry_interval amount of time):

B reaches out to a subset of public nodes of its peer-to-peer network, asking each node to dial it (B) on a set of addresses that it suspects it could be reachable under.

Oh, the author (me) wrote this in a misleading way. Instead of explaining the interval mechanisms of AutoNAT I wrongly describe this as an ad-hoc operation.

I could also change the tests to use this method, which would make them more deterministic. Wdyt?

We could as well add the method and only compile it when testing (e.g. mark it with #[cfg(test)]). Though I guess the test would then no longer test the interval logic.
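A sketch of the `#[cfg(test)]` variant, using a hypothetical `Behaviour` type with made-up method names rather than the real libp2p-autonat one:

```rust
/// Hypothetical behaviour type, not the real libp2p-autonat `Behaviour`.
#[derive(Default)]
pub struct Behaviour {
    probes_started: u32,
}

impl Behaviour {
    /// Public, supported path: called when the retry interval has elapsed.
    pub fn on_interval_elapsed(&mut self) {
        self.start_probe();
    }

    /// Assumed name. Only compiled when this crate is built for its own unit tests,
    /// so it never becomes part of the public interface.
    #[cfg(test)]
    pub fn trigger_probe_now(&mut self) {
        self.start_probe();
    }

    pub fn probes_started(&self) -> u32 {
        self.probes_started
    }

    fn start_probe(&mut self) {
        self.probes_started += 1;
    }
}
```

One thing to keep in mind: `cfg(test)` is only set while the crate itself is compiled for its unit tests. Integration tests under a crate's `tests/` directory link against the regular build, so they would not see a method gated this way.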

If I increase them even more the tests take even longer to pass,

While obviously I would like CI to only take seconds, I favor reliable tests with large coverage over fast tests. In other words I am fine with this patch.

mxinden

This looks good to me. Thanks for all the debugging work! 🙏


elenaf9 · 2022-02-15T12:10:46Z

The other alternative is to mock the concept of "time" in our tests instead of depending on a global. Given efforts like #2320, introducing a way for protocols to register time-based callbacks that are managed by libp2p-swarm might be an idea worth exploring. That could open the possibility to use a different clock in the test and thus precisely control when a timer fires.

Definitely sounds like a good idea, but imo something that should be tackled outside of this PR. Would you mind opening a separate issue for this, so that the idea doesn't get lost after this PR is merged?

Such a method could be useful for example, as written in the (upcoming) IPFS hole-punching blogpost, if the user would like to directly trigger multiple probes on init (instead of waiting for the scheduler to trigger them after retry_interval amount of time):

B reaches out to a subset of public nodes of its peer-to-peer network, asking each node to dial it (B) on a set of addresses that it suspects it could be reachable under.

Oh, the author (me) wrote this in a misleading way. Instead of explaining the interval mechanisms of AutoNAT I wrongly describe this as an ad-hoc operation.

Is it really misleading? In my opinion it does make sense to run multiple probes on init in an ad-hoc fashion before deciding whether we consider ourselves to be public or not. A single dial-back can always fail for multiple reasons.
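As a rough illustration of deciding only after several dial-backs, assuming a simple threshold rule. `NatStatus`, `decide`, and `min_confirmations` are hypothetical here and do not describe how libp2p-autonat actually weighs its results:

```rust
#[derive(Debug, PartialEq, Eq)]
enum NatStatus {
    Public,
    Private,
    Unknown,
}

/// Hypothetical rule: commit to a status only once one outcome has been
/// observed at least `min_confirmations` times. Successes are checked first.
fn decide(dial_backs: &[bool], min_confirmations: usize) -> NatStatus {
    let successes = dial_backs.iter().filter(|ok| **ok).count();
    let failures = dial_backs.len() - successes;
    if successes >= min_confirmations {
        NatStatus::Public
    } else if failures >= min_confirmations {
        NatStatus::Private
    } else {
        NatStatus::Unknown
    }
}
```

With a threshold of 2, a single failed dial-back leaves the status undecided instead of immediately marking the node as private.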

I could also change the tests to use this method, which would make them more deterministic. Wdyt?

We could as well add the method and only compile it when testing (e.g. mark it with #[cfg(test)]). Though I guess the test would then no longer test the interval logic.

I don't really understand the reasoning for hiding it behind a #[cfg(test)] flag. If we add this method anyway, why not then also make it available to the user?

But I agree that we should also test the interval logic. Long story short: let's just merge this fix as it is. If a method for manual probes is needed in the future it can be added in a separate PR.

mxinden · 2022-02-15T13:21:46Z

I could also change the tests to use this method, which would make them more deterministic. Wdyt?

We could as well add the method and only compile it when testing (e.g. mark it with #[cfg(test)]). Though I guess the test would then no longer test the interval logic.

I don't really understand the reasoning for hiding it behind a #[cfg(test)] flag. If we add this method anyway, why not then also make it available to the user?

To keep the interface that we have to support across versions small. E.g. not exposing it allows us to change it at any point in time without breaking anyone. I don't think we have to be dogmatic about it. Just something to keep in mind. Does that reasoning make sense?

elenaf9 added 14 commits February 5, 2022 17:08

protocols/autonat/tests: print debug messages

05d9535

.github/ci: loop autonat tests

8e8bf2e

.github/ci: remove redundant line

3334764

protocols/autonat/tests/test_server: fix dummy addrs

35bcc18

protocols/autonat/tests/test_server: fix client config

6b3a382

.github/ci: loop test max 100x

440c293

protocols/auotonat/tests/test_server: inc delay

f3893a5

protocols/auotonat/tests/test_server: fix delay

55ee957

*: loop test_client::auto_probe, print debug msges

9f98329

protocols/autonat/tests/test_client: revert changes

b6fbf27

Merge branch 'master' of github.com:libp2p/rust-libp2p into autonat/f…

88d9fec

…ix-flaky-tests

*: revert debug logs

98ee946

.github/ci: loop all autonat tests

73526b7

.github/ci: revert changes

90f26b6

elenaf9 marked this pull request as ready for review February 13, 2022 14:44

Merge branch 'master' of github.com:libp2p/rust-libp2p into autonat/f…

1da4fd3

…ix-flaky-tests

elenaf9 mentioned this pull request Feb 13, 2022

core/connection/pool: spawning a task does not inform the waker #2483

Closed

protocols/autonat/tests/test_client: fix test_auto_probe

b3c11a6

Handle the case that the server reports a DialResponse::Ok before the client received the event about the inbound connection.

mxinden approved these changes Feb 14, 2022


Merge branch 'master' of github.com:libp2p/rust-libp2p into autonat/f…

3782ae0

…ix-flaky-tests

mxinden merged commit dceb72b into libp2p:master Feb 15, 2022

elenaf9 deleted the autonat/fix-flaky-tests branch February 15, 2022 13:24

umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this pull request Mar 8, 2024

protocols/autonat: Fix flaky test (libp2p#2480)

262f0ac
