Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protocols/autonat: fix flaky test #2480

Merged
merged 17 commits into from Feb 15, 2022

Conversation

elenaf9
Copy link
Contributor

@elenaf9 elenaf9 commented Feb 5, 2022

Follow-up on #2450 (comment):
The test test_server::test_dial_error in the AutoNAT protocol fail on the GitHub CI occasionally, which I can not reproduce locally. Running the test now in a loop here so I can test if / how it can be fixed.

@elenaf9
Copy link
Contributor Author

elenaf9 commented Feb 13, 2022

Ok, so seems like test_server::test_dial_error failed because the delays between the probes were too short. Due to that, a second probe started before the first one even resolved, which caused an unexpected event. With an increased delay the tests passed on the last 3x100 iterations. I still don't know why the one auto_probe test failed (this also happened locally a few of times), but I would say that this should be solved in a different issue / PR, as it may also be an issue of the network and not the autonat protocol.


I am still not 100% happy with the current tests because of how the delays can cause flaky tests. If I increase them even more the tests take even longer to pass, but too short delays can cause error like the one that happened in test_server::test_dial_error.

On my first drafts for AutoNAT I used to have a method for manually triggering probes. Such a method could be useful for example, as written in the (upcoming) IPFS hole-punching blogpost, if the user would like to directly trigger multiple probes on init (instead of waiting for the scheduler to trigger them after retry_interval amount of time):

B reaches out to a subset of public nodes of its peer-to-peer network, asking each node to it (B) on a set of addresses that it suspects it could be reachable under.

If we add such a method again (I am happy to do a PR for that), I could also change the tests to use this method, which would make them more deterministic. Wdyt?

@elenaf9 elenaf9 marked this pull request as ready for review February 13, 2022 14:44
Handle the case that the server reports a DialResponse::Ok before the
client received the event about the inbound connection.
@thomaseizinger
Copy link
Contributor

On my first drafts for AutoNAT I used to have a method for manually triggering probes.

I get the desire to not shape the production code too much for testability if the interface is not needed otherwise. It is a trade-off but reliable tests are a pretty good argument IMO. What I've personally also done in libp2p protocols is to add specific OutEvents that merely have a "reporting" purpose but are otherwise functionally useless yet they make it tremendously easier to write tests where you want to assert that a certain thing happened.

The other alternative is to mock the concept of "time" in our tests instead of depending on a global. Given efforts like #2320, introducing a way for protocols to register time-based callbacks that are managed by libp2p-swarm might be an idea worth exploring. That could open the possibility to use a different clock in the test and thus precisely control, when a timer fires.

@mxinden
Copy link
Member

mxinden commented Feb 14, 2022

The other alternative is to mock the concept of "time" in our tests instead of depending on a global. Given efforts like #2320, introducing a way for protocols to register time-based callbacks that are managed by libp2p-swarm might be an idea worth exploring. That could open the possibility to use a different clock in the test and thus precisely control, when a timer fires.

Agree that this is worth exploring. Would bring us one step further to NetworkBehaviour being a pure state machine.

@mxinden
Copy link
Member

mxinden commented Feb 14, 2022

Such a method could be useful for example, as written in the (upcoming) IPFS hole-punching blogpost, if the user would like to directly trigger multiple probes on init (instead of waiting for the scheduler to trigger them after retry_interval amount of time):

B reaches out to a subset of public nodes of its peer-to-peer network, asking each node to it (B) on a set of addresses that it suspects it could be reachable under.

Oh, the author (me) wrote this in a misleading way. Instead of explaining the interval mechanisms of AutoNAT I wrongly describe this as an ad-hoc operation.

I could also change the tests to use this method, which would make them more deterministic. Wdyt?

We could as well add the method and only compile it when testing (e.g. mark it with #[cfg(test)]). Though I guess the test would then no longer test the interval logic.

If I increase them even more the tests take even longer to pass,

While obviously I would like CI to only take seconds, I favor reliable tests with large coverage over fast tests. In other words I am fine with this patch.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thanks for all the debugging work! 🙏

@elenaf9
Copy link
Contributor Author

elenaf9 commented Feb 15, 2022

The other alternative is to mock the concept of "time" in our tests instead of depending on a global. Given efforts like #2320, introducing a way for protocols to register time-based callbacks that are managed by libp2p-swarm might be an idea worth exploring. That could open the possibility to use a different clock in the test and thus precisely control, when a timer fires.

Definitely sounds like a good idea, but imo something that should be tackled outside of this PR. Would you mind opening a separate issue for this, so that the idea doesn't get lost after this PR is merged?


Such a method could be useful for example, as written in the (upcoming) IPFS hole-punching blogpost, if the user would like to directly trigger multiple probes on init (instead of waiting for the scheduler to trigger them after retry_interval amount of time):

B reaches out to a subset of public nodes of its peer-to-peer network, asking each node to it (B) on a set of addresses that it suspects it could be reachable under.

Oh, the author (me) wrote this in a misleading way. Instead of explaining the interval mechanisms of AutoNAT I wrongly describe this as an ad-hoc operation.

Is it really misleading? In my opinion it does make sense to run on init multiple probes in an ad-hoc fashion before deciding whether we consider our self to be public or not. A single dial-back can always fail for multiple reasons.

I could also change the tests to use this method, which would make them more deterministic. Wdyt?

We could as well add the method and only compile it when testing (e.g. mark it with #[cfg(test)]). Though I guess the test would then no longer test the interval logic.

I don't really understand the reasoning for hiding it behind a #[cfg(test)] flag. If we add this method anyway, why not then also make it available to the user?

But I agree that we should also test the interval logic. Long story short: let's just merge this fix as it is. If a method for manual probes is needed in the future it can be added in a separate PR.

@mxinden mxinden merged commit dceb72b into libp2p:master Feb 15, 2022
@mxinden
Copy link
Member

mxinden commented Feb 15, 2022

I could also change the tests to use this method, which would make them more deterministic. Wdyt?

We could as well add the method and only compile it when testing (e.g. mark it with #[cfg(test)]). Though I guess the test would then no longer test the interval logic.

I don't really understand the reasoning for hiding it behind a #[cfg(test)] flag. If we add this method anyway, why not then also make it available to the user?

To keep the interface small that we have to support across versions. E.g. not exposing it allows us to change it at any point in time without breaking anyone. I don't think we have to be dogmatic about it. Just something to keep in mind. Does that reasoning make sense?

@elenaf9 elenaf9 deleted the autonat/fix-flaky-tests branch February 15, 2022 13:24
umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this pull request Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants