protocols/autonat: fix flaky test #2480
Ok, so it seems like I am still not 100% happy with the current tests because of how the delays can make them flaky. If I increase the delays even more, the tests take even longer to pass, but delays that are too short can cause errors like the one that happened in On my first drafts for AutoNAT I used to have a method for manually triggering probes. Such a method could be useful, for example, as written in the (upcoming) IPFS hole-punching blog post, if the user would like to directly trigger multiple probes on init (instead of waiting for the scheduler to trigger them after
If we add such a method again (I am happy to do a PR for that), I could also change the tests to use this method, which would make them more deterministic. Wdyt?
Handle the case that the server reports a DialResponse::Ok before the client received the event about the inbound connection.
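The ordering problem described in this commit message can be sketched as a small order-independent check. This is only an illustration under stated assumptions: `Observed` and `Expectation` are hypothetical names invented for this sketch and are not the actual libp2p event types or the code in this PR.

```rust
/// Hypothetical events a test might observe. The names are illustrative
/// only and do not correspond to the real libp2p types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Observed {
    /// The client saw the inbound connection from the server's dial-back.
    InboundConnection,
    /// The client received the server's successful dial response.
    DialResponseOk,
}

/// Tracks which of the two events have been seen so far, without assuming
/// an arrival order.
#[derive(Debug, Default)]
pub struct Expectation {
    seen_inbound: bool,
    seen_response: bool,
}

impl Expectation {
    /// Records one event and returns `true` once both events have arrived,
    /// regardless of which one came first.
    pub fn record(&mut self, event: Observed) -> bool {
        match event {
            Observed::InboundConnection => self.seen_inbound = true,
            Observed::DialResponseOk => self.seen_response = true,
        }
        self.seen_inbound && self.seen_response
    }
}
```

A test loop built this way would keep polling until `record` returns `true`, instead of asserting that the inbound connection event is always observed first.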
I get the desire to not shape the production code too much for testability if the interface is not needed otherwise. It is a trade-off, but reliable tests are a pretty good argument IMO. What I've personally also done in libp2p protocols is to add specific The other alternative is to mock the concept of "time" in our tests instead of depending on a global. Given efforts like #2320, introducing a way for protocols to register time-based callbacks that are managed by
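To make the "mock the concept of time" idea more concrete, here is a minimal sketch. It assumes a hypothetical `Clock` trait, a `ManualClock` for tests, and a `ProbeScheduler` that fires at a fixed interval. None of these names exist in libp2p; they only illustrate the technique of injecting time instead of reading a global.

```rust
use std::cell::Cell;
use std::time::Duration;

/// Hypothetical abstraction over "now", expressed as time elapsed since an
/// arbitrary starting point.
pub trait Clock {
    fn now(&self) -> Duration;
}

/// Test clock that only moves when the test advances it explicitly.
pub struct ManualClock {
    now: Cell<Duration>,
}

impl ManualClock {
    pub fn new() -> Self {
        Self { now: Cell::new(Duration::ZERO) }
    }

    /// Moves the clock forward by `by`.
    pub fn advance(&self, by: Duration) {
        self.now.set(self.now.get() + by);
    }
}

impl Clock for ManualClock {
    fn now(&self) -> Duration {
        self.now.get()
    }
}

/// Hypothetical scheduler that reports when the next probe is due.
pub struct ProbeScheduler {
    interval: Duration,
    next_due: Duration,
}

impl ProbeScheduler {
    pub fn new(clock: &impl Clock, interval: Duration) -> Self {
        Self { interval, next_due: clock.now() + interval }
    }

    /// Returns `true` if a probe is due, and schedules the following one.
    pub fn poll(&mut self, clock: &impl Clock) -> bool {
        let now = clock.now();
        if now >= self.next_due {
            self.next_due = now + self.interval;
            true
        } else {
            false
        }
    }
}
```

With such a clock, a test can call `advance` to jump straight to the moment a probe is due, so it neither sleeps nor depends on how fast the CI machine is.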
Agree that this is worth exploring. Would bring us one step further to
Oh, the author (me) wrote this in a misleading way. Instead of explaining the interval mechanisms of AutoNAT, I wrongly described this as an ad-hoc operation.
We could as well add the method and only compile it when testing (e.g. mark it with
While obviously I would like CI to only take seconds, I favor reliable tests with large coverage over fast tests. In other words, I am fine with this patch.
This looks good to me. Thanks for all the debugging work! 🙏
Definitely sounds like a good idea, but imo something that should be tackled outside of this PR. Would you mind opening a separate issue for this, so that the idea doesn't get lost after this PR is merged?
Is it really misleading? In my opinion it does make sense to run multiple probes on init in an ad-hoc fashion before deciding whether we consider ourselves to be public or not. A single dial-back can always fail for multiple reasons.
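The point that a single dial-back should not decide the outcome can be sketched as a simple threshold rule. This is a hypothetical illustration, not the decision logic used by the AutoNAT implementation: `ProbeOutcome`, `NatStatus` and `decide` are names made up for this sketch, and the threshold is an assumed parameter.

```rust
/// Hypothetical result of one dial-back probe.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeOutcome {
    Reachable,
    Unreachable,
}

/// Hypothetical conclusion drawn from a batch of probes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NatStatus {
    Unknown,
    Public,
    Private,
}

/// Decides the status only once at least `threshold` probes agree.
/// `threshold` is expected to be at least 1. If both counts reach the
/// threshold, reachable results take precedence in this sketch.
pub fn decide(outcomes: &[ProbeOutcome], threshold: usize) -> NatStatus {
    let reachable = outcomes
        .iter()
        .filter(|o| **o == ProbeOutcome::Reachable)
        .count();
    let unreachable = outcomes.len() - reachable;

    if reachable >= threshold {
        NatStatus::Public
    } else if unreachable >= threshold {
        NatStatus::Private
    } else {
        NatStatus::Unknown
    }
}
```

With a threshold of 2, one failed dial-back among several successful ones still yields `Public`, while a single probe of either kind leaves the status `Unknown`.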
I don't really understand the reasoning for hiding it behind a But I agree that we should also test the interval logic. Long story short: let's just merge this fix as it is. If a method for manual probes is needed in the future it can be added in a separate PR.
To keep the interface that we have to support across versions small. E.g. not exposing it allows us to change it at any point in time without breaking anyone. I don't think we have to be dogmatic about it. Just something to keep in mind. Does that reasoning make sense?
Follow-up on #2450 (comment):
The test test_server::test_dial_error in the AutoNAT protocol occasionally fails on the GitHub CI, which I cannot reproduce locally. I am now running the test in a loop here so I can check whether and how it can be fixed.