Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(client): retry api version discovery for at least 30 seconds #5290

Merged
merged 2 commits into from
May 19, 2024

Conversation

dpc
Copy link
Contributor

@dpc dpc commented May 14, 2024

During debugging with joschisan, we've spotted a case where in a test the Federation was not yet fully up (peer api servers were down), yet the test Client was already attempting to join with a pre-prepared config.

Normally in this case fetching the config would fail, but that was passed straight from the server fixture, so api discovery call was actually the first call. It queried all peers, all were down, the request logic would re-attempt connection but only once, which instantly failed as well, failing the whole initial api version discovery and disabling all client modules.

While in this case it seems like a problem of test failing to setup Federation before attempting to run client code, it makes me think that in real life there might be cases where the client is joining new Federation, gets a config, and then e.g. switches from wifi to GSM, etc. and has a brief period of being offline.

Again - this is only ever a problem on a first time client is joining. But to prevent such a hiccups, in this change, I'm making it so the client will attempt at least for a certain period to discover the version before giving it up, to lower the chances of such an unfortunate UX.

@dpc dpc requested review from a team as code owners May 14, 2024 21:58
elsirion
elsirion previously approved these changes May 16, 2024
@elsirion
Copy link
Contributor

Needs rebase :(

@elsirion
Copy link
Contributor

Will this fix #5286?

@dpc
Copy link
Contributor Author

dpc commented May 16, 2024

Will this fix #5286?

I believe something else already fixed it in @joschisan's PR. But if it had been there, it would probably also prevented it.

dpc added 2 commits May 16, 2024 12:58
During debugging with joschisan, we've spotted a case where in a test
the Federation was not yet fully up (peer api servers were down), yet
the test Client was already attempting to join with a pre-prepared
config.

Normally in this case fetching the config would fail, but that
was passed straight from the server fixture, so api discovery
call was actually the first call. It queried all peers, all were
down, the request logic would re-attempt connection but only once,
which instantly failed as well, failing the whole initial api
version discovery and disabling all client modules.

While in this case it seems like a combination of test failing to
setup Federation before attempting to run client code, it makes
me think that in real life there might be cases where the client
is joining new Federation, gets a config, and then e.g. switches
from wifi to GSM, etc. and has a brief period of being offline.

Again - this is only ever a problem on a first time client is
joining. But to prevent such a hiccups, in this change,
I'm making it so the client will attempt at least for a certain
period to discover the version before giving it up, to lower
the chances of such an unfortunate UX.
@dpc dpc force-pushed the 24-05-14-retry-version-discovery-longer branch from d80fc13 to 784b97c Compare May 16, 2024 19:59
@dpc dpc requested a review from elsirion May 16, 2024 20:00
@joschisan
Copy link
Contributor

@dpc I did not fix this in general, just prevented this by awaiting the api to come online.

@dpc
Copy link
Contributor Author

dpc commented May 16, 2024

@dpc I did not fix this in general, just prevented this by awaiting the api to come online.

Yeah The client can't join federation if it's not online yet. Waiting for it in the test fixture is the right fix, IMO.

@dpc dpc enabled auto-merge May 18, 2024 00:02
Comment on lines +1406 to +1422
let deadline = now() + Duration::from_secs(30);

loop {
let res =
Self::discover_common_api_version_static_try(config, client_module_init, api, mode)
.await;

if res.is_ok() {
return res;
}

if deadline < now() {
return res;
}
debug!(target: LOG_CLIENT, "Retrying getting api version from Federation");
sleep(Duration::from_secs(1)).await;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: #5323

@dpc dpc added this pull request to the merge queue May 19, 2024
Merged via the queue into fedimint:master with commit 212bc13 May 19, 2024
21 checks passed
@dpc dpc deleted the 24-05-14-retry-version-discovery-longer branch May 19, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants