chore(client): retry api version discovery for at least 30 seconds #5290

dpc · 2024-05-14T21:58:22Z

While debugging with joschisan, we spotted a case in a test where the Federation was not yet fully up (the peer API servers were down), yet the test Client was already attempting to join with a pre-prepared config.

Normally in this case fetching the config would fail, but here the config was passed straight from the server fixture, so the API version discovery call was actually the first call. It queried all peers, all of which were down. The request logic re-attempted the connection, but only once, and that attempt instantly failed as well, failing the whole initial API version discovery and disabling all client modules.

While in this case the problem seems to be the test failing to set up the Federation before running client code, it makes me think that in real life there might be cases where a client is joining a new Federation, gets a config, and then e.g. switches from wifi to GSM and is briefly offline.

Again, this is only ever a problem the first time a client is joining. But to prevent such hiccups, in this change I'm making the client keep trying to discover the version for at least a certain period before giving up, to lower the chances of such unfortunate UX.

elsirion · 2024-05-16T08:43:32Z

Needs rebase :(

elsirion · 2024-05-16T16:50:47Z

Will this fix #5286?

dpc · 2024-05-16T19:08:34Z

> Will this fix #5286?

I believe something else already fixed it in @joschisan's PR. But if this change had been there, it would probably have prevented it too.

joschisan · 2024-05-16T20:17:01Z

@dpc I did not fix this in general, just prevented this by awaiting the api to come online.

dpc · 2024-05-16T20:21:31Z

> @dpc I did not fix this in general, just prevented this by awaiting the api to come online.

Yeah. The client can't join a federation that's not online yet. Waiting for it in the test fixture is the right fix, IMO.
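For illustration only, here is a rough sketch of what "waiting for the API to come online" in a fixture could look like. It assumes readiness can be approximated by a peer's TCP listener accepting connections; the actual fixture change in joschisan's PR is not shown in this thread and may work differently. The function name `wait_for_listener` and the timeout values are hypothetical.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `addr` until a TCP connection succeeds
/// or `max_wait` has elapsed. Returns `true` if the listener came up.
pub fn wait_for_listener(addr: SocketAddr, max_wait: Duration) -> bool {
    let deadline = Instant::now() + max_wait;
    loop {
        // A successful connect means something is accepting on `addr`.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        // Give up once the deadline has passed.
        if deadline < Instant::now() {
            return false;
        }
        // Short pause between probes to avoid a busy loop.
        sleep(Duration::from_millis(50));
    }
}
```

A fixture would call something like this for each peer address before handing the config to the client, so that client code never runs against a Federation that is still starting.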

elsirion · 2024-05-19T09:32:34Z

fedimint-client/src/lib.rs

+        let deadline = now() + Duration::from_secs(30);
+
+        loop {
+            let res =
+                Self::discover_common_api_version_static_try(config, client_module_init, api, mode)
+                    .await;
+
+            if res.is_ok() {
+                return res;
+            }
+
+            if deadline < now() {
+                return res;
+            }
+            debug!(target: LOG_CLIENT, "Retrying getting api version from Federation");
+            sleep(Duration::from_secs(1)).await;
+        }
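
The loop in this hunk follows a general retry-until-deadline pattern. Below is a minimal, synchronous sketch of that pattern using only the standard library. It is not the code from the PR, which is async and calls `discover_common_api_version_static_try`; the name `retry_until_deadline` and its parameters are hypothetical, chosen for illustration.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `attempt` until it succeeds or `max_wait`
/// has elapsed. Always makes at least one attempt and returns the last
/// result (success or error) to the caller.
pub fn retry_until_deadline<T, E>(
    max_wait: Duration,
    pause: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + max_wait;
    loop {
        let res = attempt();
        // Stop on success, or once the deadline has passed.
        if res.is_ok() || deadline < Instant::now() {
            return res;
        }
        // Wait a little before the next attempt.
        sleep(pause);
    }
}
```

As in the hunk above, the deadline is only checked after a failed attempt, so at least one attempt is always made and the total time can slightly exceed the configured window.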


dpc requested review from a team as code owners May 14, 2024 21:58

elsirion previously approved these changes May 16, 2024

dpc added 2 commits May 16, 2024 12:58

chore: timing logs are not all that important anymore

784b97c

dpc dismissed elsirion’s stale review via 784b97c May 16, 2024 19:59

dpc force-pushed the 24-05-14-retry-version-discovery-longer branch from d80fc13 to 784b97c Compare May 16, 2024 19:59

dpc requested a review from elsirion May 16, 2024 20:00

dpc enabled auto-merge May 18, 2024 00:02

elsirion approved these changes May 19, 2024

dpc added this pull request to the merge queue May 19, 2024

Merged via the queue into fedimint:master with commit 212bc13 May 19, 2024
21 checks passed

dpc deleted the 24-05-14-retry-version-discovery-longer branch May 19, 2024 09:49
