Retrying a subscription does not renew the id and may cause an error on the server because the id is already used #5217
Comments
Looks like your backend is returning:
{
  "data": null,
  "errors": [
    {
      "message": "Name for character with ID 1002 could not be fetched.",
      "locations": [{ "line": 6, "column": 7 }],
      "path": ["hero", "heroFriends", 1, "name"]
    }
  ]
}
This should be fixed in your backend. I also agree this should be recoverable from the client side of things. Did you try catching errors in your Flow? Something like:
apolloClient.subscription(subscription)
    .toFlow()
    .catch {
        // recover here
    } |
Hey @martinbonnin thanks for quick reply. I haven't tried |
Yes, instead of a crash, you will have an
We typically release ~1 a month these days but there is no fixed release schedule. You can already try in the SNAPSHOTs though |
hey @martinbonnin My team was looking into the crash from the backend side and apparently it was the same issue as hasura/graphql-engine#3564. We are using Hasura. If we understand it correctly, the client was trying to connect with the same operationId. This 👇 is what I got from the backend team. Replaced some data with
{
  "detail": {
    "connection_info": {
      "msg": null,
      "token_expiry": "2023-08-31T20:09:54Z",
      "websocket_id": "<removed>"
    },
    "event": {
      "detail": {
        "operation_id": "cbeb464f-f79<removed>",
        "operation_name": "SomeOperation",
        "operation_type": {
          "detail": "an operation already exists with this id: <removed>",
          "type": "proto_err"
        },
        "parameterized_query_hash": null,
        "query": {
          "operationName": "SomeOperation",
          "query": "subscription <removed>"
        },
        "request_id": null
      },
      "type": "operation"
    },
    "user_vars": {
      "x-hasura-role": "user",
      "x-hasura-user-id": "<removed>"
    }
  },
  "level": "error",
  "timestamp": "2023-08-31T20:00:30.083+0000",
  "type": "websocket-log"
}
I am letting you know because maybe there is another issue with the Apollo SDK that you would like to investigate. |
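To illustrate the "an operation already exists with this id" error in the log above, here is a minimal sketch in plain Kotlin. It assumes a server that tracks active operation ids per connection and rejects duplicates. `OperationRegistry` and `newOperationId` are hypothetical names made up for this example; they are not part of Hasura or Apollo Kotlin.

```kotlin
import java.util.UUID

// Toy stand-in for a server that tracks active operations on one WebSocket.
// This is NOT Hasura's implementation, only an illustration of the rule.
class OperationRegistry {
    private val activeIds = mutableSetOf<String>()

    // Returns null on success, or an error message if the id is already in use.
    fun start(id: String): String? =
        if (activeIds.add(id)) null else "an operation already exists with this id: $id"

    // Frees the id once the operation has completed or been stopped.
    fun stop(id: String) {
        activeIds.remove(id)
    }
}

// Hypothetical client-side helper: every (re)subscription gets its own id.
fun newOperationId(): String = UUID.randomUUID().toString()

fun main() {
    val server = OperationRegistry()

    val id = newOperationId()
    println(server.start(id))               // null: the first start succeeds
    println(server.start(id) != null)       // true: retrying with the same id is rejected
    println(server.start(newOperationId())) // null: a fresh id is accepted
}
```

Under these assumptions, a retry only succeeds if the client either stops the previous operation first or sends a new id.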
Thanks for the follow up!
{
  "detail": { ... },
  "level": "error",
  "timestamp": "2023-08-31T20:00:30.083+0000",
  "type": "websocket-log"
}
Is that the content of a WebSocket message? Or the contents of the "errors" field? The
{
  id: '<unique-operation-id>';
  type: 'error';
  payload: GraphQLError[];
}
And a GraphQLError is defined here. All in all, we can make it so that Apollo Kotlin catches these errors and exposes them in |
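The error message shape quoted above can be modelled in Kotlin roughly as follows. This is only a sketch: `SourceLocation`, `GraphQLError`, `ErrorMessage` and `describe` are hypothetical names for this example, not Apollo Kotlin types, and the fields are limited to the ones that appear in this thread (`message`, `locations`, `path`).

```kotlin
// Hypothetical model of a WebSocket "error" message, for illustration only.
data class SourceLocation(val line: Int, val column: Int)

data class GraphQLError(
    val message: String,
    val locations: List<SourceLocation> = emptyList(),
    // A path mixes field names (String) and list indices (Int).
    val path: List<Any> = emptyList(),
)

data class ErrorMessage(val id: String, val payload: List<GraphQLError>) {
    val type: String = "error"
}

// Builds a one-line, human-readable summary of all errors in the message.
fun describe(message: ErrorMessage): String {
    val details = message.payload.joinToString(separator = "; ") { error ->
        if (error.path.isEmpty()) error.message
        else "${error.message} (at ${error.path.joinToString(".")})"
    }
    return "operation ${message.id} failed: $details"
}

fun main() {
    val message = ErrorMessage(
        id = "op-1", // made-up operation id
        payload = listOf(
            GraphQLError(
                message = "Name for character with ID 1002 could not be fetched.",
                locations = listOf(SourceLocation(line = 6, column = 7)),
                path = listOf("hero", "heroFriends", 1, "name"),
            )
        ),
    )
    println(describe(message))
}
```

A client that parses errors into a structure like this can surface them to the caller instead of crashing while parsing.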
@martinbonnin hey, thanks for the info. I just managed to reproduce this error. I was using a single collector in my view model that uses the flow from the subscription. Now I tried to set up a second collector on the same flow and that error appeared. Is it not allowed to have multiple collectors? |
Excellent question. From a quick look at the code, collecting the same The quick fix is to re-use your
val call = apolloClient.subscription(subscription)
launch {
    // first collector
    call.toFlow().collect { ... }
}
launch {
    // second collector
    call.toFlow().collect { ... }
}
This will make sure a new |
thanks @martinbonnin can't do that unfortunately because we use KMM and I have Apollo in a shared module exposing |
hey @martinbonnin it might be an SDK issue, not sure, but I am playing a bit with subscriptions and turning the network off and on to see how reconnection behaves. I noticed this error
This is thrown after I turned the internet back on. I have a single subscription running and that error is from EDIT: |
Hey @martinbonnin how are you doing? I was wondering how the fix for this issue is going. I see there is a draft PR already but it's pretty old. Can we expect some progress on this any time soon? |
@damianpetla apologies for the super late response here. This ended up being quite the rabbit hole... The TL;DR of that rabbit hole is that retrying at the WebSocket level, as was previously done, has major limitations, as you found out:
All in all,
ApolloClient.Builder()
    .serverUrl(sampleServer.graphqlUrl())
    .retryOnError { it.operation is Subscription }
    .subscriptionNetworkTransport(
        WebSocketNetworkTransport.Builder()
            .serverUrl(sampleServer.subscriptionsUrl())
            .build()
    )
    .build()
This code also uses an incubating
dependencies {
    implementation("com.apollographql.apollo3:apollo-websocket-network-transport-incubating")
}
For more details, you can also check the integration test here. The current plan is to replace the default implementation of |
Hey @martinbonnin Thanks for the update, I was looking for those changes. No worries about the delay. We get this SDK for free so we appreciate any feedback and support 😄 I have updated to beta5 now and added retryOnError. Will see how that works. Regarding incubating, I wouldn't mind trying that as well but would appreciate some hints on it. We use it in prod so we don't want to take too much risk with experimentation 🙈 Here is how we set up the Apollo client currently:
ApolloClient.Builder()
    .networkTransport(
        HttpNetworkTransport.Builder()
            .serverUrl(applicationInfo.hasuraUrl)
            .httpEngine(KtorHttpEngine(httpClient))
            .build()
    )
    .retryOnError { it.operation is Subscription }
    .subscriptionNetworkTransport(
        WebSocketNetworkTransport.Builder()
            .webSocketEngine(KtorWebSocketEngine(httpClient))
            .protocol(GraphQLWsProtocol.Factory())
            .serverUrl(applicationInfo.hasuraUrl)
            /**
             * ApolloClient on logout stop subscriptions and wait by default 60 sec
             * until connection is closed. Below function change that to 5 sec
             * so another user has no chance to login and re-use previous connection.
             */
            .idleTimeoutMillis(5000)
            .reopenWhen { throwable, attempt ->
                log.i(
                    throwable = throwable,
                    messageString = "Reopen when attempt $attempt"
                )
                if (throwable is ApolloWebSocketClosedException) {
                    log.w(
                        throwable = throwable,
                        messageString = "Web socket closed"
                    )
                }
                /**
                 * Power of 13 attempts give ~5 sec delay.
                 * If it's higher attempt we keep 5 sec delay max.
                 */
                exponentialDelay(attempt)
            }
            .build()
    )
    .build()
Things that I would need to address:
Additionally, you can see that I am reducing the idle timeout. I discovered that a user logging in again kept using the previous connection. Do you know if there is a better way to invalidate the connection? Thanks! |
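The `exponentialDelay(attempt)` helper used in `reopenWhen` above is not shown in the thread. As a rough sketch of the capping logic its comment describes, assuming the delay starts at 1 ms and doubles on every attempt up to a 5 second maximum, it could look like this. `cappedBackoffMillis` is a hypothetical name and the base and cap values are assumptions, not the poster's actual implementation.

```kotlin
// Hypothetical helper (not the poster's exponentialDelay): computes the delay in
// milliseconds for a retry attempt, doubling each time until it reaches a cap.
fun cappedBackoffMillis(
    attempt: Long,
    baseMillis: Long = 1L,     // assumed starting delay
    maxMillis: Long = 5_000L,  // assumed cap of 5 seconds
): Long {
    require(attempt >= 0) { "attempt must be non-negative" }
    require(baseMillis > 0 && maxMillis > 0) { "delays must be positive" }
    // Clamp the shift so very large attempt numbers cannot overflow a Long.
    val shift = minOf(attempt, 62L).toInt()
    // Compare against the shifted-down cap instead of shifting up first,
    // so the multiplication can never overflow.
    return if (baseMillis > (maxMillis shr shift)) maxMillis else baseMillis shl shift
}

fun main() {
    for (attempt in listOf(0L, 1L, 10L, 12L, 13L, 50L)) {
        println("attempt $attempt -> ${cappedBackoffMillis(attempt)} ms")
    }
}
```

With these assumed values, attempt 12 gives 4096 ms and every attempt from 13 onwards stays at the 5000 ms cap. Inside `reopenWhen`, the caller would still have to suspend for that duration and return whether to reopen.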
Hey thanks for the follow up!
Indeed the
TBH I'm not sure why this needed to be a factory.
Exactly. I realized recently that managing the In general, I like to see the Using As a nice bonus,
Right. That's actually the (big) difference between a GraphQL |
hey @martinbonnin I have played a bit with Btw, I also noticed that on individual operators I could call |
Oooh my..., the KDoc for
The actual logic is done in RetryOnErrorInterceptor. It leverages coroutines to retry the Hope this helps, since this is all pretty new there's not a lot of docs but I'll get working on this (edit: KDoc PR here). |
@damianpetla |
hey @martinbonnin I was trying it out but don't know what to do with |
Right, this part is still needed... We're actually looking into breaking down the apollo-kotlin repo into smaller ones. |
@damianpetla the gist is here. I have tested that it passes all the WebSocketNetworkTransportTest tests except one little corner case where it throws |
hey @martinbonnin I have tried your gist but for some reason subscriptions stopped emitting any data. I have logged the engine and I see frames being sent and received. |
Can you upload a small reproducer somewhere? |
Not really sure how at the moment, building something on the side would probably take too much time. I might look into the code more later and try debugging more. Swamped with work currently 😓 Even trying it was a bit of a stretch, but I'm not giving up yet so don't worry :) |
I hear you, thanks for looking into this! WebSockets are hard because the transport isn't specified so there are a lot of variants out there. To make things worse, they are very often behind authentication, making it really hard to reproduce the issues. I've uploaded a small playground here that uses |
Version
4.0.0-alpha.2
Summary
I have noticed a few crashes of my app when using a subscription. It looks like the SDK has a problem with parsing errors. I am not sure exactly why. I have contacted my backend team but maybe letting you know can resolve the issue faster.
My setup of ApolloClient looks like this
and subscription like this
Just before the crash, reopenWhen set on WebSocketNetworkTransport was fired once with attempt 0.
Is there something I could do to prevent the app from crashing?
Steps to reproduce the behavior
No response
Logs