Trouble getting websocket to reopen for iOS or Android in KMM project #5633

Closed
mboyd1993 opened this issue Feb 21, 2024 · 10 comments

Comments

@mboyd1993
Contributor

mboyd1993 commented Feb 21, 2024

Question

I'm using Apollo in a multiplatform project for iOS and Android along with AWS AppSync with websocket communication. I can't get the websocket to reopen correctly on either platform after a temporary loss of internet connection.

I've set up several subscriptions using the AppSyncWsProtocol and everything seems to work correctly on both platforms. I can subscribe from either app and I receive subscription events as expected while the app is connected to the internet. However, if the app loses its internet connection, the websocket doesn't seem to get reopened as expected when the connection is restored. I've experienced different issues on iOS and Android. On Android, I seem to have found a workaround, although I'm not sure if it will work in all cases.

I've defined the .reopenWhen block when initializing the WebSocketNetworkTransport and I'm also using the .retryWhen block on my subscription Flows. I'll first describe the issue I've faced with iOS, then I'll describe the issue I've faced on Android and my workaround for it.

Here's an example of code in the shared module for initializing the ApolloClient and creating a subscription.

// Create a WebSocketNetworkTransport object to use to initialize the ApolloClient
val webSocketNetworkTransport = WebSocketNetworkTransport.Builder()
    .serverUrl(serverUrl = url)
    .protocol(protocolFactory = AppSyncWsProtocol.Factory(authorization = authorization, connectionAcknowledgeTimeoutMs = 50000))
    .reopenWhen { cause, attempt ->
        println("Offline test: .reopenWhen called with cause = $cause and attempt = $attempt")
        delay(2.0.pow(attempt.toDouble()).toLong())
        // retry after the delay
        true
    }
    .build()

// Initialize the ApolloClient
apolloClient = ApolloClient.Builder()
    .serverUrl(serverAddress)
    .interceptors(listOf(AuthorizationInterceptor(authToken), ErrorMessageInterceptor(ebAppContext)))
    .subscriptionNetworkTransport(subscriptionNetworkTransport = webSocketNetworkTransport)
    .build()

private fun userExitedSubscribe(sessionId: String): Flow<ExitSessionResponse?> {
    return apolloClient.subscription(UserExitedSubscription(sessionId)).toFlow()
        .map { it.data?.userExited?.exitSessionResponse }
        .retryWhen { cause, attempt ->
            println("Offline test: retryWhen called with attempt = $attempt")
            delay(60000L)
            // Retry after the delay. The lambda's last expression is its result;
            // a bare `return true` does not compile inside this lambda.
            true
        }
}

From my reading of the source code, it seems that the .reopenWhen block should be executed repeatedly as long as it continues to return true and the websocket is returning a NetworkError. However, this isn't the behavior that I've observed on the iOS or Android apps. On iOS, if I return true from .reopenWhen, it's called only once and the websocket never reconnects after the internet connection is restored. If I return false from .reopenWhen, the error is passed along to the subscription Flow, which executes its .retryWhen block once and nothing else ever happens. However, if I return false from .reopenWhen and then reestablish the internet connection before the delay in the .retryWhen block expires, then the websocket will reconnect and the Flow will start emitting again.
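A side note on the delay used in the snippets here: `delay` takes milliseconds, so `2.0.pow(attempt)` waits only 1 ms, 2 ms, 4 ms and so on for the first attempts, which makes the retries very rapid. Below is a minimal sketch of a capped exponential backoff, assuming `attempt` is a non-negative `Long` as in the lambdas above. The helper name `backoffDelayMs` and its default values are hypothetical and not part of Apollo Kotlin.

```kotlin
import kotlin.math.min

/**
 * Hypothetical helper (not an Apollo API): capped exponential backoff in milliseconds.
 * Returns baseMs * 2^attempt, limited to maxMs.
 */
fun backoffDelayMs(attempt: Long, baseMs: Long = 1_000L, maxMs: Long = 60_000L): Long {
    require(attempt >= 0) { "attempt must be non-negative" }
    // Clamp the exponent so the multiplication below cannot overflow for the default base.
    val exponent = min(attempt, 20L).toInt()
    val uncapped = baseMs * (1L shl exponent)
    return min(uncapped, maxMs)
}

fun main() {
    // Prints the delay for the first eight attempts: 1000, 2000, 4000, ... capped at 60000.
    for (attempt in 0L..7L) {
        println("attempt=$attempt delayMs=${backoffDelayMs(attempt)}")
    }
}
```

Inside `.reopenWhen { cause, attempt -> ... }` this could be used as `delay(backoffDelayMs(attempt))` before returning `true`.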

On Android, I've been able to get the websocket to reconnect and get the subscription Flows to start emitting events again only by using a specific combination of .reopenWhen and .retryWhen. What I observed is that if .reopenWhen returns true even one single time while the internet is still disconnected, then it will keep getting called endlessly with an UnknownHostException over and over even after the internet is restored and the websocket will never reopen. If I use this crude workaround and only return true if I can successfully execute a query, then I can ultimately get the websocket to reopen.

.reopenWhen { cause, attempt ->
    println("Offline test: .reopenWhen called with cause = $cause and attempt = $attempt")
    delay(2.0.pow(attempt.toDouble()).toLong())

    // After the delay, reopen only if a regular query succeeds,
    // i.e. the network appears to be reachable again.
    try {
        apolloClient.query(MyQuery()).execute()
        true
    } catch (exception: ApolloException) {
        false
    }
}

Returning false from .reopenWhen passes the error along to the subscription Flow. The .retryWhen block on the subscription Flow executes which eventually causes .reopenWhen to get called again in a loop until the internet is connected again, at which time, .reopenWhen will return true, the websocket will get reconnected, and the subscription Flow will start emitting again.
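The resubscribe loop described above can be reproduced without a network by using a fake flow. This is only a sketch of how the `retryWhen` operator from kotlinx.coroutines re-collects a failing Flow; it does not model Apollo's `.reopenWhen` or the websocket itself. `FakeNetworkException`, `fakeSubscription` and `collectWithRetry` are hypothetical names made up for the illustration.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.retryWhen
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a network error surfaced to the subscription Flow.
class FakeNetworkException(message: String) : Exception(message)

// Hypothetical stand-in for a subscription: fails while "offline", emits once "online".
fun fakeSubscription(isOnline: () -> Boolean): Flow<String> = flow {
    if (!isOnline()) throw FakeNetworkException("offline")
    emit("event-1")
    emit("event-2")
}

// Collects the fake subscription, retrying until the simulated network is back.
// Returns the received events and the attempt numbers passed to retryWhen.
fun collectWithRetry(failuresBeforeOnline: Int): Pair<List<String>, List<Long>> = runBlocking {
    var collections = 0
    val retryAttempts = mutableListOf<Long>()
    val events = fakeSubscription(isOnline = { ++collections > failuresBeforeOnline })
        .retryWhen { cause, attempt ->
            retryAttempts += attempt
            delay(10) // short pause before re-collecting the upstream flow
            cause is FakeNetworkException // retry only on the simulated network error
        }
        .toList()
    events to retryAttempts
}

fun main() {
    val (events, attempts) = collectWithRetry(failuresBeforeOnline = 3)
    println(events)   // [event-1, event-2]
    println(attempts) // [0, 1, 2]
}
```

Each time the predicate returns `true`, the upstream flow is collected again from the start, which is why the workaround ends up invoking `.reopenWhen` again on every retry.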

So, I don't know if what I'm experiencing on iOS and/or Android is a bug or if I'm misunderstanding how I'm supposed to set up the ApolloClient to gracefully reopen the websocket, but I need my subscription Flows to survive temporary losses of network connection. Does anyone have an example of using Apollo with AWS AppSync and subscription Flows where the reopening of the websocket is working?

@martinbonnin
Contributor

Hi 👋 Thanks for sending this.

Not getting a reopenWhen {} callback when the network goes up again is weird. I could verify here that it works in simple cases with delay(2000); true

      .reopenWhen { throwable, attempt ->
          println("Reopening...")
          delay(2000)
          true
      }

I uploaded the reproducer here if you want to try.

Is there any chance you can run your subscription through a proxy such as Charles? That would give more information about what is happening.

@mboyd1993
Contributor Author

Hey @martinbonnin, thanks for the help. I checked out your test project and it works correctly for me too. .reopenWhen keeps getting called until the network connection is reestablished and then the websocket is reopened and the subscription Flow starts emitting again. I do see that same behavior in my Android app as well, but it's my iOS app where .reopenWhen is only getting called one single time. So I don't know if it's something in the apollo library or in iOS itself, but something is different. The only difference I've been able to see so far is that when .reopenWhen gets called the first time on Android, the cause is java.net.SocketException. When it gets called on iOS the one time, the cause is ApolloNetworkException.
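To compare what each platform actually reports, it may help to log the whole cause chain rather than only the top-level exception, since a wrapper exception can hide the underlying error. A minimal sketch, assuming only the Kotlin standard library; `describeCauseChain` is a hypothetical helper, not an Apollo API.

```kotlin
/**
 * Hypothetical helper: renders an exception and its nested causes on one line,
 * outermost first, e.g. "Outer(message) <- Inner(message)".
 */
fun describeCauseChain(error: Throwable): String =
    generateSequence(error) { it.cause }
        .take(10) // guard against unexpectedly long or cyclic cause chains
        .joinToString(" <- ") { "${it::class.simpleName}(${it.message})" }

fun main() {
    val example = IllegalStateException("outer", RuntimeException("inner"))
    println(describeCauseChain(example))
    // IllegalStateException(outer) <- RuntimeException(inner)
}
```

Calling `println(describeCauseChain(cause))` inside `.reopenWhen` on both platforms would show whether the two errors wrap a similar underlying failure.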

And I did try to run Charles Proxy on iOS, but for some reason when I have the proxy connected, the websocket doesn't work at all. No websocket connection shows up and my app doesn't receive any subscription events. It's possible that I don't have something configured correctly in the Charles Proxy app, but I can see all the queries and mutations correctly.

@martinbonnin
Contributor

I added an iOS app to the reproducer (commit) and I can see the difference. Interestingly, I did not even get a call to reopenWhen {}. It looks like the simulator may have some retry logic baked in, or something like that.

When I enable wifi again, the subscription resumes though. Can you try with the sample, see if you can replicate your issue?

Screen.Recording.2024-02-23.at.13.17.11.mov

@mboyd1993
Contributor Author

Thanks for adding an iOS project. For some reason, I'm not able to build it in Xcode on a device or in the simulator. Xcode won't let me select a device or simulator as if I don't have one that's compatible. It says "Any iOS Device (arm64)", but I have a new device running iOS 17.2 and the latest simulators. I'm using an Intel Mac still, so I don't know if that has something to do with it. Honestly, I didn't debug that very long because I assume if your sample project works for you it would work for me as well. I instead tried to replicate your setup within my own app. I wasn't previously using the SKIE library, and I was previously collecting the subscription Flows differently, so I changed that. I imported the SKIE library and changed to using that async/await pattern to collect the subscription Flow directly. Unfortunately, I see the same exact behavior. I'm able to receive subscription events initially, but when I disable the network connection, .reopenWhen gets called one single time with ApolloNetworkException, and the subscription Flow never starts emitting values again after the network is reconnected. I'm trying to think of anything that's different in my app from your demo. The only things that come to mind are that I'm running on a device and not the simulator and I'm using AppSyncWsProtocol instead of the default web socket protocol. Do you think either of those things could matter? Or could something else be different?

@mboyd1993
Contributor Author

mboyd1993 commented Feb 25, 2024

Hey @martinbonnin, I got your demo iOS project running tonight. Just had to change the minimum deployment target to 17.0 for some reason even though my device is running 17.2. Regardless, I was able to confirm the same behavior you demonstrated on the simulator. The subscription worked fine initially, then when I disconnected the internet, I never observed .reopenWhen get called, but upon reconnecting the internet, the websocket connected again and the subscription started working again. However, running your demo on a device didn't behave the same as the simulator. I got the same behavior I've been seeing in my app. .reopenWhen got called exactly one time, and after reconnecting the internet, the websocket did not open again. What behavior do you see when you run your demo on a device?

Here's a snippet from the console logs. It looks like the websocket Task is finishing with an error immediately after the first time that .reopenWhen gets called.

helloSun Feb 25 05:59:53 GMT 2024
helloSun Feb 25 05:59:54 GMT 2024
helloSun Feb 25 05:59:55 GMT 2024
helloSun Feb 25 05:59:56 GMT 2024
helloSun Feb 25 05:59:57 GMT 2024
helloSun Feb 25 05:59:58 GMT 2024
Connection 1: received failure notification
Reopening...
Task <658C861C-C63B-4FD8-A451-18C3D38E089E>.<1> finished with error [-1,005] Error Domain=NSURLErrorDomain Code=-1005 "The network connection was lost." UserInfo={NSErrorFailingURLStringKey=https://leonidas-naiwjdzjsq-od.a.run.app/subscription, NSErrorFailingURLKey=https://leonidas-naiwjdzjsq-od.a.run.app/subscription, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalWebSocketTask <658C861C-C63B-4FD8-A451-18C3D38E089E>.<1>"
), _NSURLErrorFailingURLSessionTaskErrorKey=LocalWebSocketTask <658C861C-C63B-4FD8-A451-18C3D38E089E>.<1>, NSLocalizedDescription=The network connection was lost.}

@mboyd1993
Contributor Author

Hi @martinbonnin, have you had a chance to try running your demo iOS project on a device? I'm still struggling with this issue in our iOS app, and I don't know if this is a bug that I should report or if I'm just misunderstanding how to set up my WebSocketNetworkTransport so that it can recover from network outages.

@martinbonnin
Contributor

Thanks for trying it out on a real device. This sounds like a bug. I'll dig into it more this week, apologies for the delay!

@mboyd1993
Contributor Author

Thanks @martinbonnin. I was actually able to track down the bug and get it fixed! I'll file a bug report soon and describe all the details and the fix. I'm going to reference the demo project you created as a way to reproduce the bug.

@martinbonnin
Contributor

Closing as experimental WebSocketNetworkTransport is shipped as part of 4.0.0-beta.6.
See #5862 for follow ups and https://www.apollographql.com/docs/kotlin/v4/advanced/experimental-websockets/#migration-guide for documentation.
