Cordova iOS App don't reconnect to channels after App has been in background #456

Open
danicastanos opened this issue Nov 18, 2016 · 12 comments

Comments

@danicastanos

I have an app developed with Cordova that uses the Faye JS client (v1.2.2). When the app goes into the background, I detect the 'pause' event fired by the device and unsubscribe from the channels I was subscribed to. Then, on the app's 'resume' or 'active' event, I try to reconnect.

The problem is that, depending on how long the app has been in the background (sometimes it seems random), I can't subscribe to the channels again because the transport is down, and transport:up never arrives either.

It has been tested in iOS 10.1 with Faye 1.2.2.

Thanks

@danicastanos
Author

The issue seems to happen more often if the iPhone is locked.

@jcoglan
Collaborator

jcoglan commented Dec 27, 2016

Can I ask why you're unsubscribing from all the channels rather than letting the client handle reconnection for you as it's supposed to?

If you're unsubscribing from everything, you may as well disconnect() the client entirely and make a new one when the app is foregrounded again.

(Caveat that I have no experience with Cordova so I'm going to have limited ability to answer detailed questions.)
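
A minimal sketch of the disconnect-and-recreate suggestion above. It is written against any object that offers `subscribe(channel, callback)` and `disconnect()`, as the Faye client does; `createLifecycle`, `createClient`, and the `subscriptions` map are hypothetical names introduced for this example, and the Cordova wiring in the trailing comment uses a placeholder URL.

```javascript
// Sketch: tear the client down on 'pause' and build a fresh one on 'resume'.
// `createClient` is a hypothetical factory, e.g. one returning a new Faye.Client.
// `subscriptions` is a hypothetical map of channel name -> message handler.
function createLifecycle(createClient, subscriptions) {
  var client = null;

  function connect() {
    if (client) return client;            // already connected, nothing to do
    client = createClient();
    Object.keys(subscriptions).forEach(function (channel) {
      client.subscribe(channel, subscriptions[channel]);
    });
    return client;
  }

  function disconnect() {
    if (!client) return;
    client.disconnect();                  // ask the server to end this session
    client = null;                        // the next connect() starts from scratch
  }

  return { connect: connect, disconnect: disconnect };
}

// Possible Cordova wiring (placeholder URL and handler name):
//   var lifecycle = createLifecycle(
//     function () { return new Faye.Client('https://example.com/faye'); },
//     { '/notifications': handleNotification });
//   document.addEventListener('deviceready', function () { lifecycle.connect(); }, false);
//   document.addEventListener('pause', function () { lifecycle.disconnect(); }, false);
//   document.addEventListener('resume', function () { lifecycle.connect(); }, false);
```

Recreating the client avoids depending on the state of a transport that was suspended along with the app, at the cost of a new handshake on every resume.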

@jcoglan
Collaborator

jcoglan commented Jan 28, 2017

@danicastanos Do you have any feedback on my previous comment?

@danicastanos
Author

So sorry @jcoglan... The email notification of your comment was lost in my inbox!!
I unsubscribe from the channels and subscribe again because I thought it was a good way to ensure a fresh connection and to avoid processing notifications needlessly while the app is in the background.

Let me first try what you suggest and not unsubscribe from the channels, to see how the app behaves. If that first approach doesn't work properly, I will then try disconnecting the client and connecting again.

Thanks!

@mattiaspalmgren

mattiaspalmgren commented Jun 26, 2017

We’re experiencing similar problems with a web app, where the client goes down (transport:down) when the phone is locked and does not manage to come back up (transport:up) when it is opened again - and it seems to miss messages sent during the lock period. Is it expected to go up again when the web page is opened again? Or am I misunderstanding something? This happens on iOS 10.3.2 with Faye 1.2.4.

@jkarneges
Contributor

In our tests, Faye 1.1.2 reconnects after opening the app again, but transport:up won't emit until about a minute passes or a message is received. This can make it seem like Faye hasn't reconnected yet when in fact it has and can receive messages.

The transport:up event is needed for apps that need to react to the event to recover data though, and the long delay to receive this event can make for a subpar user experience. We should look at fixing Faye to emit transport:up immediately after the connection is reestablished.

@JohanBengtsson

@jkarneges We experience this issue as well, so it would be great if it could be fixed. For now we will listen for the window focus event to recover data from messages that were "lost" while the client wasn't connected to the WebSocket channel.
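
A rough sketch of the focus-based workaround mentioned above. `recoverOnFocus`, `fetchState`, and the minimum interval are hypothetical names and parameters for this example; `fetchState` stands for whatever call your application makes to its own backend to reload state.

```javascript
// Sketch: refetch application state whenever the page regains focus.
// `target` is anything with addEventListener/removeEventListener (window in a browser).
// `fetchState` is a hypothetical function that queries your own backend.
// `now` is injectable so the throttling can be exercised without real time passing.
function recoverOnFocus(target, fetchState, minIntervalMs, now) {
  now = now || Date.now;
  var lastRun = -Infinity;

  function onFocus() {
    var t = now();
    if (t - lastRun < minIntervalMs) return;  // skip rapid repeated focus events
    lastRun = t;
    fetchState();
  }

  target.addEventListener('focus', onFocus);
  // Return a function that removes the listener again.
  return function stop() {
    target.removeEventListener('focus', onFocus);
  };
}

// Browser usage (hypothetical handler name):
//   var stop = recoverOnFocus(window, reloadMessages, 5000);
```

This does not depend on any Faye event, so it works regardless of when transport:up is emitted, but it also refetches in cases where nothing was actually missed.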

@jcoglan
Collaborator

jcoglan commented Jul 5, 2017

@mattiaspalmgren @jkarneges @JohanBengtsson Can you tell me more about what you're using transport:up for? It was originally intended to be purely advisory, and not to be used for any logic effects like trying to reconnect, recover data, etc. The Faye client should recover automatically and transparently without intervention from you.

Can you tell me why you're using it and what data recovery means in this situation?

Do you know why transport:up does not fire when you expect? The WebSocket transport only makes a connection when there are messages to send, and sending them should elicit a response from the server, so transport:up should coincide with these responses and therefore happen shortly after the WebSocket connection is made. Can you explain the circumstances in which this doesn't happen?

@jcoglan
Collaborator

jcoglan commented Jul 5, 2017

@mattiaspalmgren

when locking the phone and do not manage to go up (transport:up) when opening it again - and seems to go missing on message sent during the lock-period. Is it expected to go up again when opening the web page again? Or am I misunderstanding something?

Let me see if I can clarify. If for any reason the client stops communicating with the server, e.g. you are offline or the runtime hosting the client pauses execution, the server will eventually time out the client's session. Until that time-out happens, messages are buffered on the server side and will be delivered when the client next makes contact.

But if the client goes past this time-out window, then those messages that were buffered on the server are dropped. The client will ask the server for a new session, and it will re-register the channels it's subscribed to, but any messages that were routed to the old session will be lost. In this event, you will need to recover data.
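
The two paragraphs above can be pictured with a toy model. This is an illustration only, not Faye's implementation: `ToySession`, its method names, and the use of explicit millisecond timestamps are all assumptions made for the example.

```javascript
// Toy model of the behaviour described above. NOT Faye's actual code.
// All times are plain millisecond timestamps passed in by the caller.
function ToySession(timeoutMs) {
  this.timeoutMs = timeoutMs;
  this.lastContact = 0;   // last time the client talked to the server
  this.buffer = [];       // messages waiting for the client's next contact
}

// The server routes a message to this session at time `now`.
ToySession.prototype.deliver = function (message, now) {
  if (now - this.lastContact > this.timeoutMs) return false; // session gone, message dropped
  this.buffer.push(message);
  return true;
};

// The client makes contact at time `now`.
ToySession.prototype.contact = function (now) {
  var expired = now - this.lastContact > this.timeoutMs;
  var messages = expired ? [] : this.buffer;  // an expired session loses its buffer
  this.buffer = [];
  this.lastContact = now;
  // newSession === true is the case in which the application must recover data.
  return { newSession: expired, messages: messages };
};
```

In this model a client that returns inside the time-out window receives everything that was buffered, while a client that returns later gets a new session and an empty buffer.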

However, transport:up is not intended for that. I don't believe you should be using Faye itself to determine when you need to recover lost events -- this should be something that's baked into your application's data protocol design. There are even better ways within Faye for dealing with it, such as sending a snapshot of current state along with /meta/subscribe responses from the server, so that the client is given an up-to-date picture of the world the moment it begins receiving messages, either when it first connects or when it recovers from disconnection.

Does that make sense?

@jkarneges
Contributor

Can you tell me why you're using it and what data recovery means in this situation?

In this case, data recovery means querying against the application's main backend service/DB for historical state (as opposed to trying to recover data through the Bayeux protocol). If the client has awareness that a pubsub connection was lost and then restored, then that's a good moment to try recovering data.

Do you know why transport:up does not fire when you expect?

Looking at the code, it appears the event is sent after a response is received from the server. What I suspect is happening is the Faye client sends /meta/connect and the server delays the response as it normally does, and so the event isn't emitted until after that delay.

Probably the fix is either for the server (in our case, Fanout Cloud) to respond right away to the first connect request over a WebSocket, or for the Faye client to hint to the server when it wants an immediate response. What do you think is best?

@jcoglan
Collaborator

jcoglan commented Aug 5, 2017

Apologies, July was a hectic month for me. To respond to @jkarneges's points:

Can you tell me why you're using it and what data recovery means in this situation?

In this case, data recovery means querying against the application's main backend service/DB for historical state (as opposed to trying to recover data through the Bayeux protocol). If the client has awareness that a pubsub connection was lost and then restored, then that's a good moment to try recovering data.

There is a better way of doing this, which I'm realising I should probably add to the docs because it's come up so many times. I'll explain that shortly but I just want to note the transport events are not suitable for being used like this.

First, a transport event does not indicate the client's session expired. It may have only been offline briefly, so messages were buffered on the server and nothing was lost.

Second, the transport events are not reliable: they're a best-effort guess as to whether the client is "online" or not, based on whether a socket is connected or whether it's receiving HTTP responses. The methods of error detection differ between transports and some are very unreliable. That's why these events are intended to be purely advisory and not used for logic.

Third, recovering data by querying the server for it, without synchronising in some way with the data you're receiving via Faye, will result in race conditions where messages you receive from Faye will either duplicate or miss elements of the state you got from the server.

Looking at the code, it appears the event is sent after a response is received from the server. What I suspect is happening is the Faye client sends /meta/connect and the server delays the response as it normally does, and so the event isn't emitted until after that delay.

Ah yes, I should have spotted that, that makes sense.

Probably the fix is either for the server (in our case, Fanout Cloud) to respond right away to the first connect request over a WebSocket, or for the Faye client to hint to the server when it wants an immediate response. What do you think is best?

This might work for WebSocket, but it won't work for stateless transports like long-polling. Those still won't see transport:up until the next /meta/connect returns. The Faye server already does this for /meta/connect received using the eventsource transport since the XHR request used for that should not long-poll as messages are delivered via the one-way EventSource connection.

The Faye client could add advice: {timeout: 0} to outgoing /meta/connect messages if it believed the message was the first one since going offline, but doing this accurately would require more information than the client has available, for example it would need to know whether its session on the server has timed out, which it cannot know until the message response arrives. It would also require the transport-level error detection to be much more accurate than we're able to make it. For example, the client is unable to distinguish the situations of the server holding a long-polling request open, and the TCP connection hanging indefinitely due to a firewall.

In general telling if we're "offline" requires being able to distinguish all sorts of errors that we can't do in the browser: the user might have no local network connection, their router might not have internet access, there might be a DNS error, there might be a routing error while reaching the host, the host might refuse the connection, the host might accept the connection but have no backend servers up, the backend servers might be up but not support WebSocket, or be misconfigured, etc... Determining the "connection state" of the client in the face of all these, not being able to tell many of them apart, not being able to tell if the error is transient or not, makes doing this really hard. In the end, I reach for the end-to-end principle and say you're "online" if you are receiving messages, which is what the transport events try to reflect. I think having the client try to anticipate whether it wants to long-poll or not will probably result in more errors than we currently experience, and make the client more chatty.

Here's what I think you should do: avoid coupling the correctness of your application to low-level unreliable transport concerns, and build some way of detecting missed data into your application design. There is one pattern you can use within Faye to help you if you have already done this, that makes catching up on state you've missed easier.
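
One possible way to build missed-data detection into the application, sketched under a loud assumption: every published message carries a per-channel, monotonically increasing `seq` field. That field is a convention of this example, not something Faye or Bayeux provides, and `createGapDetector` and `onGap` are hypothetical names.

```javascript
// Sketch: detect missed messages with a per-channel sequence number.
// ASSUMPTION: publishers add an increasing integer `seq` to every message.
// `onGap(channel, from, to)` is a hypothetical callback that receives the
// inclusive range of sequence numbers that never arrived.
function createGapDetector(onGap) {
  var lastSeq = {};  // channel -> last sequence number accepted

  return function accept(channel, message) {
    var prev = lastSeq[channel];
    if (prev !== undefined && message.seq <= prev) {
      return false;                              // duplicate or stale message
    }
    if (prev !== undefined && message.seq > prev + 1) {
      onGap(channel, prev + 1, message.seq - 1); // something was missed
    }
    lastSeq[channel] = message.seq;
    return true;                                 // caller should process the message
  };
}

// Usage with a Faye subscription (hypothetical handler names):
//   var accept = createGapDetector(fetchMissingRange);
//   client.subscribe('/updates', function (message) {
//     if (accept('/updates', message)) applyUpdate(message);
//   });
```

Because the check runs on the data itself, it does not depend on transport events and it also catches messages lost while the session stayed intact.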

When you first subscribe to a channel, and when the client recovers from its session timing out, it sends a /meta/subscribe for every channel you subscribe to. These are the only situations in which you will have missed messages; the client only re-sends /meta/subscribe when its session has timed out and therefore its server-side message buffer was discarded.

In this event, the server can attach some representation of the current state of the resource corresponding to that channel, e.g.:

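// Server side: attach a snapshot of the resource to every successful
// /meta/subscribe response. getCurrentStateFor() is an application-defined helper.
server.addExtension({
  outgoing: function(message, callback) {
    if (message.channel === '/meta/subscribe' && message.successful) {
      message.ext = message.ext || {};
      message.ext.currentState = getCurrentStateFor(message.subscription);
    }
    callback(message);
  }
});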

Then on the client, you can collect that added data:

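// Client side: read the snapshot from the /meta/subscribe response, if present.
// handleNewState() is an application-defined helper.
client.addExtension({
  incoming: function(message, callback) {
    if (message.channel === '/meta/subscribe' && message.successful && message.ext) {
      handleNewState(message.subscription, message.ext.currentState);
    }
    callback(message);
  }
});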

This design means that you are always sent an update on the state of the resource at the instant the client becomes subscribed to updates for it, minimising the chance for race conditions to present themselves between you becoming subscribed, and asking the server for the current state.

Would that approach work in your application?
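
A sketch of what the application-side `handleNewState` might look like if it also has to guard against duplicated updates. It assumes that both the snapshot and each later update carry a `version` number and a `value`; that shape, along with `createChannelStore` and `handleUpdate`, is hypothetical and not defined anywhere in Faye.

```javascript
// Sketch: keep one state object per channel, seeded by the snapshot that
// arrives with /meta/subscribe and advanced only by strictly newer updates.
// ASSUMPTION: snapshots and updates look like { version: <number>, value: <any> }.
function createChannelStore() {
  var states = {};  // channel -> { version, value }

  return {
    // Called with the snapshot attached to a /meta/subscribe response.
    handleNewState: function (channel, snapshot) {
      var current = states[channel];
      if (current && current.version >= snapshot.version) return; // we already hold newer data
      states[channel] = { version: snapshot.version, value: snapshot.value };
    },

    // Called for each regular message published on the channel.
    handleUpdate: function (channel, update) {
      var current = states[channel];
      // Ignore updates that arrive before any snapshot, or that the
      // snapshot already reflects.
      if (!current || update.version <= current.version) return false;
      states[channel] = { version: update.version, value: update.value };
      return true;
    },

    get: function (channel) {
      return states[channel];
    }
  };
}
```

Comparing versions is what makes the order of arrival between the snapshot and the first few messages harmless.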

@jkarneges
Contributor

Thanks for replying @jcoglan.

a transport event does not indicate the client's session expired. It may have only been offline briefly, so messages were buffered on the server and nothing was lost.

Unfortunately this is not quite true with the Bayeux protocol. If the server sends a message to a dead client connection then the message will be lost. This means it's possible for a client to get disconnected and reconnect with session intact, yet lose messages in between.

Note that I don't really consider this a major problem, as publish-subscribe systems are unreliable by design anyway. If a message doesn't get lost at this layer, it'll get lost somewhere else, and good receiving apps will build ways to recover.

the transport events are not reliable: they're a best-effort guess as to whether client is "online" [...] That's why these events are intended to be purely advisory and not used for logic.

I understand that the transport events are unreliable in the sense that Faye might think a connection is alive when it isn't, because it can take the network stack a while to detect that a connection is gone. However, a transport:up event should always get emitted upon successful reconnect, and that's good enough for the purposes of this discussion.

Typically, a publish-subscribe client should fetch state after a known/potential loss. Using transport:up as a hint to quickly fetch state after a reconnect ("potential loss") seems reasonable to me.
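
A minimal sketch of using the transport events as a hint in this way. It relies only on the client's `on('transport:down', ...)` and `on('transport:up', ...)` events; `recoverOnReconnect` and `fetchState` are hypothetical names, with `fetchState` standing for a query against the application's own backend.

```javascript
// Sketch: treat a transport:up that follows a transport:down as a hint
// that messages may have been missed, and refetch state in that case.
// `fetchState` is a hypothetical function that queries your own backend.
function recoverOnReconnect(client, fetchState) {
  var wasDown = false;

  client.on('transport:down', function () {
    wasDown = true;
  });

  client.on('transport:up', function () {
    if (!wasDown) return;   // first connection, nothing to recover
    wasDown = false;
    fetchState();
  });
}

// Usage (hypothetical handler name):
//   recoverOnReconnect(client, reloadMessages);
```

As the thread notes, these events are advisory, so this is best treated as an optimisation on top of some other recovery mechanism rather than the only one.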

The Faye client could add advice: {timeout: 0} to outgoing /meta/connect messages if it believed the message was the first one since going offline, but doing this accurately would require more information than the client has available, for example it would need to know whether its session on the server has timed out, which it cannot know until the message response arrives.

Why would it need to know that? I think all it would need to know is whether it has considered the transport to be down yet (i.e. it emitted transport:down earlier).
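
The idea could be sketched as a client extension, shown here purely to make the proposal concrete. `createFastReconnectExtension` is a hypothetical name; whether the server honours client-supplied advice, and whether the delayed transport:up would actually arrive sooner as a result, are assumptions that this sketch does not verify.

```javascript
// Sketch of the proposal above; not part of Faye.
// While the transport is considered down, ask for an immediate reply to
// /meta/connect by attaching advice: {timeout: 0}.
// ASSUMPTION: the server honours this advice on incoming /meta/connect messages.
function createFastReconnectExtension(client) {
  var down = false;

  client.on('transport:down', function () { down = true; });
  client.on('transport:up', function () { down = false; });

  return {
    outgoing: function (message, callback) {
      if (down && message.channel === '/meta/connect') {
        message.advice = message.advice || {};
        message.advice.timeout = 0;  // request a response without a long-poll delay
      }
      callback(message);             // always pass the message on
    }
  };
}

// Usage:
//   client.addExtension(createFastReconnectExtension(client));
```

The flag is cleared on transport:up rather than after the first /meta/connect, so that a connect attempt that fails while the network is still unavailable does not use up the hint.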

the server can attach some representation of the current state of the resource corresponding to that channel

I'm fond of this approach when it's possible (we actually support something like this with our non-Bayeux services), but I don't think it works accurately with Bayeux due to the fact that the protocol can lose messages mid-session.

5 participants