MQTT disconnects/reconnects due to missed KeepAlive interval to send ping #317

Open
juliandroid opened this issue May 17, 2019 · 6 comments

juliandroid commented May 17, 2019

The issue is related to #300
which aimed to "Use monotonic time for keep alive".

Unfortunately, the original code used time.Unix(), which rounds the time to whole seconds, so you always had at least 0.5 seconds to send the Ping. With checkInterval set to KeepAlive/2, every second check succeeded and sent the appropriate Ping message.

Currently, I get "random" MQTT disconnects/reconnects due to the missed Ping. Now the code sends one or zero Ping messages every KeepAlive interval due to the higher precision and lack of "rounding".
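To illustrate the difference in precision, here is a minimal sketch. It is not the library's code: the helper names `dueBySeconds` and `dueByDuration` and the example timestamps are made up for this illustration. It compares a check done on whole-second Unix timestamps with one done on the full-precision elapsed time.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical example instants, only 1.2 s apart.
var (
	exampleLast = time.Unix(100, 900_000_000) // 100.9 s
	exampleNow  = time.Unix(102, 100_000_000) // 102.1 s
)

// dueBySeconds mimics a check on whole-second Unix timestamps:
// the sub-second part of both instants is discarded before comparing.
func dueBySeconds(last, now time.Time, keepAlive time.Duration) bool {
	return now.Unix()-last.Unix() >= int64(keepAlive/time.Second)
}

// dueByDuration compares the full-precision elapsed time instead.
func dueByDuration(last, now time.Time, keepAlive time.Duration) bool {
	return now.Sub(last) >= keepAlive
}

func main() {
	keepAlive := 2 * time.Second
	// Whole-second arithmetic sees 102-100 = 2 s and reports "due".
	fmt.Println(dueBySeconds(exampleLast, exampleNow, keepAlive))
	// Full precision sees 1.2 s and reports "not due yet".
	fmt.Println(dueByDuration(exampleLast, exampleNow, keepAlive))
}
```

With whole-second timestamps the check can fire earlier than the real elapsed time would allow, which may be the slack the old behaviour depended on.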

However, I don't believe the original code was intended to rely on rounding to the second in order to send the Ping message.
Thus, I've split the KeepAlive interval into 5 checks, and the Ping is sent in the last 1/5 (on the 4th check) of the KeepAlive interval.
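The idea can be sketched roughly as follows. This is an illustration only, not the code from the pull request: `checksPerKeepAlive`, `checkInterval` and `shouldPing` are hypothetical names, and the sketch assumes the decision is based on how long the connection has been idle.

```go
package main

import (
	"fmt"
	"time"
)

// checksPerKeepAlive is the number of checks per KeepAlive interval
// (5, as described above). Hypothetical name.
const checksPerKeepAlive = 5

// checkInterval returns how often the keepalive loop would wake up.
func checkInterval(keepAlive time.Duration) time.Duration {
	return keepAlive / checksPerKeepAlive
}

// shouldPing reports whether the idle time has reached the last fifth
// of the KeepAlive interval, leaving one full check interval of margin.
func shouldPing(idle, keepAlive time.Duration) bool {
	return idle >= keepAlive-checkInterval(keepAlive)
}

func main() {
	keepAlive := 10 * time.Second
	// Simulate the five checks in one idle KeepAlive interval.
	for i := 1; i <= checksPerKeepAlive; i++ {
		idle := time.Duration(i) * checkInterval(keepAlive)
		fmt.Printf("check %d, idle %v, ping: %v\n", i, idle, shouldPing(idle, keepAlive))
	}
}
```

In this sketch the 4th check is the first one that reports a Ping as due, so the Ping goes out with about one fifth of the KeepAlive interval to spare instead of a fraction of a second.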

Pull request: #316

Could you please review and approve it? I don't want to create another account for Eclipse :)

@ovaltzer

We encountered a serious problem during regression testing with the current version: specifically, as you mentioned, random disconnections, and problems with reconnecting as well. For now we have reverted to the previous version.

juliandroid commented May 19, 2019

The older version basically relies on the rounding nature of Unix() and leaves only about 0.5s for the PING, which might not be enough on a heavily loaded system. The new code actually exposes the real underlying problem.

@odedva You can also try https://github.com/eclipse/paho.mqtt.golang/pull/316/files

odedva commented May 19, 2019

We have actually been dealing for a very long time with connect/reconnect issues in bad-network scenarios with this client, mainly due to the nature of the publish channels, etc.
Our next goal will probably be to get something running on top of the C client, as we cannot find any better solution for sync connections to MQTT.

juliandroid commented May 29, 2019

I haven't tried using it in a harsh network environment, but with the current implementation this 0.5s could be the issue. I'm not sure I understand what problems with publish channels you have. Is there a ticket here?

It is a bit strange that no one has reacted to this major issue in the last 10 days. I won't be using the mqtt library in the near future, so someone else will have to carry this fight :)))

alsm commented Jul 3, 2019

As per the spec, the server is supposed to allow 1.5 times the keepalive interval to receive a PINGREQ:

"If the Keep Alive value is non-zero and the Server does not receive a Control Packet from the Client within one and a half times the Keep Alive time period, it MUST disconnect the Network Connection to the Client as if the network had failed."

I can see this would be a problem if the keepalive interval is short. I appreciate the work in the associated PR, but I cannot merge it without a signed ECA.
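As a rough illustration of why a short keepalive interval is the risky case, the arithmetic from the quoted rule can be sketched like this. The function names `serverDeadline` and `slack` are hypothetical and only serve this example.

```go
package main

import (
	"fmt"
	"time"
)

// serverDeadline returns how long the server waits for a control packet
// before disconnecting: one and a half times the keepalive interval.
// (A keepalive of zero disables the mechanism and is not handled here.)
func serverDeadline(keepAlive time.Duration) time.Duration {
	return keepAlive + keepAlive/2
}

// slack returns the time left before the server deadline if the client
// sends its packet after sentAfter of silence.
func slack(keepAlive, sentAfter time.Duration) time.Duration {
	return serverDeadline(keepAlive) - sentAfter
}

func main() {
	for _, ka := range []time.Duration{2 * time.Second, 30 * time.Second} {
		fmt.Printf("keepalive %v: deadline %v, slack if ping is sent at keepalive %v\n",
			ka, serverDeadline(ka), slack(ka, ka))
	}
}
```

The slack scales with the keepalive value, so with an interval of a few seconds a Ping that is delayed by load or latency has very little room before the server gives up on the connection.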

ashtonian commented Jul 16, 2019

I've run into this when the client is under load and orderMatters is set to true. #210

We also found that when we removed ordering, the app would overflow routines under load. Not sure if this is still the case, but we ended up forking and modifying the client as a fix: https://github.com/meshifyiot/paho.mqtt.golang
