New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Class C messages intermittently fail to schedule with invalid absolute time #3487
Comments
@virtualguy is this issue persisting? What is the data rate and in which region are you? When you say you want to transmit with 130 ms apart, are you taking the time-on-air into account? Can you subscribe to the gateway and end device events, via |
@virtualguy I added some extra debug lines. Pick any of:
Please grep output by If it says in cc @ymgupta |
fyi we are having some troubling running binaries we have built so blocked on #3736 before we can gather logs from your patch @johanstokking |
@virtualguy @johanstokking Here are some logs from running the patched version: https://gist.github.com/kurtmc/75f4ecf93c2f7a1ee8373d3a9c7f181a |
Thanks a lot. Apologies for the delayed response. This is going to be super helpful. For finding the root cause, I added a few more log entries. Can you run this again, with a new build, using https://github.com/TheThingsNetwork/lorawan-stack/tree/investigate/3487-abs-downlink-timing? If you want to cherry-pick on v3.10.x, cherry-pick 50f5605 and d1e5305 Also for the output, I need the full trace, from the beginning. Note that we're now printing the gateway ID (or EUI). If that is sensitive information, please redact that to another meaningful value or send the log via email. |
@kurtmc @virtualguy let me know how we can help this setting this up. If I need to send you a binary of Docker image, just let me know. |
Thanks @kurtmc. I don't see the "no absolute time" errors appearing here, only in the beginning but that is normal. So here, everything worked as expected, right? @adriansmares the race for which synchronization is fixed with #3794 is actually happening here. So that is real, see:
Note that these statements are out-of-order because of underlying flushing; it looks like short statements are flushed immediately and longer statements are delayed. I think that the issue is that when This is fixed with #3794. |
This comment has been minimized.
This comment has been minimized.
@virtualguy @kurtmc can you verify that this is resolved on latest Note the minor bump, you need to run DB migrations, see https://github.com/TheThingsNetwork/lorawan-stack/blob/e56f7f70e60dba8c1ad584411fb63a8c35659e7c/CHANGELOG.md#3110---2021-02-10 We'll be rolling a 3.11.1 release today. |
Closed by #3794 |
@virtualguy @kurtmc FYI we're backporting this to 3.10.10 so we can update our infrastructure sooner. Please keep an eye on #3800 and/or subscribe to release notifications here. |
@johanstokking Just to let you know, we have upgraded our production environment to 3.10.10 and we are still seeing the absolute time error in the logs. |
@kurtmc it is expected in the following case:
We use the latency between scheduling the downlink message and receiving the TX acknowledgment as the round-trip time. We need at least 5 of them to reliable take the median value. Then, when scheduling a class C downlink message with absolute time, Gateway Server uses the server time and the median round-trip time to calculate the absolute (server) time and the corresponding concentrator timestamp. If you see absolute time errors still, outside the cases above, please provide DEBUG level server logs. |
Definitely still seeing this issue in 3.10.10, looks like it happens on back to back transmissions (1000ms apart on the same device-id). I have sent through DEBUG logs via TTI support |
Summary
Class C messages intermittently fail to schedule with "invalid absolute time set in application downlink" when closely spaced.
Is there a minimum time between messages to the same device? Ideally we would have them scheduled 130ms apart but have tried to increase the spacing to 1000ms to mitigate the issue
Steps to Reproduce
We see this approximately every 15 dowlinks
What do you see now?
What do you want to see instead?
No downlink failures and messages emitted from the gateway
Environment
The Things Stack for LoRaWAN: ttn-lw-stack
Version: 3.9.4
Build date: 2020-09-23T09:56:19Z
Git commit: c4be55c
Go version: go1.15.2
OS/Arch: linux/amd64
How do you propose to implement this?
Determine the causes for the invalid absolute time and resolve if there is a bug
How do you propose to test this?
Happy to test a PR in our dev environment
Can you do this yourself and submit a Pull Request?
@rvolosatovs ?
The text was updated successfully, but these errors were encountered: