Describe the bug
Specifically, the affected metric is `mongodb_mongod_replset_member_replication_lag`, which is only exposed when compatible mode is enabled with v2.

The wrong lag value is caused by one operand of the lag calculation being zero, which makes the reported value equal to the current Unix timestamp (the absolute value of the difference is used, so either side of the subtraction could be the zero).
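A minimal Go sketch of this failure mode follows. It is not the exporter's code: it assumes the lag is the absolute difference between two optime timestamps in Unix seconds (for example the primary's and the member's), and the functions `naiveLag` and `guardedLag` are hypothetical. If either timestamp is missing and left at zero, `naiveLag` returns the other one, i.e. roughly the current Unix time, while `guardedLag` signals that no lag can be computed.

```go
package main

import (
	"fmt"
	"math"
)

// naiveLag returns |primary - member| in seconds (hypothetical helper).
// If either optime is missing and left at its zero value, the result
// collapses to the other timestamp, i.e. roughly the current Unix time.
func naiveLag(primaryOptime, memberOptime int64) float64 {
	return math.Abs(float64(primaryOptime - memberOptime))
}

// guardedLag refuses to compute a lag when either optime is zero, so the
// caller can skip the sample instead of reporting a bogus value.
func guardedLag(primaryOptime, memberOptime int64) (float64, bool) {
	if primaryOptime == 0 || memberOptime == 0 {
		return 0, false
	}
	return math.Abs(float64(primaryOptime - memberOptime)), true
}

func main() {
	const primary int64 = 1691766240 // 2023-08-11T15:04:00Z
	const member int64 = 1691766235  // a healthy member 5 s behind

	fmt.Println(naiveLag(primary, member)) // 5
	fmt.Println(naiveLag(primary, 0))      // 1.69176624e+09

	if lag, ok := guardedLag(primary, 0); ok {
		fmt.Println("lag:", lag)
	} else {
		fmt.Println("lag unavailable; skipping sample")
	}
}
```

Returning a boolean alongside the value keeps a missing operand from being confused with a genuine lag; whether to then skip the sample or still export something is the open question under Expected behavior.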
When the lag is reported incorrectly, the `state` label is set to `(not reachable/healthy)`, so overall this is easy to mitigate. It is also possible to avoid this metric altogether, since it is only available in compatible mode, and compute the lag from `mongodb_rs_members_optimeDate` instead. Either way, it does seem like a potential bug that the lag is falsely reported this high.
To Reproduce
docker-compose.yaml
run.sh
Look for "Invalid lag observed via ..." in the output. You may need to run the script a few times before the condition hits.
Expected behavior
If the lag cannot be calculated because the required data is unavailable, should the metric perhaps be omitted? I'm slightly cautious because people may rely on the `state` label to monitor health, though I suspect there are better metrics for that. If the metric should still be exposed even when the lag cannot be calculated, I'm unsure what the best approach is; it could simply be left as it is.
Logs
Output from running the above script:
1.69176624e+09 = 1691766240 = 2023-08-11T15:04:00Z
Environment