-
Notifications
You must be signed in to change notification settings - Fork 91
check for LockTryOnce before delay #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Should the unit tests be adjusted/modified to reflect this change? https://github.com/G-Research/consuldotnet/blob/master/Consul.Test/LockTest.cs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per #20 (comment):
A fix would need to check whether LockTryOnce is false and whether the elapsed time is greater than LockWaitTime before waiting for LockRetryTime. If we don't do that, I think we might introduce a hot loop.
As it is right now, your PR introduce a hot loop when LockTryOnce
is set to false
, and it does not fix #20. Could you please fix the condition?
Thanks @mfkl, that is a very good point! I do wonder how to properly test this however, knowing that we can't rely on precise timings on the CI system. |
Maybe with some mocks? But it would require some more work. I'd say if the values used in tests make it pass 100% of the time, it should be fine (though not ideal). This PR needs a rebase before it is merged. LGTM. |
|
That simple attempt worked on my fork's CI :/ https://github.com/mfkl/consuldotnet/actions/runs/1897597504 I can't repro this locally. Inclined to think the test is a bit too strict, timing-wise... |
On WSL Ubuntu 20.04:
Yet it looks like I can reproduce the test failure locally on WSL Ubuntu 18.4... only sometimes 🤔 |
With some trouble, I can reproduce sporadically the test failure on ubuntu by letting this script run the test in a loop for a while
After several minutes, it indeed fails the test. I added debug log statements in a few places, and it turns out that this line Line 341 in aa46828
doesn't actually do what it's supposed to. With Console.WriteLine(">>>> Waiting for " + Opts.LockRetryTime.TotalMilliseconds);
Console.WriteLine(">>>> current watch before wait " + sw.ElapsedMilliseconds);
try
{
await Task.Delay(Opts.LockRetryTime, ct).ConfigureAwait(false);
Console.WriteLine(">>>> current watch after wait " + sw.ElapsedMilliseconds);
} When the test fails, this actually outputs
I believe we might be hitting dotnet/runtime#45585. Using a |
I could also reproduce this problem locally and it seems clear that dotnet/runtime#45585 is real (or maybe it is just an issue with documentation ?!).
Given that on Windows all existing methods of measuring elapsed time can give value smaller than the delay, I would argue that we should use some fraction of expected delay time in tests e.g.:
|
85d4162
to
2d1a032
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I think it is a good opportunity to add some documentation around LockTryOnce (https://github.com/G-Research/consuldotnet/blob/master/Consul/Lock.cs#L610) option to avoid future confusions:
LockTryOnce = false - Acquire method will block forever until the lock is acquired. LockWaitTime is ignored in this case.
LockTryOnce = true - Acquire the lock within a timestamp (It is analogous to `SemaphoreSlim.Wait(Timespan timeout)`.
Under the hood, it attempts to acquire the lock multiple times if needed (due to the HTTP Long Poll returning early), and will do so as many times as it can within the bounds set by LockWaitTime.
If LockWaitTime is set to 0, there will be only single attempt to acquire the lock.
0ed8f53
to
4c3e24c
Compare
Solving issue #20