check for LockTryOnce before delay #21

maxoris · 2020-07-09T10:04:58Z

Solving issue #20

mfkl · 2020-07-16T17:31:45Z

Should the unit tests be adjusted/modified to reflect this change? https://github.com/G-Research/consuldotnet/blob/master/Consul.Test/LockTest.cs

jgiannuzzi

As per #20 (comment):

A fix would need to check whether LockTryOnce is false and whether the elapsed time is greater than LockWaitTime before waiting for LockRetryTime. If we don't do that, I think we might introduce a hot loop.

As it is right now, your PR introduces a hot loop when LockTryOnce is set to false, and it does not fix #20. Could you please fix the condition?
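
One possible reading of the condition described above, as a minimal sketch. `RetryWaitSketch` and `ShouldWaitBeforeRetry` are hypothetical names used only for illustration and are not part of Consul/Lock.cs; the exact comparison the final fix uses may differ.

```csharp
using System;

public static class RetryWaitSketch
{
    // Hypothetical helper (assumption, not library code): decides whether the
    // acquire loop should sleep for LockRetryTime before its next attempt.
    public static bool ShouldWaitBeforeRetry(bool lockTryOnce, TimeSpan elapsed, TimeSpan lockWaitTime)
    {
        // Blocking mode: always back off between attempts, otherwise the loop spins hot.
        if (!lockTryOnce) return true;

        // Try-once mode: only back off while some of the LockWaitTime budget is left.
        return elapsed < lockWaitTime;
    }
}
```

Under this reading, the delay is skipped only when LockTryOnce is true and the wait budget is already spent, so the LockTryOnce = false path never loops without waiting.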

jgiannuzzi · 2020-08-18T11:41:17Z

Should the unit tests be adjusted/modified to reflect this change? https://github.com/G-Research/consuldotnet/blob/master/Consul.Test/LockTest.cs

Thanks @mfkl, that is a very good point!

I do wonder how to properly test this however, knowing that we can't rely on precise timings on the CI system.
Any ideas?

mfkl · 2021-02-11T03:54:56Z

I do wonder how to properly test this however, knowing that we can't rely on precise timings on the CI system.
Any ideas?

Maybe with some mocks? But it would require some more work. I'd say if the values used in tests make it pass 100% of the time, it should be fine (though not ideal). This PR needs a rebase before it is merged.
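
To make the mock idea a bit more concrete, here is a rough sketch of what it could look like, assuming the retry loop accepted its delay function as a parameter. Everything here (`FakeDelaySketch`, `RetryAsync`, `RecordingDelay`) is hypothetical and does not exist in the library today; it only shows how a test could assert on the requested delays instead of on wall-clock time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class FakeDelaySketch
{
    // Hypothetical retry loop: the delay is injected instead of calling Task.Delay directly.
    // Returns the attempt number that succeeded, or -1 if none did.
    public static async Task<int> RetryAsync(
        Func<bool> tryAcquire,
        int maxAttempts,
        TimeSpan retryTime,
        Func<TimeSpan, CancellationToken, Task> delay,
        CancellationToken ct = default)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (tryAcquire()) return attempt;

            // No point in waiting after the final attempt.
            if (attempt < maxAttempts) await delay(retryTime, ct).ConfigureAwait(false);
        }
        return -1;
    }

    // Test double: records each requested delay and completes immediately.
    public static Func<TimeSpan, CancellationToken, Task> RecordingDelay(List<TimeSpan> recorded)
    {
        return (time, ct) =>
        {
            recorded.Add(time);
            return Task.CompletedTask;
        };
    }
}
```

Production code would pass `Task.Delay` as the delay function, while a test would pass `RecordingDelay(list)` and check how many waits were requested and for how long.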

LGTM.

mfkl · 2022-02-04T08:46:09Z

[xUnit.net 00:03:59.70]     Consul.Test.LockTest.Lock_TryAcquireOnceWithLockDelayNonZeroWaitTime_EnsureRetryWait [FAIL]
Error: Consul.Test.LockTest.Lock_TryAcquireOnceWithLockDelayNonZeroWaitTime_EnsureRetryWait: Assert.True() Failure
  Failed Consul.Test.LockTest.Lock_TryAcquireOnceWithLockDelayNonZeroWaitTime_EnsureRetryWait [5 s]
  Error Message:
   Assert.True() Failure
Expected: True
Actual:   False
  Stack Trace:
     at Consul.Test.LockTest.Lock_TryAcquireOnceWithLockDelayNonZeroWaitTime_EnsureRetryWait() in /home/runner/work/consuldotnet/consuldotnet/Consul.Test/LockTest.cs:line 604

mfkl · 2022-02-25T09:06:27Z

That simple attempt worked on my fork's CI :/ https://github.com/mfkl/consuldotnet/actions/runs/1897597504

I can't repro this locally. Inclined to think the test is a bit too strict, timing-wise...

mfkl · 2022-03-17T09:23:26Z

On WSL Ubuntu 20.04:

Test Run Successful.
Total tests: 172
     Passed: 168
    Skipped: 4

Yet it looks like I can reproduce the test failure locally on WSL Ubuntu 18.04... only sometimes 🤔

mfkl · 2022-04-07T10:03:05Z

With some trouble, I can sporadically reproduce the test failure on Ubuntu by letting this script run the test in a loop for a while:

#!/bin/bash

err=0
while true
do
    dotnet test Consul.Test --configuration=Release --framework=net6.0 --logger:"console;verbosity=detailed" --filter "Lock_TryAcquireOnceWithLockDelayNonZeroWaitTime_EnsureRetryWait"
    (( (err=$?) > 0 )) && break
done
echo "$err"

After several minutes, the test does indeed fail. I added debug log statements in a few places, and it turns out that this line (consuldotnet/Consul/Lock.cs, line 341 in aa46828)

try { await Task.Delay(Opts.LockRetryTime, ct).ConfigureAwait(false); }

doesn't always wait for the full LockRetryTime. I instrumented it like this:

Console.WriteLine(">>>> Waiting for " + Opts.LockRetryTime.TotalMilliseconds);
Console.WriteLine(">>>> current watch before wait " + sw.ElapsedMilliseconds);
try 
{
    await Task.Delay(Opts.LockRetryTime, ct).ConfigureAwait(false); 
    Console.WriteLine(">>>> current watch after wait " + sw.ElapsedMilliseconds);
}

When the test fails, this actually outputs

>>>> Waiting for 5000
>>>> current watch before wait 2
>>>> current watch after wait 4994

I believe we might be hitting dotnet/runtime#45585. Using a Thread.Sleep here does not sound good, even if only on Linux builds. We could make the test less strict about the timing though (allow a tolerance of 10-15 milliseconds, just in case), although that feels hackish as well.

marcin-krystianc · 2022-04-13T09:49:59Z

I could also reproduce this problem locally, and it seems clear that dotnet/runtime#45585 is real (or maybe it is just a documentation issue?!).
My conclusions:

On Linux, measuring elapsed time with Environment.TickCount64 seems to work reliably, whereas Stopwatch can sometimes measure a smaller elapsed time than the delay itself.
On Windows, there is no reliable technique to measure elapsed time, as both Environment.TickCount64 and Stopwatch can sometimes measure a smaller elapsed time than the delay itself.

Given that on Windows all existing methods of measuring elapsed time can report a value smaller than the delay, I would argue that the tests should assert against a fraction of the expected delay, e.g.:

// https://github.com/dotnet/runtime/issues/45585
Assert.True(stopwatch.ElapsedMilliseconds > oneShotLockOptions.LockRetryTime.TotalMilliseconds * 0.9);
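
For reference, a small sketch of how that relaxed check behaves on the numbers from the failing run reported earlier in this thread (4994 - 2 = 4992 ms measured for a 5000 ms delay). `TimingToleranceSketch` and `ElapsedAtLeast` are hypothetical helper names, and the 0.9 factor is only the value suggested above, not a measured bound.

```csharp
using System;

public static class TimingToleranceSketch
{
    // Hypothetical helper: true when the measured time exceeds `fraction`
    // of the expected delay (0.9 mirrors the assertion above).
    public static bool ElapsedAtLeast(long elapsedMs, TimeSpan expected, double fraction = 0.9)
    {
        return elapsedMs > expected.TotalMilliseconds * fraction;
    }
}
```

With a 5000 ms LockRetryTime, a strict check (fraction 1.0) rejects the 4992 ms measurement, while the 0.9 fraction only requires more than 4500 ms and accepts it.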

marcin-krystianc

Looks good.

I think it is a good opportunity to add some documentation around the LockTryOnce option (https://github.com/G-Research/consuldotnet/blob/master/Consul/Lock.cs#L610) to avoid future confusion:

LockTryOnce = false - The Acquire method blocks indefinitely until the lock is acquired. LockWaitTime is ignored in this case.
LockTryOnce = true - Acquire the lock within a timeout (analogous to `SemaphoreSlim.Wait(TimeSpan timeout)`).
                     Under the hood, it attempts to acquire the lock multiple times if needed (due to the HTTP Long Poll returning early), and will do so as many times as it can within the bounds set by LockWaitTime.
                     If LockWaitTime is set to 0, there will be only a single attempt to acquire the lock.
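
To illustrate the analogy only (this is not the Lock API), here is a small sketch of how `SemaphoreSlim.Wait(TimeSpan)` behaves. `TimeoutAnalogySketch` and its methods are hypothetical names; the point is that a non-zero timeout keeps trying until the budget is spent, while a zero timeout amounts to a single attempt.

```csharp
using System;
using System.Threading;

public static class TimeoutAnalogySketch
{
    // Tries to enter a semaphore that has no free slot, giving up after `timeout`.
    public static bool TryEnterExhausted(TimeSpan timeout)
    {
        using (var semaphore = new SemaphoreSlim(0, 1))
        {
            return semaphore.Wait(timeout); // false once the timeout elapses
        }
    }

    // Tries to enter a semaphore that has one free slot.
    public static bool TryEnterFree(TimeSpan timeout)
    {
        using (var semaphore = new SemaphoreSlim(1, 1))
        {
            return semaphore.Wait(timeout); // true, even with a zero timeout
        }
    }
}
```

`TryEnterExhausted(TimeSpan.Zero)` returns false immediately and `TryEnterFree(TimeSpan.Zero)` returns true, which corresponds to the single-attempt case described for LockWaitTime = 0.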

jgiannuzzi mentioned this pull request Aug 18, 2020

Unnecessary delay in LockTryOnce scenario #20

Closed

jgiannuzzi requested changes Aug 18, 2020

jgiannuzzi added this to the v1.6.9 milestone Oct 20, 2020

jgiannuzzi linked an issue Oct 20, 2020 that may be closed by this pull request

Unnecessary delay in LockTryOnce scenario #20

Closed

jgiannuzzi modified the milestones: v1.6.10, v1.6.10.2 Jun 25, 2021

mfkl force-pushed the hotfix/lock-try-once-delay branch from 85d4162 to 2d1a032 Compare April 15, 2022 05:10

mfkl requested review from jgiannuzzi and marcin-krystianc April 21, 2022 03:13

marcin-krystianc previously approved these changes Apr 22, 2022

jgiannuzzi previously approved these changes Apr 22, 2022

mfkl dismissed stale reviews from jgiannuzzi and marcin-krystianc via 0ed8f53 April 28, 2022 05:39

m.orischenko and others added 4 commits April 28, 2022 12:51

check for LockTryOnce before delay

194f0d2

retrywait condition fix + tests

e7b9110

relax test timing constraint due to dotnet/runtime#45585

7e52440

Document LockTryOnce impact on Lock.Acquire behavior

4c3e24c

mfkl force-pushed the hotfix/lock-try-once-delay branch from 0ed8f53 to 4c3e24c Compare April 28, 2022 05:51

jgiannuzzi approved these changes Apr 28, 2022

mfkl merged commit af93ebf into G-Research:master Apr 28, 2022
