fix(DeadlockTest): Handle draining of closed channel, speed up test. (#1957)

Fixes DGRAPHCORE-153

## Problem
We have had frequent problems with the TestPublisherDeadlock test: every
once in a while, the test would time out. In at least one case, the
problem could be attributed to a channel being "drained" with incorrect
handling of the case where the channel is already closed. While this was
not the primary root cause of the issue, it is nevertheless something we
should handle correctly.
Additionally, these tests can take quite a while, with most of that time
spent simply waiting for goroutines to finish their time.Sleep(...)
invocations. The use of time.Sleep to handle timing issues is typically
a sign that the messaging has not been fully thought through.

## Solution
For the "drain" process not handling closed channels, the fix is simple:
we add a check for the channel already being closed during the drain.
For the overuse of time.Sleep and the resulting long-running
invocations, I replaced the time.Sleep calls with WaitGroup instances. I
cannot prove that the full issue has been resolved, but I have been
unable to reproduce the original failures after these changes. Invoking
`go test -race -v --count=1000 -run TestPublisherDeadlock` now completes
in roughly 80 seconds for the 1000 runs, with each test taking roughly
0.08 seconds, as opposed to roughly 20 seconds per test (and multiple
hours to run 1000 tests) before. Moreover, in the 10 or so trials I have
made to reproduce the original problem after this change, I get 100%
test success.
billprovince committed May 30, 2023
1 parent 44b5978 commit f9b9e4d
Showing 2 changed files with 19 additions and 5 deletions.
`db.go` (6 changes: 5 additions & 1 deletion)

```diff
@@ -1879,7 +1879,11 @@ func (db *DB) Subscribe(ctx context.Context, cb func(kv *KVList) error, matches
 	drain := func() {
 		for {
 			select {
-			case <-s.sendCh:
+			case _, ok := <-s.sendCh:
+				if !ok {
+					// Channel is closed.
+					return
+				}
 			default:
 				return
 			}
```
`publisher_test.go` (18 changes: 14 additions & 4 deletions)

```diff
@@ -18,10 +18,10 @@ package badger
 import (
 	"context"
 	"fmt"
+	"runtime"
 	"sync"
 	"sync/atomic"
 	"testing"
-	"time"
 
 	"github.com/pkg/errors"
 	"github.com/stretchr/testify/require"
@@ -40,14 +40,18 @@ func TestPublisherDeadlock(t *testing.T) {
 		var firstUpdate sync.WaitGroup
 		firstUpdate.Add(1)
 
+		var allUpdatesDone sync.WaitGroup
+		allUpdatesDone.Add(1)
 		var subDone sync.WaitGroup
 		subDone.Add(1)
 		go func() {
 			subWg.Done()
 			match := pb.Match{Prefix: []byte("ke"), IgnoreBytes: ""}
 			err := db.Subscribe(context.Background(), func(kvs *pb.KVList) error {
 				firstUpdate.Done()
-				time.Sleep(time.Second * 20)
+				// Before exiting the Subscribe callback, wait until each of
+				// the 1110 updates (defined below) has completed.
+				allUpdatesDone.Wait()
 				return errors.New("error returned")
 			}, []pb.Match{match})
 			require.Error(t, err, errors.New("error returned"))
@@ -65,7 +69,6 @@ func TestPublisherDeadlock(t *testing.T) {
 		firstUpdate.Wait()
 		var req atomic.Int64
 		for i := 1; i < 1110; i++ {
-			time.Sleep(time.Millisecond * 10)
 			go func(i int) {
 				err := db.Update(func(txn *Txn) error {
 					e := NewEntry([]byte(fmt.Sprintf("key%d", i)), []byte(fmt.Sprintf("value%d", i)))
@@ -79,8 +82,15 @@ func TestPublisherDeadlock(t *testing.T) {
 			if req.Load() == 1109 {
 				break
 			}
-			time.Sleep(time.Second)
+			// FYI: This does the same as "Thread.yield()" in other languages.
+			// In other words, it tells the goroutine scheduler to switch to
+			// another goroutine. This is strongly preferred over
+			// time.Sleep(...).
+			runtime.Gosched()
 		}
+		// Free up the subscriber, which is waiting for the updates to finish.
+		allUpdatesDone.Done()
+		// Exit only once the subscription goroutine has exited.
+		subDone.Wait()
 	})
 }
```
