Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data race on listener.connWG #260

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

stv0g
Copy link
Member

@stv0g stv0g commented Aug 27, 2023

Fixes #259

@stv0g stv0g requested a review from hasheddan August 27, 2023 13:04
@codecov
Copy link

codecov bot commented Aug 27, 2023

Codecov Report

Patch coverage: 85.18% and project coverage change: -0.11% ⚠️

Comparison is base (6890c79) 82.21% compared to head (e209826) 82.11%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #260      +/-   ##
==========================================
- Coverage   82.21%   82.11%   -0.11%     
==========================================
  Files          37       38       +1     
  Lines        3217     3243      +26     
==========================================
+ Hits         2645     2663      +18     
- Misses        456      461       +5     
- Partials      116      119       +3     
Flag Coverage Δ
go 81.86% <85.18%> (-0.20%) ⬇️
wasm 74.59% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
internal/sync/waitgroup.go 84.61% <84.61%> (ø)
udp/conn.go 83.03% <100.00%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Function waitGroup.Add cannot be called when another
goroutine calls waitGroup.Wait as is mentioned in
note: https://pkg.go.dev/sync#WaitGroup.Add

Co-authored-by: Jozef Kralik <jojo.lwin@gmail.com>
Copy link
Contributor

@hasheddan hasheddan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stv0g! Left a few minor comments below.

//
// A WaitGroup must not be copied after first use.
//
// In the terminology of the Go memory model, a call to Done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this was truncated when this package was introduced in pion/udp. Perhaps we could fix it to match https://cs.opensource.google/go/go/+/refs/tags/go1.21.0:src/sync/waitgroup.go;l=23?

// In the terminology of the Go memory model, a call to Done
// “synchronizes before” the return of any Wait call that it unblocks.

Comment on lines +17 to +19
// WaitGroups in the sync package do not allow adding or
// subtracting from the counter while another goroutine is
// waiting, while this one does.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at https://pkg.go.dev/sync#WaitGroup.Add and I don't believe this is strictly accurate (I know you are just migrating this package and didn't initially introduce it). Per https://pkg.go.dev/sync#WaitGroup.Add,

Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a negative delta, or calls with a positive delta that start when the counter is greater than zero, may happen at any time. Typically this means the calls to Add should execute before the statement creating the goroutine or other event to be waited for.

I believe the problem that was being encountered in the initial race condition was the rare case in which the wait group counter went to zero, but then Accept() was called on the listener, and its Add() raced with `Wait() in the goroutine that is started when listening beings.

I think the longer term solution is to consider some slight redesign in the UDP listener, but in the mean time, perhaps we could update this comment to mention that it guards against race conditions when adding to a wait group with counter 0 that is concurrently being waited on. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigation. I was also sceptical about this comment, but then didnt go into the details as I was not the original author.

Looking again at this, I dont believe it would be a good decission to merge this as is.
In my opinion the race should be fixed in the listener not by patching a commonly understood synchronization primitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stv0g sounds good to me -- are you planning to address in this PR or another?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @hasheddan,

Sorry, I am currently too busy to look into this in detail.

Would you have the capacity to take this over?
Or should we go with the old fix instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @stv0g! I can take a look, thanks for the ping!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add back our custom implemetation of sync.WaitGroup
2 participants