
Fix data races #4239

Open · wants to merge 2 commits into master
Conversation

@birneee (Contributor) commented Jan 8, 2024

Tests that modify global variables should not run simultaneously with other tests.
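
One way Ginkgo v2 expresses this constraint is the Serial decorator. The sketch below is purely illustrative (the package name, the keyUpdateInterval variable, and the spec wiring are assumptions, not this PR's diff), and as the discussion below shows, marking specs Serial did not by itself eliminate the races:

```go
package self_test

import (
	. "github.com/onsi/ginkgo/v2"
)

// keyUpdateInterval stands in for a package-level variable that a spec mutates.
var keyUpdateInterval uint64

// The Serial decorator tells Ginkgo never to run this container's specs
// concurrently with other specs when the suite runs in parallel.
var _ = Describe("Key Update tests", Serial, func() {
	It("downloads a large file", func() {
		keyUpdateInterval = 1 // mutation of global state that must not race with other specs
		// ... run the actual transfer ...
	})
})
```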


codecov bot commented Jan 8, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (1332752) 84.13% compared to head (7003515) 84.15%.
The report is 1 commit behind head on master.

❗ Current head 7003515 differs from the pull request's most recent head f368a53. Consider uploading reports for commit f368a53 to get more accurate results.

Files       Patch %   Lines
server.go   80.00%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4239      +/-   ##
==========================================
+ Coverage   84.13%   84.15%   +0.02%     
==========================================
  Files         150      149       -1     
  Lines       15404    15395       -9     
==========================================
- Hits        12960    12955       -5     
+ Misses       1945     1942       -3     
+ Partials      499      498       -1     

☔ View full report in Codecov by Sentry.

@marten-seemann (Member)

Is this why the test on CI is flaky? We're not running tests in parallel, are we?

@birneee (Contributor, Author) commented Jan 8, 2024

Yes, I think so.
Isn't that the default behavior of go test and Ginkgo? But I'm not sure. Let me check.

@birneee (Contributor, Author) commented Jan 8, 2024

Still not sure whether tests run in parallel with ginkgo -race -v -randomize-all -trace integrationtests/self.

However, I can reproduce the data race on my machine on the master branch with seed 1704733707.

Command: go run github.com/onsi/ginkgo/v2/ginkgo -race --seed=1704733707 -v -randomize-all -trace integrationtests/self

Output:

...
Key Update tests downloads a large file
/home/benedikt/Git/quic-go/integrationtests/self/key_update_test.go:51
==================
WARNING: DATA RACE
Write at 0x000001136740 by goroutine 40429:
  github.com/quic-go/quic-go/integrationtests/self_test.glob..func15.1()
      /home/benedikt/Git/quic-go/integrationtests/self/key_update_test.go:54 +0xb3
  github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3()
      /home/benedikt/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.5/internal/node.go:463 +0x2e
  github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
      /home/benedikt/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.5/internal/suite.go:863 +0x14b

Previous read at 0x000001136740 by goroutine 40370:
  github.com/quic-go/quic-go/internal/handshake.(*updatableAEAD).shouldInitiateKeyUpdate()
      /home/benedikt/Git/quic-go/internal/handshake/updatable_aead.go:296 +0x204
  github.com/quic-go/quic-go/internal/handshake.(*updatableAEAD).KeyPhase()
      /home/benedikt/Git/quic-go/internal/handshake/updatable_aead.go:308 +0x2e
  github.com/quic-go/quic-go.(*packetPacker).MaybePackProbePacket()
      /home/benedikt/Git/quic-go/packet_packer.go:670 +0x591
  github.com/quic-go/quic-go.(*connection).sendProbePacket()
      /home/benedikt/Git/quic-go/connection.go:2020 +0x151
  github.com/quic-go/quic-go.(*connection).triggerSending()
      /home/benedikt/Git/quic-go/connection.go:1818 +0x384
  github.com/quic-go/quic-go.(*connection).triggerSending()
      /home/benedikt/Git/quic-go/connection.go:1825 +0x424
  github.com/quic-go/quic-go.(*connection).run()
      /home/benedikt/Git/quic-go/connection.go:632 +0x1396
  github.com/quic-go/quic-go.(*baseServer).handleInitialImpl.func2()
      /home/benedikt/Git/quic-go/server.go:683 +0x45

Goroutine 40429 (running) created at:
  github.com/onsi/ginkgo/v2/internal.(*Suite).runNode()
      /home/benedikt/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.5/internal/suite.go:850 +0x1464
  github.com/onsi/ginkgo/v2/internal.(*group).attemptSpec()
      /home/benedikt/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.5/internal/group.go:199 +0x1124
  github.com/onsi/ginkgo/v2/internal.(*group).run()
      /home/benedikt/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.5/internal/group.go:346 +0x1147
  github.com/onsi/ginkgo/v2/internal.(*Suite).runSpecs()
      /home/benedikt/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.5/internal/suite.go:463 +0x1211
  github.com/onsi/ginkgo/v2/internal.(*Suite).Run()
      /home/benedikt/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.5/internal/suite.go:116 +0x62e
  github.com/onsi/ginkgo/v2.RunSpecs()
      /home/benedikt/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.5/core_dsl.go:319 +0x1744
  github.com/quic-go/quic-go/integrationtests/self_test.TestSelf()
      /home/benedikt/Git/quic-go/integrationtests/self/self_suite_test.go:309 +0x4d
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1595 +0x261
  testing.(*T).Run.func1()
      /usr/lib/go/src/testing/testing.go:1648 +0x44

Goroutine 40370 (finished) created at:
  github.com/quic-go/quic-go.(*baseServer).handleInitialImpl()
      /home/benedikt/Git/quic-go/server.go:683 +0xfac
  github.com/quic-go/quic-go.(*baseServer).handlePacketImpl()
      /home/benedikt/Git/quic-go/server.go:447 +0x1527
  github.com/quic-go/quic-go.(*baseServer).run()
      /home/benedikt/Git/quic-go/server.go:290 +0x264
  github.com/quic-go/quic-go.newServer.func1()
      /home/benedikt/Git/quic-go/server.go:272 +0x33
==================
  Used 418 key phases on outgoing and 417 key phases on incoming packets.
...

@birneee (Contributor, Author) commented Jan 10, 2024

Update: Ginkgo's Serial decorator does not fix the issue.
The other goroutines are leftovers from another test.
I narrowed the problem down to handshake_test.go.
The data race occurs when I run ginkgo --focus="Handshake tests rate limiting removes closed connections from the accept queue|Key Update tests downloads a large file" -race -v -randomize-all -trace integrationtests/self.
I also found another data race with ginkgo --focus="Handshake tests (using different cipher suites using TLS_CHACHA20_POLY1305_SHA256|Certificate validation accepts the certificate)" -race -v -randomize-all -trace integrationtests/self.

@birneee (Contributor, Author) commented Jan 10, 2024

Commit 899031a only fixes the problem with ginkgo --focus="Handshake tests rate limiting removes closed connections from the accept queue|Key Update tests downloads a large file" -race -v -randomize-all -trace integrationtests/self.

@birneee (Contributor, Author) commented Jan 11, 2024

ginkgo --focus="Handshake tests (using different cipher suites using TLS_CHACHA20_POLY1305_SHA256|Certificate validation accepts the certificate)" -race -v -randomize-all -trace integrationtests/self

That data race is caused by SetCipherSuite, because multiple goroutines access global variables.
In "Handshake tests Certificate validation accepts the certificate", the server connection reads the TLS cipher suites.
Afterwards, "Handshake tests using different cipher suites using TLS_CHACHA20_POLY1305_SHA256" calls SetCipherSuite.
I think the problem is that the test goroutine never synchronizes with the server's connection.run goroutine on exit.

The problem can be solved by explicitly calling destroy or close on the server connection instances.
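
A rough sketch of that approach, assuming a spec keeps the accepted server-side connection in a hypothetical serverConn variable (this is not the PR's actual diff; CloseWithError and Context are, to my knowledge, part of quic-go's public Connection interface):

```go
package self_test

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"

	"github.com/quic-go/quic-go"
)

// serverConn is a hypothetical handle to the connection accepted by the
// test's server (e.g. obtained from the listener's Accept call).
var serverConn quic.Connection

var _ = AfterEach(func() {
	if serverConn == nil {
		return
	}
	// Close the server-side connection so its run loop shuts down.
	Expect(serverConn.CloseWithError(0, "")).To(Succeed())
	// The connection's context is cancelled when the connection is closed;
	// waiting on it here should ensure the connection.run goroutine is gone
	// before the next spec touches global state such as the cipher-suite
	// override set by SetCipherSuite.
	Eventually(serverConn.Context().Done()).Should(BeClosed())
	serverConn = nil
})
```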

@birneee changed the title from "Fix flaky key update test" to "Fix data races" on Jan 11, 2024
@birneee (Contributor, Author) commented Jan 11, 2024

This is still WIP.

@marten-seemann (Member)

Thank you for working on this, @birneee; these races have been annoying me for a long time. Really appreciate your debugging efforts here!

@birneee (Contributor, Author) commented Jan 12, 2024

I cannot reproduce the error that happened in "HTTP tests sets conn context".

@marten-seemann (Member)

@birneee What's the status of this PR?

@birneee (Contributor, Author) commented Jan 23, 2024

@marten-seemann It is still in progress.
The main problem is that some goroutines are still running or are not synchronized, which leads to data races when accessing shared memory.
I think a first step should be to check after each test whether any connection, baseServer, or Transport goroutines are still running; a rough sketch of such a check follows.
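
A rough sketch of such a per-test check (not part of this PR), under the assumption that scanning the goroutine dump for quic-go's long-running loops is enough; the connection and baseServer frame names appear in the race report above, and a check for the Transport's receive loop would be analogous:

```go
package self_test

import (
	"runtime"
	"strings"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// leftoverQUICGoroutines returns the stacks of goroutines that are still
// sitting in quic-go's long-running loops after a spec has finished.
func leftoverQUICGoroutines() []string {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // true: dump all goroutines
	var leaked []string
	// Goroutines in the dump are separated by blank lines.
	for _, stack := range strings.Split(string(buf[:n]), "\n\n") {
		// (*connection).run and (*baseServer).run are the loops visible in
		// the race report above; a similar substring check could be added
		// for the Transport's receive loop.
		if strings.Contains(stack, "quic-go.(*connection).run") ||
			strings.Contains(stack, "quic-go.(*baseServer).run") {
			leaked = append(leaked, stack)
		}
	}
	return leaked
}

var _ = AfterEach(func() {
	// Give goroutines that are already shutting down a moment to exit,
	// then fail the spec if any quic-go loop is still running.
	Eventually(leftoverQUICGoroutines).Should(BeEmpty())
})
```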
