Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webrtc: potential problem with the Pion state machine #2614

Open
Tracked by #2656 ...
marten-seemann opened this issue Oct 20, 2023 · 6 comments · May be fixed by #2732
Open
Tracked by #2656 ...

webrtc: potential problem with the Pion state machine #2614

marten-seemann opened this issue Oct 20, 2023 · 6 comments · May be fixed by #2732
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@marten-seemann
Copy link
Contributor

We might be using the Pion state machine incorrectly:

=== RUN   TestTransportWebRTC_Deadline/SetWriteDeadline
      transport_test.go:401: 
          	Error Trace:	/home/runner/work/go-libp2p/go-libp2p/p2p/transport/webrtc/transport_test.go:401
          	Error:      	Received unexpected error:
          	            	failed to secure outbound connection: error reading handshake message: sending reset packet in non-established state: state=Closed
          	Test:       	TestTransportWebRTC_Deadline/SetWriteDeadline
  --- FAIL: TestTransportWebRTC_Deadline (20.14s)
      --- PASS: TestTransportWebRTC_Deadline/SetReadDeadline (0.13s)
      --- FAIL: TestTransportWebRTC_Deadline/SetWriteDeadline (20.00s)
@marten-seemann marten-seemann added the kind/bug A bug in existing code (including security flaws) label Oct 20, 2023
@sukunrt
Copy link
Member

sukunrt commented Oct 22, 2023

There's one other bug which causes failure in connection establishment:

2023-10-22T06:47:19.9957753Z === RUN   TestTransportWebRTC_DialerCanCreateStreamsMultiple
2023-10-22T06:47:19.9959034Z panic: test timed out after 10m0s
2023-10-22T06:47:19.9959911Z running tests:
2023-10-22T06:47:19.9960992Z 	TestTransportWebRTC_DialerCanCreateStreamsMultiple (9m58s)

https://github.com/libp2p/go-libp2p/actions/runs/6602211220/job/17934010538

@sukunrt sukunrt mentioned this issue Feb 21, 2024
22 tasks
@sukunrt
Copy link
Member

sukunrt commented Mar 5, 2024

Both these errors haven't happened since merging #2717

The first errors was fixed by fixing the tests. When one side closes the connection the other side sees this line when it does a stream Close.

sending reset packet in non-established state: state=Closed

@MarcoPolo
Copy link
Contributor

Both these errors haven't happened since merging #2717

The first errors was fixed by fixing the tests. When one side closes the connection the other side sees this line when it does a stream Close.

sending reset packet in non-established state: state=Closed

How was the other error fixed?

@sukunrt
Copy link
Member

sukunrt commented Mar 6, 2024

I am not convinced it is fixed completely. We just find it flaking less for some reason.

One of the reasons why that error occurs is this https://github.com/libp2p/go-libp2p/actions/runs/8125238982/job/22214622459?pr=2722

This will be fixed when pion/sctp does a new release. This was fixed by: pion/sctp#301

The other reason why it was flaking was an error in our udpmux code. That was fixed by: #2586

Again, not convinced it is fixed completely by these two. There might be some race condition in the conn establishment step that we are missing. One of the reasons for getting #2717 was that in the logs after failure there were so many go routines that this was impossible to debug. Hopefully now we can better identify the cause.

@sukunrt
Copy link
Member

sukunrt commented Mar 13, 2024

For the failures after 10m we see. I have explained the reasoning in the PR. My current theory is. The problem is with connection state change notification being reordered. Pion just runs them in a separate goroutine without waiting for it to finish. So here the listener thinks that connection isn't established, while it actually has established.

I've opened a PR in pion/webrtc that'd close this: https://github.com/pion/webrtc/pull/2702/files

@sukunrt
Copy link
Member

sukunrt commented Mar 13, 2024

The same error in updating ice connection states: https://github.com/pion/ice/pull/656/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants