What did you do?
A little bit of context: I'm working on an SFU. Currently, we have the following rough architecture:
We subscribe to all callbacks from the peer connection.
The callbacks are 'transformed' into a stream of events, i.e. each one is simply sent over a channel of type chan PeerConnectionEvent.
This means that each time we receive a callback, we send a message over the aforementioned channel.
A handler receives these messages and updates the conference state accordingly, which includes modifying the peer connection.
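The architecture above can be sketched roughly as follows. This is only a minimal illustration using the standard library: PeerConnectionEvent is the type named in the report, but its fields, as well as the forward and runHandler helpers, are hypothetical stand-ins for the real callbacks and handler.

```go
package main

import "fmt"

// PeerConnectionEvent is the event type named in the report.
// Its fields are assumed here for illustration.
type PeerConnectionEvent struct {
	Kind    string
	Payload string
}

// forward stands in for a peer connection callback: it does nothing but
// send an event. On an unbuffered channel this send blocks until the
// handler receives the event.
func forward(events chan<- PeerConnectionEvent, kind, payload string) {
	events <- PeerConnectionEvent{Kind: kind, Payload: payload}
}

// runHandler drains the channel until it is closed and records the event
// kinds. It stands in for the code that updates the conference state.
func runHandler(events <-chan PeerConnectionEvent) []string {
	var seen []string
	for ev := range events {
		seen = append(seen, ev.Kind)
	}
	return seen
}

func main() {
	events := make(chan PeerConnectionEvent) // unbuffered
	done := make(chan []string)
	go func() { done <- runHandler(events) }()

	forward(events, "ICECandidate", "candidate-1")
	forward(events, "ConnectionStateChange", "connected")
	close(events)

	fmt.Println(<-done)
}
```

Because every callback shares one channel, a handler that is busy (or blocked) delays every callback that tries to send to it.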
This approach worked well when we used buffered channels with quite a large buffer, but occasionally the conference appeared to be frozen. I replaced the buffered channel with an unbuffered channel (i.e. the sender blocks until the receiver reads the message). Right after that, I observed that the conference quite often freezes, waiting indefinitely inside the peerConnection.Close() method.
After some investigation, I noticed that this primarily happens when we receive an OnICECandidate callback shortly before calling Close(). Close() then hangs forever and never returns.
It seems like there is a race or a deadlock in Pion: it appears to acquire some resource when invoking the OnICECandidate callback and expects to release it once the callback finishes. The workaround we found is to never block or wait for any I/O inside the OnICECandidate callback, but we're a bit concerned that there might be more places where such a thing can occur. Sometimes it seems to hang for no apparent reason.
What did you expect?
I expected that Close() would not hang forever. Ideally, the behavior of the callbacks should be unified and predictable, unless there is a reason to have different requirements for the different callbacks that we pass to Pion (in which case it would probably be a good idea to document them). I.e. if the callbacks are not allowed to block, or if it's not allowed to access the peer connection from the callback closure, that probably needs to be documented. The hang does not happen every time, which hints that there must be a race somewhere inside Pion that leads to a deadlock.
What happened?
peerConnection.Close() hung forever.
I'm not sure whether this is a bug. The callbacks are called synchronously; if they need to block, you should make them asynchronous yourself. This is the usual idiom in Go, where making a synchronous function asynchronous is easy (just add the go keyword), but making an asynchronous invocation synchronous is difficult (it requires either a channel or a sync.WaitGroup).
Close() hangs at this line in pion/ice:
https://github.com/pion/ice/blob/4dea7246b63bd6230dcb40b14c77679a10844e62/agent.go#L927