
Issues with Current API

Concurrency/Race issues -- Go provides an excellent primitive for sharing data between goroutines (channels) and we aren't taking advantage of it at all. There is a big mental burden (and chance of error) in the manual locking/unlocking done in the networkManager.

Maintainability -- people find the current design confusing and would prefer clearer boundaries between components. The new design would also resemble the ECMAScript WebRTC API, although the two have zero relation: this is internal to pion-WebRTC and has no effect on a user.

Callback heavy -- Every time we want to add a new feature we add a new callback, and it is making things cumbersome.

Current API

NetworkManager

Manages all network traffic. To send a packet call a method on the networkManager.

Responding to an inbound packet is done via callbacks. When you create a networkManager there are multiple distinct callbacks for events (RTP, SCTP, ICE, etc.).

ICE

Handles ICE status. NetworkManager pushes packets on it, and uses callbacks when it wants to send traffic.

DTLS

Handles encryption for SCTP packets, and DTLS handshake. NetworkManager pushes packets on it, and uses callbacks when it wants to send traffic.

SCTP

Protocol used by DataChannels. NetworkManager pushes packets on it, and uses callbacks when it wants to send traffic.

SRTP

Used for encrypting/decrypting audio/video data. This is used by the NetworkManager directly; it isn't async, so it provides simple blocking Encrypt/Decrypt APIs.
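For reference, a rough sketch of this blocking shape (illustrative only, not the exact current signatures):

// Illustrative: the SRTP layer exposes simple blocking calls instead of
// callbacks, in contrast to the other components above.
type SRTPContext interface {
	Encrypt(packet []byte) ([]byte, error) // blocks, returns the encrypted packet
	Decrypt(packet []byte) ([]byte, error) // blocks, returns the decrypted packet
}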

Proposed API: Backkem

This proposal is inspired by work done in libp2p/go-libp2p-transport, perlin-network/noise and xtaci/kcp-go. In addition, it mirrors some basic interfaces in the io and net package. Note: Some details may be missing but the overall approach seems feasible.

// Transport represents a layer-able transport
type Transport interface {
  Dial(Conn) (Conn, error)
  Listen(Conn) (net.Listener, error)
}

// Conn is a minimal clone of the net.Conn interface to reduce implementation overhead. Could be completed down the line.
type Conn interface {
  // Read reads data from the connection.
  Read(b []byte) (n int, err error)

  // Write writes data to the connection.
  Write(b []byte) (n int, err error)

  // LocalAddr returns the local network address.
  LocalAddr() net.Addr

  // RemoteAddr returns the remote network address.
  RemoteAddr() net.Addr

  // Close shuts down the transport and the underlying transports.
  // Any blocked Read or Write operations will be unblocked and return errors.
  Close() error
}

The net.Listener interface for reference:

type Listener interface {
    // Accept waits for and returns the next connection to the listener.
    Accept() (Conn, error)

    // Close closes the listener.
    // Any blocked Accept operations will be unblocked and return errors.
    Close() error

    // Addr returns the listener's network address.
    Addr() Addr
}

With the following instances:

  • internal/Transport.ICE
    • Dial & Listen return once the first valid candidate pair binding succeeds. At that point upper layers may start talking.
    • Read/Write calls Read/Write on the current best valid candidate pair.
  • internal/Transport.DTLS
    • Dial & Listen return once the handshake succeeds. At that point upper layers may start talking.
    • Read/Write encrypts the data and calls Read/Write on the underlying conn.
  • internal/Transport.SRTP
    • Read/Write calls Read/Write on the underlying conn.
  • internal/Transport.SCTP
    • Dial & Listen return once the SCTP Init succeeds. At that point upper layers may start talking.
    • Read/Write do the needed queuing and call Read/Write on the underlying conn.
  • internal/Transport.DataChannel: In charge of data channel connections (Seems best to split Binary/Text in the WebRTC layer)
    • Dial & Listen return once the data channel is open. At that point upper layers may start talking.
    • Read/Write wraps the data and calls Read/Write on the underlying conn.
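To make the layering concrete, here is a minimal self-contained sketch of how one such instance could satisfy the Transport interface; all names are assumptions and the handshake is stubbed out:

// dtlsConn wraps a lower Conn; a real implementation would encrypt and
// decrypt records in Write/Read rather than passing bytes through.
type dtlsConn struct {
	lower Conn
}

func (c *dtlsConn) Read(b []byte) (n int, err error)  { return c.lower.Read(b) }
func (c *dtlsConn) Write(b []byte) (n int, err error) { return c.lower.Write(b) }
func (c *dtlsConn) LocalAddr() net.Addr               { return c.lower.LocalAddr() }
func (c *dtlsConn) RemoteAddr() net.Addr              { return c.lower.RemoteAddr() }
func (c *dtlsConn) Close() error                      { return c.lower.Close() }

// handshake is stubbed; a real DTLS layer runs the full exchange here.
func (c *dtlsConn) handshake() error { return nil }

type DTLSTransport struct{ /* certificates, etc. */ }

// Dial only returns once the handshake succeeds, so upper layers may
// start talking immediately.
func (t *DTLSTransport) Dial(lower Conn) (Conn, error) {
	c := &dtlsConn{lower: lower}
	if err := c.handshake(); err != nil {
		return nil, err
	}
	return c, nil
}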

Each instance has a specialized constructor for passing options. In addition, they expose their own specific methods for introspection by the upper WebRTC layer. Some of the protocol RFCs actually contain recommendations for those. Any values used by these methods will be translated into their RTP* alternatives by the WebRTC layer. This entails some code duplication, but the internal state will be safe.

It seems advisable to create a TransportBuilder that can dynamically build the correct transport based on the SDP info. This is also where protocol re-starts would be handled. And we could do things like avoid spinning up SCTP when no data channels are used.
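A rough sketch of what such a builder could look like; the type names below are hypothetical, and details like SDP parsing, muxing SRTP/SCTP over the same DTLS conn, and protocol re-starts are elided:

// Hypothetical sketch of a TransportBuilder assembling the stack from
// the negotiated session info.
type TransportBuilder struct {
	ice  *ICETransport
	dtls *DTLSTransport
	sctp *SCTPTransport
}

// Dial builds only the layers the session actually needs.
func (b *TransportBuilder) Dial(desc SessionDescription) (Conn, error) {
	iceConn, err := b.ice.Dial(nil)
	if err != nil {
		return nil, err
	}
	dtlsConn, err := b.dtls.Dial(iceConn)
	if err != nil {
		return nil, err
	}
	// Avoid spinning up SCTP when no data channels are used.
	if !desc.HasDataChannels() { // HasDataChannels is an assumed helper
		return dtlsConn, nil
	}
	return b.sctp.Dial(dtlsConn)
}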

Example usage

The following shows an example of API usage. The example hides some edge cases and error handling for clarity.

// # Initialization

iceTransport := NewICETransport(specificOptions...)
dtlsTransport := NewDTLSTransport(specificOptions...)

srtpTransport := NewSRTPTransport(specificOptions...)

sctpTransport := NewSCTPTransport(specificOptions...)
dcTransport := NewDCTransport(specificOptions...)

// # Dial

// Note: This example will probably need some muxers for (de-)multiplexing the different protocols, E.g.: NewABMux(a Conn, b Conn) Conn
//       I see this as a positive since now this logic is kinda hidden in port-send and port-receive

iceConn, _ := iceTransport.Dial(nil)
dtlsConn, _ := dtlsTransport.Dial(iceConn)

srtpConn, _ := srtpTransport.Dial(dtlsConn)

sctpConn, _ := sctpTransport.Dial(dtlsConn)
dcConn, _ := dcTransport.Dial(sctpConn)

// # Listen
iceListener, _ := iceTransport.Listen(nil)
iceConn, _ := iceListener.Accept()

dtlsListener, _ := dtlsTransport.Listen(iceConn)
dtlsConn, _ := dtlsListener.Accept()

// etc...

// # Sending data

// Option 1: Marshal and call write on the conn:
p := &rtp.Packet{ /* header + payload */ }
b, _ := p.Marshal()
_, _ = srtpConn.Write(b)


// Option 2: Pass the io.Writer:
p := &rtp.Packet{ /* header + payload */ }
p.WriteTo(srtpConn)

// # Receiving data

// Option 1: Read on the conn and unmarshal:
b := make([]byte, 1024)
n, _ := srtpConn.Read(b)
p := &rtp.Packet{}
_ = p.Unmarshal(b[:n])

// Option 2: Pass the io.Reader:
p := &rtp.Packet{}
p.ReadFrom(srtpConn)

Notes

  • The API is familiar to anyone who has worked with an io.ReadWriteCloser or the net package.
  • The API is blocking, reducing the need for 'OnConnected' or 'OnClose' events. Instead, the caller just waits for Dial or Accept to finish.
  • For SRTP we could provide an io.ReadWriter all the way into the Track object. Intensive DataChannel apps may want a similar option; this can be investigated in the future.
  • Hardcore separation of concerns (hopefully not too much).
  • If we want to add buffering we can create an instance of Conn around a bytes.Buffer and inject it into the transport stack.
  • There is an argument for making STUN/TURN a transport as well. I kept it out of the above discussion for clarity, but:
    • The STUN transport can keep its binding alive (becomes even more important in the TCP case).
    • The TURN transport can keep its association/channel alive.
    • Making them separate transports could keep this additional complexity out of the ICE package.
  • We could take a shortcut and put Accept directly on Transport but this seems like a minor sacrifice for sticking to the standard interfaces.
  • I'm not suggesting we do this now but if/when we want to expose protocols in their own package at some point in the future we only need to add basic Dial(address string) and Accept(address string) methods. (I do expect there will be a high demand for DTLS.)

Feedback

Sean Backkem's API is 100% the way we should go. Reading through it now, it all seems so obvious! Thanks for taking the time to explain/formalize this!

Async Events between Transports

Sean The tight coupling of Transports currently allows them to talk to each other. One scenario is that DTLS must send a message to SRTP when a certificate is ready; this can't be done via Dial. SRTP doesn't use DTLS as a transport, it just gets a certificate from it.

How would you implement this? Can we have a generic API for 'events'?

Backkem Ah, I didn't know about this before. Basic options:

  • Add an event interface. Advantage: you get it everywhere. Disadvantage: likely lots of generic string and interface{} types.
  • Specific implementations per transport. Advantage: static types, self documenting. Disadvantage: custom logic needed to wire the transports together. (Note that a generic event interface would still need custom logic in the upper transport to interpret the generic events being thrown.)

==> One thought I had: The TransportBuilder wraps DTLS.Dial and DTLS.Accept with a tiny wrapper function. This function takes the concrete DTLS.Conn struct returned by these dial/accept methods. This concrete DTLS.Conn exposes the certPair. The wrapper passes the certPair to SRTP and returns the Conn to the upper layers. Example for DTLS.Dial:

func WrapDTLSDial(dial func(Transport.Conn) (*DTLS.Conn, error), lower Transport.Conn) (*DTLS.Conn, error) {
  conn, err := dial(lower)
  if err != nil {
    return nil, err
  }

  srtpConn.SetCertPair(conn.certPair) // Should be possible to find a way to make srtpConn available here.

  return conn, nil
}

The TransportBuilder would be in charge of stitching all this together. This allows static typing and moves the stitching magic from the transport implementations to the TransportBuilder (which is what it's for).
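For illustration, the stitching inside the TransportBuilder could then look roughly like this (hypothetical wiring, assuming WrapDTLSDial as sketched above):

// Hypothetical: the builder dials DTLS over the ICE Conn via the
// wrapper, which forwards the certPair to SRTP as a side effect before
// returning the Conn to upper layers.
dtlsConn, err := WrapDTLSDial(dtlsTransport.Dial, iceConn)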

Parsing multiple times

Sean I don't think this will be a large performance concern, but is it an issue that we need to parse multiple times? This might be doing lots of pointless copies. If Transports can only send/recv []byte, will we be Marshaling/Unmarshaling multiple times? How would you implement DataChannel sending an SCTP message?

  1. The DataChannel transport constructs an SCTP struct and populates the values.
  2. The DataChannel transport marshals it to bytes to send via SCTP.
  3. SCTP unmarshals those bytes again because it needs data about the SCTP packet we want to send.

Backkem Indeed, the transport boundaries seem less clear cut than I originally thought.

==> One thought I had is that maybe trying to force the Transport interface on every layer isn't needed. The TransportBuilder (or TransportManager) can be smart enough to glue different things together. In our case it seems more important to make the individual transports do their own thing rather than force them to conform to the Transport interface.

==> (Incomplete) idea for SCTP: The SCTP.Dial (or some other function) could return a struct that implements a 'Streamer' interface:

type Streamer interface {
  Stream(streamIdentifier uint16, payloadType PayloadProtocolIdentifier) Conn
}

Instead of consuming a Conn directly, the DataChannel transport can get Conns from the Streamer. Now you can write raw data to the Conn returned by Stream(), which can automatically (de-)chunk the data. -> The 'sad' part is that a data channel would have to open 5 'streams' in the worst case: DCEP, String, StringEmpty, Binary, BinaryEmpty (Who comes up with these things...). I don't hate the high number of streams but don't love it either; maybe there is an iteration of this idea that makes it cleaner...
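For illustration, the DataChannel transport could then obtain its Conns like this (the constant and variable names are assumptions):

// Hypothetical: one Conn per (stream, payload type) pair, all muxed
// over the same SCTP association by the Streamer.
dcepConn := streamer.Stream(streamID, PayloadTypeWebRTCDCEP)     // control
binaryConn := streamer.Stream(streamID, PayloadTypeWebRTCBinary) // data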

Implementing interfaces that we can't satisfy

Sean I think this is only an issue for SRTP/DataChannel/ICE, but what will LocalAddr/RemoteAddr return (just nil?). I actually don't even think this is an issue, just asserting! But it is really cool that people can use ICE and hook it up to anything that wants to write generic data over the internet.

Backkem Re-thinking it, we don't really care about LocalAddr and RemoteAddr within the context of Pion. Maybe we can just drop them and make the Conn interface look like a regular io.ReadWriteCloser. If we ever want to separate a transport into an individual package we can add these missing interfaces when needed. As long as the common ones are the same we should be good.

RTP Refactor

Quick write-up of my current thinking around the RTP refactor.

  • The ideas basically mirror the work done for package sctp & datachannel.
  • This is a high level sketch. It may need refinement to put into practice.
  • Make the SRTP Context generic
    • it already just duplicates most variables
    • Here it's annoying that EncryptRTP and EncryptRTCP have different signatures.
  • Create a Conn for SRTP
    • Has an incoming and outgoing context.
    • Methods:
Client(net.Conn, srtp.Config) *Session // Config passes KeyingMaterial & RTP profile
Server(net.Conn, srtp.Config) *Session
session.OpenStream(SSRC uint32) (*Stream, error)  // See sctp.OpenStream
session.AcceptStream() (*Stream, error) // Here there may be need for a little hack around PayloadType; See sctp.AcceptStream
session.Close() error
stream.Read([]byte) (int, error) // Returns the payload in the []byte, drops the header; Satisfies io.Reader for easy testing (see sctp.Read); decrypts using the correct context
stream.ReadSRTP([]byte) (int, rtp.Header, error) // Returns the payload in the []byte, the header in a return argument (see sctp.ReadSCTP); decrypts using the correct context
stream.Write([]byte) (int, error)  // Uses a default RTP header (where possible); Satisfies io.Writer for easy testing (see sctp.Write); encrypts using the correct context
stream.WriteSRTP([]byte, rtp.Header) (int, error)  // Allows writing with a specific header (see sctp.WriteSCTP); encrypts using the correct context
stream.Close() error
  • Change the packetizers to a Conn to ingest samples.
    • Methods:
New(*srtp.Conn) *Conn // Constructor, direction doesn't matter I think
conn.Read([]byte) (int, error) // Assumes 1 sample; Satisfies io.Reader for easy testing  (see datachannel.Read)
conn.ReadSamples([]byte) (int, samples uint32, error) // Returns the payload in the []byte, the number of samples in a return argument (see datachannel.ReadDataChannel)
conn.Write([]byte) (int, error)  // Write 1 sample; Satisfies io.Writer for easy testing  (see datachannel.Write)
conn.WriteSamples([]byte, samples uint32) (int, error)  // Allows writing a specific number of samples (see datachannel.WriteDataChannel)
  • Create a Conn for SRTCP
    • Has an incoming and outgoing context.
    • Methods:
Client(net.Conn, srtp.Config) *Session // Config passes KeyingMaterial & RTP profile
Server(net.Conn, srtp.Config) *Session
session.OpenStream(SSRC uint32) (*Stream, error) // See sctp.OpenStream
session.AcceptStream() (*Stream, error) // See sctp.AcceptStream
session.Close() error
stream.Read([]byte) (int, error) // Returns the payload in the []byte, drops the header; Satisfies io.Reader for easy testing (see sctp.Read); decrypts using the correct context
stream.ReadSRTCP([]byte) (int, rtcp.Header, error) // Returns the payload in the []byte, the header in a return argument (see sctp.ReadSCTP); decrypts using the correct context
stream.Write([]byte) (int, error)  // Uses some default RTCP header; Satisfies io.Writer for easy testing (see sctp.Write); encrypts using the correct context
stream.WriteSRTCP([]byte, rtcp.Header) (int, error) // Allows writing with a specific header (see sctp.WriteSCTP); encrypts using the correct context
stream.Close() error
  • Mux everything using the same mux that is already in the NetworkManager:
mx := mux.NewMux(iceConn, receiveMTU)

dtlsEndpoint := mx.NewEndpoint(mux.MatchDTLS) // Mux DTLS

dtlsConn := setupDTLS(dtlsEndpoint) // Runs dtls.Client/Server to connect DTLS

srtpEndpoint := mx.NewEndpoint(mux.MatchSRTP) // Note: Change matching to only SRTP, not SRTCP
srtpConn := setupSRTP(srtpEndpoint, config) // Config is based on keying material from dtlsConn

srtcpEndpoint := mx.NewEndpoint(mux.MatchSRTCP) // Note: Change matching to only SRTCP, not SRTP
srtcpConn := setupSRTCP(srtcpEndpoint, config) // Config is based on keying material from dtlsConn

Optimizations

Current problems:

  • Double un-marshaling
  • Multiple buffer allocations

Double un-marshaling

The current implementation un-marshals the rt(c)p packet multiple times:

  • In the session readloop (it currently doesn't, but it would have to in order to dispatch packets to the correct Stream).
  • In Encrypt/Decrypt.
  • In the upper layer consuming the packet.

In general this can be avoided by adding additional methods that pass the parsed packet header to the next layer.
ReadRTP

As mentioned above, in order to pass the rtp header to the upper layer we can add a ReadRTP method, similar to the ReadSCTP method in the sctp package:

(s *Stream) ReadRTP(payload []byte) (int, *rtp.Header, error)

The plain Read method should still exist to implement the io.ReadWriter interface. It omits the header and only returns the payload. The same goes for the Write method:

(s *Stream) WriteRTP(header *rtp.Header, payload []byte) (int, error)

The plain Write method should still exist to implement the io.ReadWriter interface. It creates its own header with sane defaults: the SSRC is known by the Stream object, assume 1 sample, ... (see sctp & datachannel for an example).
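A minimal sketch of how the plain methods could delegate to their RTP-aware counterparts; the sequence number/timestamp helpers are assumed fields, not existing API:

// Read satisfies io.Reader by dropping the header.
func (s *Stream) Read(payload []byte) (int, error) {
	n, _, err := s.ReadRTP(payload)
	return n, err
}

// Write satisfies io.Writer by building a default header; the SSRC is
// known by the Stream, and sequence number/timestamp are tracked
// internally (assumed helpers).
func (s *Stream) Write(payload []byte) (int, error) {
	return s.WriteRTP(&rtp.Header{
		SSRC:           s.ssrc,
		SequenceNumber: s.nextSequenceNumber(),
		Timestamp:      s.nextTimestamp(),
	}, payload)
}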

Encrypt/Decrypt

The header can also be passed to the Encrypt/Decrypt methods to avoid double parsing:

(c *Context) DecryptRTP(header *rtp.Header, encrypted []byte) ([]byte, error)

Multiple buffer allocations

The first thing we can do to avoid allocating too many buffers is change the signature of the Encrypt/Decrypt methods so the target buffer is passed in by the caller, from:

Decrypt(encrypted []byte) ([]byte, error)

to:

Decrypt(encrypted, decrypted []byte) error

This way the caller can provide the target buffer, potentially reusing it to reduce buffer allocations. Next, instead of passing incoming packets from the session readloop to the Read() method, we can change this direction around. Take the buffer passed to the Read() method and pass it over a channel to the session readloop. This puts decisions over buffer allocation/copying in the readloop. (See example in internal/mux or dtls/internal/udp).
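A sketch of the Stream.Read half of this hand-off; the struct and channel names are assumptions, and the matching readloop appears under 'Putting it all together' below:

// readResult carries the parsed header and decrypted length back from
// the session readloop (hypothetical struct).
type readResult struct {
	h *rtp.Header
	n int
}

// Read lends its buffer to the session readloop and blocks until the
// loop has decrypted a packet into it.
func (s *Stream) Read(b []byte) (int, error) {
	s.readCh <- b
	res := <-s.wroteCh
	return res.n, nil
}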

Decrypt into read buffer

One more thing we can do is decrypt straight into the read buffer. This however means that we have to either:

  • Return the entire packet in the read buffer.
  • Only decrypt the payload into the read buffer.

-> Not sure what is best here. The second seems to be a cleaner abstraction if it's feasible for all use-cases. It also matches what we do in the other transports.

Putting it all together

The DecryptRTP would then look like this:

(c *Context) DecryptRTP(header *rtp.Header, encrypted, decrypted []byte) (int, error)

The session readloop would look approximately as follows:

b := make([]byte, 8192)
for {
	// Read
	n, _ := s.nextConn.Read(b)

	// Parse header (only the header bytes are consumed)
	h := &rtp.Header{}
	_ = h.Unmarshal(b[:n])

	// Lookup the correct stream (omitted: create stream if it doesn't exist)
	stream := s.streams[h.SSRC]

	// Receive a buffer from the Read method
	readBuf := <-stream.readCh

	// Decrypt straight into the read buffer (see remarks under 'Decrypt into read buffer')
	decLen, _ := s.remoteContext.DecryptRTP(h, b[:n], readBuf)

	// Pass the header and decrypted length back to the Read method
	stream.wroteCh <- readResult{h, decLen}
}

Raw RTP

Goals:

  • Allow the srtp.Stream to read/write only the payload data (the header is still available in ReadRTP/WriteRTP). This allows the cleanest abstraction, in line with the other transports.
  • Allow passing raw rtp packets to the srtp.Session
  • The raw rtp packets can be for multiple SSRCs

The idea is to provide a separate API for raw RTP so the srtp.Stream can just deal with payload data as described in the previous sections. Suggested API: first, for the srtp.Session we'll add the ability to write raw packets:

// Write writes full RTP packets to the SRTP session, regardless of their SSRC.
(s *Session) Write(b []byte) (int, error)
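For example, forwarding pre-built packets from elsewhere could then look like this (the UDP source is hypothetical):

// Hypothetical: relay raw RTP (possibly spanning multiple SSRCs) from
// a UDP socket straight into the SRTP session.
buf := make([]byte, 1500)
for {
	n, err := udpConn.Read(buf)
	if err != nil {
		break
	}
	if _, err := session.Write(buf[:n]); err != nil {
		break
	}
}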

Next we need a way to expose sending raw RTP through the public WebRTC & ORTC APIs. Let's first take a look at what the normal non-raw flow looks like with ORTC:

// Create a track
track := webrtc.NewTrack(payloadType uint8, ssrc uint32, id, label string)
// Create the sender
audioSender := webrtc.NewRtpSender(track, dtls)
// Start the sender
audioSender.Send()
// Get capabilities to exchange
sendAudioCaps := audioSender.GetCapabilities("audio")

Next, some design considerations:

  • For sending raw tracks it seems to be convenient to have the same API: A Track
  • The DtlsTransport will likely hold the srtp.Session. Therefore it makes sense to add the track straight to the DTLS transport. That also avoids another RtpSender-like object.
  • We have to take into account that the tracks (SSRC mapping) need to be negotiated. The SDP (m= lines) for this will be built based on the RtpSender objects, so these have to be created even if we won't use them for sending.

API suggestion:

// ORTC: Get a track for sending raw packets
func (t *DtlsTransport) NewRawTrack() *Track

// ORTC: Leaving signaling up to the user seems fine for ORTC.
// We could provide some helpers to create the correct RtpCapabilities objects.

// WebRTC: Get a track for sending raw packets
func (pc *PeerConnection) NewRawTrack() *Track

// WebRTC: Allow registering tracks that are only used for signaling, not sending
// Note: This allows you to create the required m= lines.
// Under the hood this will probably create RtpSender object but never start them.
func (pc *PeerConnection) NewDummyTrack(payloadType uint8, ssrc uint32, id, label string)

Track API

Track is a Pion specific structure that somewhat mirrors MediaStreamTrack. It serves as the entry/exit point for media. RtpSender and RtpReceiver are not written against directly because tracks can be replaced on them at any time (e.g. switching front/rear camera on a mobile phone).

Track by default has no internal cache and just reads/writes to the nextConn (or errors if it doesn't have one)

I propose dropping samples from the public API. We can have people generate packets from samples; this reduces the complexity of the API.

For handling loss/jitter we have two choices:

  • People can wrap a Track with a github.com/pions/rtpengine and decide how much latency they want (and pull stats); see the sketch after this list.
  • We put an rtpengine inside the RtpSender and RtpReceiver and give people a hook to mess with it. This might be the best approach: things will just 'work' by default for most users.
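A hypothetical sketch of the first option; rtpengine.Wrap and its options are invented names for illustration, not existing API:

// Hypothetical: wrap a Track with an rtpengine to pick the latency
// budget explicitly and pull stats from the engine.
engine := rtpengine.Wrap(track, rtpengine.WithMaxLatency(200*time.Millisecond))

pkt := &rtp.Packet{}
for {
	if err := engine.ReadRTP(pkt); err != nil {
		break
	}
	// packets arrive de-jittered/reordered according to the budget
}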

Having Read/Write methods might be confusing:

  • do we expect users to demux RTP and RTCP?
  • Do we assert what the user is sending, what if they send garbage. What is stopping them from just sending junk SCTP or hand crafted ICE?
  • Should Read/Write be RTP only?

track.Read([]byte) (int, error)   // How do we handle RTP and RTCP? Do we just allow people to write anything?
track.Write([]byte) (int, error)  // How do we handle RTP and RTCP? Do we just allow people to write anything?

// TODO
track.WriteRTP(*rtp.Packet) error
track.WriteRTCP(rtcp.Packet) error

track.ReadRTP(*rtp.Packet) error
track.ReadRTCP(rtcp.Packet) error
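For illustration, consuming media under this proposed shape could look like the following sketch:

// Sketch against the proposed methods above.
pkt := &rtp.Packet{}
for {
	if err := track.ReadRTP(pkt); err != nil {
		break
	}
	// handle pkt.Payload
}

// Per the feedback below, RTCP must be consumed too (or drained) so it
// never blocks the stack.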
Track feedback @backkem
  • I like putting the additional functionality in rtpengine. We should determine if we want to build this as a single black box vs a collection of pipeline building blocks.
  • On internal vs external rtpengine: It seems we can allow for both. If the internal & external APIs are the same, users can supply their own engine or wrap ours with their own. This would still allow us to set a good default for most uses.
  • I'm a bit conflicted about adding the context to the API. I wonder how important it is to be able to cancel writing a single packet (without closing)? It seems we would also have to add this to the rest of the underlying stack or there wouldn't be much to cancel. In the current design closing a track would mean closing the underlying srtp stream, but not the rest of the underlying stack. This seems sufficient for re-negotiating tracks.
  • My suggestion for Read/Write from above: Make assumptions. For example in pions/datachannel this means Read & Write assume you are working with Binary data, not Strings. In this case RTP only doesn't seem like a bad assumption.
  • I still prefer ReadRTP(payload []byte) (int, *rtp.Header, error) and WriteRTP(header *rtp.Header, payload []byte) (int, error) over passing *rtp.Packet. The performance problem Luke mentioned can also be solved by extending this API down to srtp.Session (by implementing a SessionRTP and SessionRTCP which have the header+[]byte signature).
  • We should make sure to document that the implementer has to consume both RTP & RTCP or risk blocking the stack. We can provide an option to drain RTCP if the user doesn't care about it.
Track feedback @kixelated

I think these should be the requirements:

  1. Write maps to the correct socket based on the track SSRC.
  2. Write writes the data to the socket without buffering.
  3. WriteRTP/WriteRTCP additionally performs marshaling and encryption.
  4. Read maps from the track SSRC to the correct socket.
  5. Read reads the data from the socket without buffering.
  6. ReadRTP/ReadRTCP additionally performs unmarshalling and decryption.

The kernel will be responsible for buffering reads and writes. This is done without running intermediate goroutines, for higher performance. If the caller wishes to perform lossy reads or writes, they can use a goroutine and buffered channel like the existing code.

As for context, it's fine to drop it because the underlying net.Conn doesn't actually support it. This makes me sad, but there's no point adding extra overhead to support context in a sub-optimal way.

I don't think Write or Read are particularly useful and they can be made private. There's really no point performing the encryption/decryption there because you need to know whether it's an RTP or RTCP packet, in which case callers should just use the other methods.
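Putting that feedback together, a sketch of how WriteRTP could combine marshaling and encryption over a private write; all field and method names here are assumptions:

// WriteRTP marshals the packet, encrypts it, and hands it to the
// private, unbuffered write that maps the track's SSRC to its socket.
func (t *Track) WriteRTP(p *rtp.Packet) error {
	raw, err := p.Marshal()
	if err != nil {
		return err
	}
	encrypted, err := t.srtpContext.EncryptRTP(raw) // assumed field/method
	if err != nil {
		return err
	}
	_, err = t.write(encrypted) // private Write, per the note above
	return err
}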