HighPerformancePion

Sean DuBois edited this page Dec 22, 2021 · 4 revisions

High Performance Pion

Signal cited Pion as not meeting its performance needs. The topic was then discussed in Slack, where people suggested ways to improve performance. This document describes concrete changes we could make, and how to verify that they actually improve performance.

Core Changes

These are the high-level concepts we need to adhere to.

User Provided Buffers

Today we maintain buffers internally, and then copy into the user provided buffer.

One example of this is packetio.Buffer in pion/ice and pion/srtp. As soon as a packet arrives at the ICE Agent it is copied into a packetio.Buffer. When the user calls Read, the packet is copied out of the packetio.Buffer into the provided buffer.

Instead, network input should be copied directly into buffers provided by the user, avoiding costly extra copies.

Good Copying network input directly into user provided buffers allows us to avoid an extra copy. It also means that we don't have to waste memory with internal buffers.

Bad If users don't read fast enough, data will be lost. Today packetio.Buffer stores data when a user isn't reading packets fast enough. With this change, users will need to depend on OS buffers for network traffic.

Data that isn't being actively Read by the user will have to be discarded. If RTP packets arrive for SSRCs 1, 2, and 3 and the user is only attempting to Read 2 and 3, we need to discard 1. We can never be sure that users will ever request 1.
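A rough sketch of that discard rule, under the assumption that packets have already been demuxed by SSRC. The packet type, the dispatch function, and the pending map are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// packet is a hypothetical, already-demuxed RTP packet.
type packet struct {
	ssrc    uint32
	payload []byte
}

// dispatch walks the arrivals in order. A packet is delivered only when a
// Read is currently pending for its SSRC; otherwise it is discarded, because
// there is no internal buffer left to park it in.
func dispatch(arrivals []packet, pending map[uint32]bool) (delivered, discarded int) {
	for _, p := range arrivals {
		if pending[p.ssrc] {
			delivered++
			continue
		}
		discarded++
	}
	return delivered, discarded
}

func main() {
	arrivals := []packet{{ssrc: 1}, {ssrc: 2}, {ssrc: 3}}
	pending := map[uint32]bool{2: true, 3: true} // the user only Reads 2 and 3

	delivered, discarded := dispatch(arrivals, pending)
	fmt.Println(delivered, discarded) // prints "2 1"
}
```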

Cancelling a single Read may be tricky. Making this work in a performant manner will be important.

No (Un)Marshaling

Today, modifying a value requires a full Unmarshal/Marshal round trip. To change a single field we parse the entire packet, and then have to repack the entire packet.

One example of this is rtp.Packet. Changing the SSRC is an expensive operation. To get an SSRC you need to Unmarshal the packet, then access the SSRC member of the struct. This requires parsing the entire packet. Writing it back to the network then requires marshaling the entire packet into a []byte.

Instead, we should always pass around a []byte and define methods on it. Getting or setting the SSRC would then just involve reading or writing four bytes at a fixed offset.
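A minimal sketch of what such methods might look like. RawPacket, SSRC, and SetSSRC are hypothetical names, not an existing pion/rtp API. The only fact relied on is that the RTP fixed header is 12 bytes and carries the SSRC in bytes 8 to 11 in network byte order (RFC 3550).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	rtpHeaderMinSize = 12 // length of the fixed RTP header (RFC 3550)
	ssrcOffset       = 8  // SSRC occupies bytes 8..11, big-endian
)

var errShortPacket = errors.New("packet shorter than RTP fixed header")

// RawPacket is a hypothetical view over the wire bytes of an RTP packet.
// It owns no parsed state, so accessors never allocate or copy.
type RawPacket []byte

// SSRC reads the SSRC straight out of the header bytes.
func (p RawPacket) SSRC() (uint32, error) {
	if len(p) < rtpHeaderMinSize {
		return 0, errShortPacket
	}
	return binary.BigEndian.Uint32(p[ssrcOffset:]), nil
}

// SetSSRC overwrites the SSRC in place; the rest of the packet is untouched.
func (p RawPacket) SetSSRC(ssrc uint32) error {
	if len(p) < rtpHeaderMinSize {
		return errShortPacket
	}
	binary.BigEndian.PutUint32(p[ssrcOffset:], ssrc)
	return nil
}

func main() {
	pkt := make(RawPacket, rtpHeaderMinSize)
	pkt[0] = 0x80 // RTP version 2, no padding, no extension, no CSRCs

	if err := pkt.SetSSRC(0xDEADBEEF); err != nil {
		panic(err)
	}
	ssrc, err := pkt.SSRC()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#x\n", ssrc)
}
```

Because RawPacket is just a named []byte, a buffer returned by a network Read can be converted with RawPacket(buf[:n]) at no cost. Fields at variable offsets (header extensions, payload) would need more bounds checking than this sketch shows.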

Software Changes

pion/ice

API Changes None

Internal Changes Once a candidate pair has been selected, the buffer the user passes to Read will be used directly for network input.

Before the selected candidate pair is chosen we will need to perform copies. We need to read from multiple sockets at once.

Is it possible to use one buffer for multiple Read calls? Should we not do concurrent Reads?

pion/srtp

API Changes We will need to modify srtp.Session extensively. It provides callbacks and internal buffering. We may need to instead do demuxing inside of pion/webrtc ourselves?

Internal Changes None, only deletion of srtp.Session

pion/webrtc

API Changes If we have multiple Read calls happening for independent SSRCs, how do we know which buffer to use? The first Read call will be passed into the ICE Agent and will read the packet off the network. What happens in the following code if SSRC 3 is the first to arrive?

a := ice.Agent{}


a.Read(buffA) // Read for SSRC 1
a.Read(buffB) // Read for SSRC 2
a.Read(buffC) // Read for SSRC 3

I believe we will have to use a callback-based API. The buffer would only be valid for the duration of the callback.

package main

import (
    "crypto/rand"
    "fmt"
)

func main() {
    callbackMap := map[byte]func([]byte, int){
        0:   func([]byte, int) { fmt.Println("Got 0") },
        10:  func([]byte, int) { fmt.Println("Got 10") },
        25:  func([]byte, int) { fmt.Println("Got 25") },
        250: func([]byte, int) { fmt.Println("Got 250") },
    }

    buff := make([]byte, 1500)
    for {
        n, _ := Read(buff)
        if callback, ok := callbackMap[buff[0]]; ok {
            callback(buff, n)
        }
    }
}

func Read(buff []byte) (int, error) {
    rand.Read(buff)
    return len(buff), nil
}

I have explored making Read work with a single buffer and demuxing, but haven't found a way to do it with zero copies.

Internal Changes

pion/sctp

Out of scope

pion/dtls

Out of scope

Testing

Package Level

End-to-End