H264 ffmpeg with zerolatency but play with green blocks #2424

Open
coconutLatte opened this issue Feb 24, 2023 · 12 comments
Labels
bug, difficulty:hard

Comments

@coconutLatte

Your environment.

  • Version: v3.1.55
  • Browser: Chrome 110.0.5481.177

What did you do?

Hi,
I'm a software engineer working on a browser-based remote desktop product. Because we need low latency, we chose WebRTC as the transport for the video.
On the server side we encode the video with ffmpeg, codec h264, with x264opts set to tune=zerolatency.
We proxy the h264 stream like the example https://github.com/pion/example-webrtc-applications/tree/master/play-from-disk-h264.
But the website doesn't play it correctly: the screen appears to be split horizontally into 4 parts, and some of them flash green...
I already dumped the h264 data from the encoder and played it with ffplay; it plays fine. But it doesn't play well on the website...

The strangest thing is: if I don't set x264opts tune=zerolatency, it displays fine in the browser; if I do set it, I get green blocks.

What did you expect?

The video plays correctly on the website.

What happened?

The video plays with green blocks (screenshot attached).

@coconutLatte
Author

coconutLatte commented Feb 24, 2023

data

This file is the h264 data encoded with tune=zerolatency, and it plays fine with ffplay.
(GitHub cannot upload .264 files; download it and play it with ffplay.)

@Sean-Der
Member

Hi @coconutLatte !

Would you mind sharing the exact ffmpeg command you ran? I'll reproduce and fix it against example-webrtc-applications.

@coconutLatte
Author

coconutLatte commented Feb 25, 2023

Hi @Sean-Der !
Thanks for the reply.

First of all, I'm not using the ffmpeg command line to encode it; I encode with C code:
av_opt_set(context->priv_data, "tune", "zerolatency", 0);
But I can reproduce the zerolatency problem with the ffmpeg command line.

There are two h264 files below, video_raw.264 and video_zerolatency.264 (GitHub does not support the .264 extension, so I added a .jpg extension; please remove it).
video_raw.264
video_zerolatency.264

video_zerolatency.264 was converted from video_raw.264 by running the ffmpeg command line ffmpeg -i video_raw.264 -vcodec h264 -tune zerolatency video_zerolatency.264

Playing video_raw.264 in https://github.com/pion/example-webrtc-applications/tree/master/play-from-disk-h264 works normally.

Playing video_zerolatency.264 in https://github.com/pion/example-webrtc-applications/tree/master/play-from-disk-h264 is full of green blocks, like the screenshot below.

@coconutLatte
Author

Hi @Sean-Der !
We just found something new.

When tune=zerolatency is turned on in ffmpeg, each I-frame is split into slices, which start with the 00 00 01 start code.

But in pion/rtp/codecs/h264_packet.go (the version pion/webrtc v3.1.55 imports is v1.7.13), the code always prepends 00 00 00 01 when it rebuilds the h264 packet (func annexbNALUStartCode).

Maybe when h264reader reads a NAL, we should remember whether it was prefixed with 00 00 00 01 or 00 00 01 and re-packet it with the same start code.
I don't know whether this matters: the raw message uses 00 00 01, but it is changed to 00 00 00 01.

@Fruneng

Fruneng commented Feb 27, 2023

Could this be caused by the RTP STAP-A packet in non-interleaved mode?
https://www.rfc-editor.org/rfc/rfc6184#section-5.4

@retamia

retamia commented Mar 14, 2023

FFmpeg's tune=zerolatency enables multi-slice encoding, so one frame of the image is split into many h264 NAL units. In the H.264 stream example code, each NAL unit is sent as its own RTP packet and the RTP timestamp is advanced by a frame duration each time, so the slices are no longer treated as the same frame when decoding.
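
If that's the cause, one way to compensate (a rough sketch, assuming a hypothetical writeFrame helper rather than anything in the example) is to concatenate all NAL units of a frame into a single Annex-B buffer and write it as one sample, so every slice ends up with the same RTP timestamp:

package h264sender

import (
	"time"

	"github.com/pion/webrtc/v3"
	"github.com/pion/webrtc/v3/pkg/media"
)

// writeFrame concatenates one frame's NAL units (given without start codes)
// into a single Annex-B buffer and writes it as one sample; the H264 payloader
// then splits it back into RTP packets that all carry the same timestamp.
func writeFrame(track *webrtc.TrackLocalStaticSample, frameNALs [][]byte, frameDuration time.Duration) error {
	var annexB []byte
	for _, nal := range frameNALs {
		annexB = append(annexB, 0x00, 0x00, 0x00, 0x01) // Annex-B start code before each NAL
		annexB = append(annexB, nal...)
	}
	return track.WriteSample(media.Sample{
		Data:     annexB,
		Duration: frameDuration, // advance the timestamp once per frame, not once per slice
	})
}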

@CastriOnlive

CastriOnlive commented May 12, 2023

Is there any fix for this? I have tested many configurations, but when activating -tune zerolatency I don't know how to handle the NAL units afterwards to avoid the green screen. Has anyone achieved this with pion? Is VP8 the only alternative for zero latency?

@retamia

retamia commented Jun 25, 2023

@VictorCPH ((nal.Data[1] & 0x80) >> 7) == 1 is the slice start flag, which means the subsequent slices all belong to the same frame until the next slice start flag. Checking the slice start flag will make your code more robust.
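
For reference, here is a minimal sketch of that check in Go (my own reading of the trick above, not code from pion): with Exp-Golomb coding, first_mb_in_slice == 0 is encoded as a single '1' bit, so the top bit of the byte right after the NAL header is set only for the first slice of a picture.

// isFrameStart reports whether a coded-slice NAL unit is the first slice of a
// new picture. nal is a NAL unit without its start code; nal[0] is the NAL header byte.
func isFrameStart(nal []byte) bool {
	if len(nal) < 2 {
		return false
	}
	nalType := nal[0] & 0x1F
	if nalType != 1 && nalType != 5 { // only coded slices (non-IDR / IDR) start with a slice header
		return false
	}
	return (nal[1]&0x80)>>7 == 1 // first bit of first_mb_in_slice: '1' means first_mb_in_slice == 0
}

With that, NAL units can be buffered until the next frame start is seen and then written as a single sample.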

@trey-hakanson-skydio

trey-hakanson-skydio commented Apr 5, 2024

@Sean-Der I'm running into a similar issue and would appreciate any tips on how to debug! I'm using this sample mp4 file, but I've seen the same issue with every file I've tried, so it shouldn't really matter. If I use the command mentioned in play-from-disk-h264, everything works as expected:

ffmpeg -i $INPUT_FILE -an -c:v libx264 -bsf:v h264_mp4toannexb -b:v 2M -max_delay 0 -bf 0 output.h264

But, when I add -tune zerolatency, I encounter the issue described above:

ffmpeg -i $INPUT_FILE -an -c:v libx264 -bsf:v h264_mp4toannexb -b:v 2M -max_delay 0 -bf 0 -tune zerolatency output.h264

Digging into the generated h264 files, the only real difference I see is that when using zerolatency multiple NALUs of type 5 (IDR slice) are generated, as opposed to 1 otherwise:

[Screenshots: NALU dumps from ffmpeg without zerolatency vs. ffmpeg with zerolatency]

My initial thought was that the following logic in pion/rtp would be problematic if the sample provided to WriteSample did not include the SPS, PPS, and all IDR slices so they could all be included in the same RTP packet. But after reading rfc6184 5.2 more closely, I'm less convinced I'm on the right track. pion/rtp loads the SPS and PPS into the STAP-A NALU and then sends the IDR slices as separate FU-A NALUs, which should be fine and handled by the browser's decoder. Sending all the NALUs corresponding to the iframe does at least cause them to have the same RTP timestamp, which helped a little: the browser rendered a full iframe with no green bars, but didn't start playing back based on non-IDR NALUs afterwards.

Still digging into this on my end, but would appreciate any thoughts 🙂

@basicfu

basicfu commented Apr 6, 2024

Hello, have you resolved this issue? I also need a zero-latency remote desktop and I'm stuck at the same point (screenshot attached). @coconutLatte

@trey-hakanson-skydio

trey-hakanson-skydio commented Apr 7, 2024

Ok, I think I've figured it out. After staring at the NALs for a while, I realized that tune=zerolatency doesn't just affect the IDR slices, it also affects the non-IDR slices. The slicing seems to be consistent at the frame level: if I have 10 IDR slices for a frame, I will also have 10 non-IDR slices in each frame before the next I-frame.

There's definitely a more intelligent way to do this based on some metadata in the NALUs (maybe frame_num in the slice header?), but to validate that this was the right idea I did some naive buffering to ensure all NALUs from the same frame were in the same sample. Without this buffering, the ticker logic in the example doesn't work quite right, because you're sending n NALUs of the same frame with different RTP timestamps and the browser can't figure out how to decode them.

I've included the janky buffer logic I used below. I used H264 Naked to look at the NALUs in my sample file and determine that 10 was the correct number of NALUs per frame. I'm going to see if I can get something more robust working based on NALU metadata.

diff --git a/play-from-disk-h264/main.go b/play-from-disk-h264/main.go
index cd05bcc..17f9bfe 100644
--- a/play-from-disk-h264/main.go
+++ b/play-from-disk-h264/main.go
@@ -10,6 +10,7 @@ package main
 import (
 	"context"
 	"errors"
+	"flag"
 	"fmt"
 	"io"
 	"os"
@@ -23,13 +24,24 @@ import (
 )
 
 const (
-	audioFileName     = "output.ogg"
-	videoFileName     = "output.h264"
 	oggPageDuration   = time.Millisecond * 20
 	h264FrameDuration = time.Millisecond * 33
 )
 
+var (
+	audioFileName     string
+	videoFileName     string
+	sessionDescriptor string
+	nalsPerSample     int
+)
+
 func main() { //nolint
+	flag.StringVar(&audioFileName, "ain", "output.ogg", "audio file to process")
+	flag.StringVar(&videoFileName, "vin", "output.h264", "video file to process")
+	flag.StringVar(&sessionDescriptor, "sd", "", "session descriptor")
+	flag.IntVar(&nalsPerSample, "nps", 1, "number of NAL units per sample")
+	flag.Parse()
+
 	// Assert that we have an audio or video file
 	_, err := os.Stat(videoFileName)
 	haveVideoFile := !os.IsNotExist(err)
@@ -41,6 +53,11 @@ func main() { //nolint
 		panic("Could not find `" + audioFileName + "` or `" + videoFileName + "`")
 	}
 
+	// Assert that we have a session descriptor
+	if sessionDescriptor == "" {
+		panic("Session descriptor must be provided")
+	}
+
 	// Create a new RTCPeerConnection
 	peerConnection, err := webrtc.NewPeerConnection(webrtc.Configuration{
 		ICEServers: []webrtc.ICEServer{
@@ -62,7 +79,10 @@ func main() { //nolint
 
 	if haveVideoFile {
 		// Create a video track
-		videoTrack, videoTrackErr := webrtc.NewTrackLocalStaticSample(webrtc.RTPCodecCapability{MimeType: webrtc.MimeTypeH264}, "video", "pion")
+		videoTrack, videoTrackErr := webrtc.NewTrackLocalStaticSample(webrtc.RTPCodecCapability{
+			MimeType:    webrtc.MimeTypeH264,
+			SDPFmtpLine: "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f",
+		}, "video", "pion")
 		if videoTrackErr != nil {
 			panic(videoTrackErr)
 		}
@@ -106,17 +126,36 @@ func main() { //nolint
 			// * avoids accumulating skew, just calling time.Sleep didn't compensate for the time spent parsing the data
 			// * works around latency issues with Sleep (see https://github.com/golang/go/issues/44343)
 			ticker := time.NewTicker(h264FrameDuration)
-			for ; true; <-ticker.C {
-				nal, h264Err := h264.NextNAL()
-				if errors.Is(h264Err, io.EOF) {
-					fmt.Printf("All video frames parsed and sent")
-					os.Exit(0)
-				}
-				if h264Err != nil {
-					panic(h264Err)
+			for {
+				<-ticker.C
+
+				nals := 0
+				buffer := make([]byte, 0)
+
+				for nals < nalsPerSample {
+					nal, h264Err := h264.NextNAL()
+					if errors.Is(h264Err, io.EOF) {
+						fmt.Printf("All video frames parsed and sent")
+						os.Exit(0)
+					} else if h264Err != nil {
+						panic(h264Err)
+					}
+
+					if nal.UnitType == h264reader.NalUnitTypeSPS {
+						// no-op
+					} else if nal.UnitType == h264reader.NalUnitTypePPS {
+						// no-op
+					} else {
+						nals += 1
+					}
+
+					if len(buffer) != 0 { // append start code as delimiter after first NAL
+						buffer = append(buffer, []byte{0, 0, 1}...)
+					}
+					buffer = append(buffer, nal.Data...)
 				}
 
-				if h264Err = videoTrack.WriteSample(media.Sample{Data: nal.Data, Duration: h264FrameDuration}); h264Err != nil {
+				if h264Err := videoTrack.WriteSample(media.Sample{Data: buffer, Duration: h264FrameDuration}); h264Err != nil {
 					panic(h264Err)
 				}
 			}
@@ -218,7 +257,7 @@ func main() { //nolint
 
 	// Wait for the offer to be pasted
 	offer := webrtc.SessionDescription{}
-	signal.Decode(signal.MustReadStdin(), &offer)
+	signal.Decode(sessionDescriptor, &offer)
 
 	// Set the remote SessionDescription
 	if err = peerConnection.SetRemoteDescription(offer); err != nil {

Edit: looking at metadata in the NALU slice header doesn't really seem feasible: where frame_num sits in the slice header depends on a lot of things like the active SPS/PPS, profile, etc. Parsing is probably best left to the decoder. In my case, I'm getting the NALUs from the encoder on a per-frame basis, along with a frame number, which should be enough to munge the RTP timestamp on the sample.
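
For anyone attempting the same thing with TrackLocalStaticSample, one way the timestamp munging could look (only a sketch, under the assumption that the sample packetizer advances the RTP timestamp by the sample's Duration after packetizing; encodedNALUs and LastInFrame are stand-ins for whatever the encoder hands back, and videoTrack / h264FrameDuration are the example's variables):

// Write each NAL unit as its own sample, but only let the RTP timestamp advance
// at frame boundaries by giving every slice except the last one a zero Duration.
for nalu := range encodedNALUs { // hypothetical channel of per-frame encoder output
	d := time.Duration(0)
	if nalu.LastInFrame {
		d = h264FrameDuration // slices of the next frame then get a fresh timestamp
	}
	if err := videoTrack.WriteSample(media.Sample{Data: nalu.Data, Duration: d}); err != nil {
		panic(err)
	}
}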

@kevmo314
Contributor

Have you tried disabling sliced-threads? I ran across something very similar a while back: https://groups.google.com/g/discuss-webrtc/c/3tLWL9yyjsA
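
If slice-based threading is indeed the culprit, something like ffmpeg -i $INPUT_FILE -an -c:v libx264 -b:v 2M -max_delay 0 -bf 0 -tune zerolatency -x264-params sliced-threads=0 output.h264 might be worth trying (untested against this issue; the -x264-params sliced-threads=0 override should turn slice-based threading back off while keeping the rest of the zerolatency tune).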

Sean-Der added the bug and difficulty:hard labels on May 9, 2024