Implement Transcoding #9

0xcaff · 2018-04-27T02:45:46Z

Overview

There needs to be something which takes a file on the disk and makes transcoded versions available at certain URLs. There should be transcoded versions with a few formats (AAC, MP3) and a few constant bitrates (320, 190).

Resources

Using FFMpeg Binary

The way koel handles transcoding is by calling the ffmpeg binary telling it to convert to a format and output the result to STDOUT. This works, but this adds a runtime dependency on ffmpeg. It might be a worthy tradeoff considering how complicated using the ffmpeg library is. Using things like ffmpeg-static to include or download a version of ffmpeg or shipping a docker/appimage/flatpak could make this easy.
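A minimal sketch of the "call the binary and read STDOUT" approach, in Rust. It assumes an `ffmpeg` executable built with libmp3lame is on `PATH`; the function name `ffmpeg_mp3_to_stdout` is made up for illustration and is not taken from koel or this project.

```rust
use std::path::Path;
use std::process::{Command, Stdio};

/// Build (but do not start) an ffmpeg invocation that decodes `input`
/// and writes a VBR V0 MP3 stream to its standard output.
/// Assumption: `ffmpeg` is on PATH and was built with libmp3lame.
fn ffmpeg_mp3_to_stdout(input: &Path) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.arg("-i")
        .arg(input)
        // -vn drops cover-art video streams; -q:a 0 is LAME's V0 preset;
        // "pipe:1" makes ffmpeg write to file descriptor 1 (stdout).
        .args(["-vn", "-codec:a", "libmp3lame", "-q:a", "0", "-f", "mp3", "pipe:1"])
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null());
    cmd
}
```

Calling `.spawn()` on the returned `Command` starts the process, and `child.stdout.take()` then yields a reader for the transcoded bytes. A missing binary surfaces as an `io::Error` from `spawn()`, which is where the runtime dependency shows up.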

Upon further investigation, it seems that installing ffmpeg on Fedora is non-trivial. It is not in the default repositories and is only available through RPM Fusion.

Also, upon further investigation, it seems that transcoding to stdout and a file are two different things. Files have random access, while pipes don't. Practically, this means that the duration doesn't show up in the audio bar at runtime and the file is played as an indefinite stream. This is a problem in koel (koel/koel#306 (comment)).

Pre-Transcoding Files

For cases where the transcoded files are read very often, it makes sense to transcode the files as soon as they are uploaded. Here, we don't mind trading off space for CPU time. We can make this configurable later with some caching parameters.

Formats

We want to transcode to formats which are one of two things.

Playable on many devices.
Low bandwidth.

We'd like to eventually support the following devices. Here are the formats they support.

Google Assistant (https://developers.google.com/actions/assistant/responses#media_responses)
Says it will accept a properly formatted MP3. Deezer also says it has somehow been able to stream lossless FLAC to Google Assistant devices.

https://support.deezer.com/hc/en-gb/articles/115003803629-Google-Home
Amazon Alexa (https://developer.amazon.com/docs/alexa-voice-service/recommended-media-support.html)

Hmm, it says only up to 256kbps mp3 and AAC are supported. This is probably a constraint placed by low end alexa devices.
Android (https://developer.android.com/guide/topics/media/media-formats#audio-codecs)
iOS (https://stackoverflow.com/questions/1761460/supported-audio-file-formats-in-iphone) (https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/MultimediaPG/UsingAudio/UsingAudio.html)

Based on these constraints, the following codecs should be supported:

MP3 V0. Preferred for Google Assistant.
AAC vbr quality 5 (2 channels). Preferred for iOS, Android, Alexa.
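One way to pin the two codecs down in code is a small preset type that maps each choice to encoder flags and a MIME type. This is only a sketch: the `Preset` type is hypothetical, and reading "AAC vbr quality 5" as libfdk_aac's `-vbr 5` is an assumption (it requires an ffmpeg build that includes libfdk_aac).

```rust
/// The two output formats listed above. Hypothetical type for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Preset {
    Mp3V0,
    AacVbr5,
}

impl Preset {
    /// Encoder flags to place between `-i <input>` and the output target.
    fn ffmpeg_args(self) -> &'static [&'static str] {
        match self {
            // LAME VBR quality 0 is the "V0" preset.
            Preset::Mp3V0 => &["-codec:a", "libmp3lame", "-q:a", "0", "-f", "mp3"],
            // Assumption: "vbr quality 5" means libfdk_aac's VBR mode 5, 2 channels.
            Preset::AacVbr5 => &[
                "-codec:a", "libfdk_aac", "-vbr", "5", "-ac", "2", "-f", "adts",
            ],
        }
    }

    /// Content-Type to send with the transcoded stream.
    fn mime_type(self) -> &'static str {
        match self {
            Preset::Mp3V0 => "audio/mpeg",
            Preset::AacVbr5 => "audio/aac",
        }
    }
}
```

Keeping the flags in one place makes it straightforward to add the lower-bandwidth options later as extra variants.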

We will consider adding some lower bandwidth options when building the mobile applications.

Needless Conversion

Conversion shouldn't be allowed in some cases, like going between two lossy formats (AACv5 -> MP3v0), but there is no objective metric for whether a conversion should happen or not. For now, we'll let the user choose what format they want to stream their music in at runtime, and we'll decide for them in constrained environments like the Google Assistant.

TODO

0xcaff · 2018-05-29T22:45:26Z

It looks like transcoding is going to be a bit more difficult than I thought.

Here's the problem. Given an offset and a length, get length bytes at offset from the input stream decoded then encoded as the desired output stream type. It looks like there is no way to do random-access transcoding to mp3; all data up to offset needs to be transcoded. For transcoding to be efficient, the result needs to be cached so that, as the song is played on the client, it isn't re-transcoded for every range request.
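The range problem can be sketched as a buffer that fills lazily from the transcoder's output: everything up to `offset + length` is produced once and kept, and later requests are served from the kept bytes. `TranscodeCache` and `read_range` are hypothetical names, and any `Read` stands in for the transcoder here.

```rust
use std::io::{self, Read};

/// Transcoded bytes produced so far, plus the stream they come from.
/// Hypothetical type: `source` would be the encoder's output.
struct TranscodeCache<R: Read> {
    source: R,
    bytes: Vec<u8>,
    done: bool,
}

impl<R: Read> TranscodeCache<R> {
    fn new(source: R) -> Self {
        TranscodeCache { source, bytes: Vec::new(), done: false }
    }

    /// Return up to `len` bytes starting at `offset`, pulling more output
    /// from the source only if the cached prefix is too short.
    fn read_range(&mut self, offset: usize, len: usize) -> io::Result<&[u8]> {
        let end = offset.saturating_add(len);
        let mut chunk = [0u8; 8192];
        while !self.done && self.bytes.len() < end {
            let n = self.source.read(&mut chunk)?;
            if n == 0 {
                self.done = true; // source exhausted
            } else {
                self.bytes.extend_from_slice(&chunk[..n]);
            }
        }
        // Clamp to what exists so a range past the end yields a short slice.
        let start = offset.min(self.bytes.len());
        let stop = end.min(self.bytes.len());
        Ok(&self.bytes[start..stop])
    }
}
```

A request for an early range after a later one never touches the source again, which is the property the range requests need.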

There are two parameters to a cache: where to store the cached values and how long to keep them.

The cached values should be stored in memory. It is cheap to populate the cache (~1m to convert a 3m FLAC file to mp3 on an old machine), and storing the cache in memory allows for handling evictions cheaply.

How long an item stays in the cache should be bounded both by time and size. The item should stay in the cache for 1.5x the duration of the audio since the time it was last accessed. Need to think about the size metric more. Here's a library for implementing caching.
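The time bound can be sketched as follows: each entry remembers the audio duration and its last access, and is evicted once more than 1.5x the duration has passed since that access. The type and method names are made up for illustration, the size bound is left out because it is still undecided, and the clock is passed in explicitly to keep the sketch easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Entry {
    bytes: Vec<u8>,
    audio_len: Duration,
    last_access: Instant,
}

/// In-memory cache of transcoded files, bounded by time only.
struct Cache {
    entries: HashMap<String, Entry>,
}

impl Cache {
    fn new() -> Self {
        Cache { entries: HashMap::new() }
    }

    fn insert(&mut self, key: &str, bytes: Vec<u8>, audio_len: Duration, now: Instant) {
        let entry = Entry { bytes, audio_len, last_access: now };
        self.entries.insert(key.to_string(), entry);
    }

    /// Look up an entry and refresh its last-access time.
    fn get(&mut self, key: &str, now: Instant) -> Option<&[u8]> {
        let entry = self.entries.get_mut(key)?;
        entry.last_access = now;
        Some(entry.bytes.as_slice())
    }

    /// Drop entries idle for more than 1.5x their audio duration.
    fn evict_expired(&mut self, now: Instant) {
        self.entries.retain(|_, e| {
            now.saturating_duration_since(e.last_access) <= e.audio_len * 3 / 2
        });
    }
}
```

With a 3 minute track the entry survives 270 seconds of inactivity, so a listener who pauses briefly does not trigger a second transcode.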

0xcaff · 2018-05-31T07:45:44Z

FFmpeg is the state of the art, but how to use their code isn't very discoverable due to the lack of documentation and of any sort of constraints in the type system. I've created a binary which can decode an audio file and encode it into a CBR mp3.

Next up is designing the integration between this code and the endpoints.

0xcaff · 2018-06-01T06:01:03Z

AWW YEAH! 2 is done!!! in process on the fly transcoding! 🎉 !

Primitive eager transcoding of an audio file has been implemented. This was both difficult and rewarding to implement. There are a couple of memory management kinks which should be worked out during the next step. The ffmpeg crate is capable of being statically linked. This dependency really slows down compilation time. See #9

0xcaff · 2018-06-01T06:34:40Z

Unfortunately, the build is broken because of ffmpeg. forte-music/core-build#8

0xcaff · 2018-06-07T23:52:22Z

Statically linking ffmpeg is kinda difficult. Using the automated configuration works in the docker image, but brings in tons of un-needed dependencies. https://circleci.com/gh/forte-music/core/376

It doesn't work on my local machine and fails silently because rust-lang/pkg-config-rs#68

The two options are:

give it all the dependencies it needs bloating the binary and increasing complexity
build a version of ffmpeg without un-needed features. some modification will need to be made to the rust wrapper because it tries to link every possibly needed extra library statically.

0xcaff · 2018-06-09T15:36:41Z

The best bet is to probably compile our own version of ffmpeg. It looks like ffmpeg feature flags can be accessed through cargo and only needed dependencies are statically linked in. The other option doesn't give us enough control.

0xcaff · 2018-06-10T03:50:39Z

ffmpeg is mostly statically linked at this point. A couple of dependencies aren't linked though. The build should be updated to fail in this case. The debug binary is a 243 MB file. This should probably be reduced.

0xcaff · 2018-08-15T19:01:49Z

@Mcat12 What are your thoughts on making ffmpeg a runtime dependency? Instead of using the ffmpeg library, we could call the cli in a process to do transcoding and output the transcoded result to STDOUT.

Pros

Builds would be faster and smaller.
Less code. There are about 150 lines of complex ffmpeg code to do transcoding, which transcodes an entire file given a path. Making transcoding yield execution after some bytes are output requires using generators or a manually crafted state machine. The yielding transcoder will be more (complex) code.
It's easy to have a yielding transcoder with the CLI, just stop reading and backpressure will cause the process on the other end of the pipe to stop writing.

Cons

Runtime dependency. It's not that hard of a runtime dependency to have though. It is in the package manager for many distros and could be easily bundled in a docker container.
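The backpressure point from the pros can be sketched with a helper that pulls a bounded number of bytes from any reader. The name `pull` is hypothetical; in practice `src` would be the `ChildStdout` of an ffmpeg process spawned with a piped stdout.

```rust
use std::io::{self, Read};

/// Read at most `want` bytes from `src` and then stop.
/// Nothing further is consumed until the next call.
fn pull<R: Read>(src: &mut R, want: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::with_capacity(want);
    // `take` caps how much this call may consume from the underlying reader.
    src.by_ref().take(want as u64).read_to_end(&mut out)?;
    Ok(out)
}
```

When the reader is a pipe from a child process, not calling `pull` again lets the OS pipe buffer fill up, at which point the child's next write blocks. That gives the "yielding transcoder" behaviour without generators or a hand-written state machine.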

AzureMarker · 2018-08-15T19:05:16Z

If it's as simple as apt install ffmpeg or equivalent, then that's good. Just make sure we can handle all possible errors and that it's not fragile

It looks like the streaming solution won't work for reasons outlined in #9. Switched to just transcoding the file with the ffmpeg command line before starting to stream it. It compiles, but there are a lot of things to do yet.

0xcaff · 2018-08-27T03:09:13Z

Just tested it playing a v0 mp3 transcoded from FLAC on the Google Home and it works!

0xcaff changed the title ~~Implement Music Streaming Part~~ Implement Transcoding May 27, 2018

0xcaff mentioned this issue May 27, 2018

Switch to Actix #37

Closed

0xcaff added this to Backlog in Release May 27, 2018

0xcaff moved this from Backlog to Ready to Work On in Release May 28, 2018

0xcaff moved this from Ready to Work On to Backlog in Release May 28, 2018

0xcaff moved this from Backlog to In Progress in Release May 29, 2018

0xcaff self-assigned this Jun 1, 2018

0xcaff moved this from In Progress to Backlog in Release Jun 10, 2018

0xcaff moved this from Backlog to In Progress in Release Aug 18, 2018

0xcaff mentioned this issue Aug 18, 2018

Implemented Transcoding and Streaming Responses #54

Merged

3 tasks

0xcaff moved this from In Progress to In Review in Release Aug 18, 2018

AzureMarker closed this as completed in #54 Aug 26, 2018

0xcaff moved this from In Review to Done in Release Sep 19, 2018
