This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

Implement Transcoding #9

Closed
14 tasks done
0xcaff opened this issue Apr 27, 2018 · 10 comments

Comments

@0xcaff
Member

0xcaff commented Apr 27, 2018

Overview

There needs to be something which takes a file on disk and makes transcoded versions available at certain URLs. Transcoded versions should be offered in a few formats (AAC, MP3) and at a few constant bitrates (320, 190).

Resources

Using FFMpeg Binary

koel handles transcoding by calling the ffmpeg binary, telling it to convert to a format and write the result to STDOUT. This works, but it adds a runtime dependency on ffmpeg. It might be a worthy tradeoff considering how complicated using the ffmpeg library is. Using something like ffmpeg-static to include or download a version of ffmpeg, or shipping a Docker image, AppImage, or Flatpak, could make this easy.

Upon further investigation, it seems that installing ffmpeg on Fedora is non-trivial. It is available through RPM Fusion.

Also, upon further investigation, it seems that transcoding to stdout and a file are two different things. Files have random access, while pipes don't. Practically, this means that the duration doesn't show up in the audio bar at runtime and the file is played as an indefinite stream. This is a problem in koel (koel/koel#306 (comment)).
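
As a rough sketch of the difference between the two output modes, here is how the ffmpeg command line could be built from Rust. The names `Output`, `ffmpeg_args`, and `spawn_ffmpeg` are hypothetical and not taken from koel or this project; the ffmpeg flags assume a build that includes libmp3lame.

```rust
use std::ffi::OsString;
use std::io;
use std::path::{Path, PathBuf};
use std::process::{Child, Command, Stdio};

/// Where ffmpeg should write the transcoded audio (hypothetical type).
pub enum Output {
    /// Pipe to STDOUT: no random access, so the player sees an indefinite stream.
    Stdout,
    /// Write to a seekable file that can be served once ffmpeg has finished.
    File(PathBuf),
}

/// Builds ffmpeg arguments for a V0 MP3 transcode of `input`.
/// Assumes the ffmpeg build includes the libmp3lame encoder.
pub fn ffmpeg_args(input: &Path, output: &Output) -> Vec<OsString> {
    let mut args: Vec<OsString> = vec![OsString::from("-nostdin")];
    if let Output::File(_) = output {
        // Overwrite an existing output file instead of prompting.
        args.push(OsString::from("-y"));
    }
    args.push(OsString::from("-i"));
    args.push(input.as_os_str().to_os_string());
    // Drop any video stream (e.g. embedded cover art), then encode LAME V0.
    for flag in ["-vn", "-codec:a", "libmp3lame", "-q:a", "0", "-f", "mp3"] {
        args.push(OsString::from(flag));
    }
    match output {
        Output::Stdout => args.push(OsString::from("pipe:1")),
        Output::File(path) => args.push(path.as_os_str().to_os_string()),
    }
    args
}

/// Spawns ffmpeg; STDOUT is piped only when streaming to STDOUT.
pub fn spawn_ffmpeg(ffmpeg: &Path, input: &Path, output: &Output) -> io::Result<Child> {
    let mut command = Command::new(ffmpeg);
    command
        .args(ffmpeg_args(input, output))
        .stdin(Stdio::null())
        .stderr(Stdio::null());
    match output {
        Output::Stdout => command.stdout(Stdio::piped()),
        Output::File(_) => command.stdout(Stdio::null()),
    };
    command.spawn()
}
```

With `Output::File`, the caller would wait for the child to exit and then serve the finished file, which is what makes the duration available to the player.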

Pre-Transcoding Files

For cases where the transcoded files are read very often, it makes sense to transcode the files as soon as they are uploaded. Here, we don't mind trading off space for CPU time. We can make this configurable later with some caching parameters.

Formats

We want to transcode to formats which are one of two things.

  1. Playable on many devices.
  2. Low bandwidth.

We'd like to eventually support the following devices. Here are the formats they support.

Based on these constraints, the following codecs should be supported:

  • MP3 V0. Preferred for Google Assistant.
  • AAC vbr quality 5 (2 channels). Preferred for iOS, Android, Alexa.
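
One way to represent these two targets in code is a small enum that knows its encoder flags. This is only a sketch: the variant and method names are hypothetical, the AAC flags assume an ffmpeg build with libfdk_aac (which is not included in most distro packages), and the ADTS container is an assumption about how the AAC stream would be delivered.

```rust
/// The two codecs proposed above (variant names are hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum TranscodeTarget {
    /// MP3 at LAME quality V0.
    Mp3V0,
    /// AAC at VBR quality 5, downmixed to 2 channels.
    AacV5,
}

impl TranscodeTarget {
    /// ffmpeg encoder arguments for this target.
    /// Assumes libmp3lame and libfdk_aac are available in the ffmpeg build.
    pub fn codec_args(self) -> &'static [&'static str] {
        match self {
            TranscodeTarget::Mp3V0 => &["-codec:a", "libmp3lame", "-q:a", "0", "-f", "mp3"],
            TranscodeTarget::AacV5 => &[
                "-codec:a", "libfdk_aac", "-vbr", "5", "-ac", "2", "-f", "adts",
            ],
        }
    }

    /// File extension to use for cached output.
    pub fn extension(self) -> &'static str {
        match self {
            TranscodeTarget::Mp3V0 => "mp3",
            TranscodeTarget::AacV5 => "aac",
        }
    }

    /// Content-Type to send when serving the transcoded file.
    pub fn mime_type(self) -> &'static str {
        match self {
            TranscodeTarget::Mp3V0 => "audio/mpeg",
            TranscodeTarget::AacV5 => "audio/aac",
        }
    }
}
```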

We will consider adding some lower bandwidth options when building the mobile applications.

Needless Conversion

Conversion shouldn't be allowed in some cases, like going between two lossy formats (AACv5 -> MP3v0), but there is no objective metric for whether a conversion should happen or not. For now, we'll let the user choose what format they want to stream their music in at runtime, and we'll decide for them in constrained environments like the Google Assistant.

TODO

  • Implement Range Responses #21
  • Respond to range requests from an io.Read + io.Seek (https://github.com/forte-music/core/commits/feature/streaming)
  • Given a file path, figure out how to transcode the file at that path to a format and bitrate.
  • Given a request for some set of bytes in a transcoded file, fulfill the request using a shared cached transcoder instance.
  • Make Transcoding Asynchronous
  • Handle Transcoding Errors
  • Use Actual Paths for Transcoded Files
  • Test TemporaryFiles
  • Test RangeStream (difficult to test, seems to work, low churn, should probably be moved to a library)
  • Test FileStream
  • Test TranscodingFileHandler (low risk, low reward)
  • Test Transcoder (can't really test the actual transcoding without a decoder, should probably test because of runtime borrow checking, can't easily replace the transcoding future)
  • Test TranscodeTarget (data container nothing to test)
  • Configurable FFMPEG Path (Configuration File #51)
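
For the range-response items, a minimal sketch of serving a byte range from anything that implements `Read + Seek` might look like the following. It is not the code on the feature/streaming branch; `parse_range` and `read_range` are hypothetical helpers, and only a single `bytes=` range is handled.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Parses a single-range header value such as `bytes=0-499`, `bytes=500-`
/// or `bytes=-200`. Returns an inclusive `(start, end)` pair clamped to a
/// resource of `len` bytes, or `None` if the range cannot be satisfied.
pub fn parse_range(header: &str, len: u64) -> Option<(u64, u64)> {
    let spec = header.trim().strip_prefix("bytes=")?;
    let (first, last) = spec.split_once('-')?;
    if len == 0 {
        return None;
    }
    let (start, end) = if first.is_empty() {
        // Suffix form: the final `n` bytes of the resource.
        let n: u64 = last.parse().ok()?;
        if n == 0 {
            return None;
        }
        (len.saturating_sub(n), len - 1)
    } else {
        let start: u64 = first.parse().ok()?;
        let end = if last.is_empty() {
            len - 1
        } else {
            last.parse::<u64>().ok()?.min(len - 1)
        };
        (start, end)
    };
    if start > end {
        None
    } else {
        Some((start, end))
    }
}

/// Reads the inclusive byte range `start..=end` from a seekable source.
pub fn read_range<R: Read + Seek>(source: &mut R, start: u64, end: u64) -> io::Result<Vec<u8>> {
    source.seek(SeekFrom::Start(start))?;
    let mut body = Vec::new();
    source.by_ref().take(end - start + 1).read_to_end(&mut body)?;
    Ok(body)
}
```

A real handler would stream the range rather than buffer it in a `Vec`, and would answer with 206 Partial Content plus a `Content-Range` header, or 416 when `parse_range` returns `None`.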
@0xcaff 0xcaff changed the title from "Implement Music Streaming Part" to "Implement Transcoding" May 27, 2018
@0xcaff 0xcaff mentioned this issue May 27, 2018
@0xcaff 0xcaff added this to Backlog in Release May 27, 2018
@0xcaff 0xcaff moved this from Backlog to Ready to Work On in Release May 28, 2018
@0xcaff 0xcaff moved this from Ready to Work On to Backlog in Release May 28, 2018
@0xcaff 0xcaff moved this from Backlog to In Progress in Release May 29, 2018
@0xcaff
Member Author

0xcaff commented May 29, 2018

It looks like transcoding is going to be a bit more difficult than I thought.

Here's the problem. Given an offset and a length, get length bytes at offset from the input stream, decoded and then encoded as the desired output stream type. It looks like there is no way to do random access transcoding to mp3; all data up to offset needs to be transcoded. For transcoding to be efficient, the result needs to be cached, so that the song isn't re-transcoded for every range request as it plays on the client.

There are two parameters to a cache: where to store the cached values and how long to keep them in the cache.

The cached values should be stored in memory. It is cheap to populate the cache (~1m to convert a 3m FLAC file to mp3 on an old machine), and storing the cache in memory allows for handling evictions cheaply.

How long an item stays in the cache should be bounded both by time and size. The item should stay in the cache for 1.5x the duration of the audio since the time it was last accessed. Need to think about the size metric more. Here's a library for implementing caching.
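
To make the time bound concrete, here is a minimal sketch of an in-memory cache that evicts an entry once 1.5x its audio duration has passed since the last access. `TranscodeCache` and its methods are hypothetical names, not the linked library's API, and the size bound is left out because it is still undecided.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One cached transcode result.
struct Entry {
    data: Vec<u8>,
    audio_duration: Duration,
    last_accessed: Instant,
}

/// In-memory cache of transcoded audio, bounded by time only.
pub struct TranscodeCache {
    entries: HashMap<String, Entry>,
}

impl TranscodeCache {
    pub fn new() -> Self {
        TranscodeCache { entries: HashMap::new() }
    }

    /// Stores `data` under `key`. `now` is passed in so the policy can be
    /// tested without sleeping.
    pub fn insert(&mut self, key: String, data: Vec<u8>, audio_duration: Duration, now: Instant) {
        let entry = Entry { data, audio_duration, last_accessed: now };
        self.entries.insert(key, entry);
    }

    /// Returns the cached bytes and refreshes the entry's last-access time.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<&[u8]> {
        let entry = self.entries.get_mut(key)?;
        entry.last_accessed = now;
        Some(entry.data.as_slice())
    }

    /// Drops every entry idle for more than 1.5x its audio duration.
    /// Intended to be called periodically; `get` does not check expiry itself.
    pub fn evict_expired(&mut self, now: Instant) {
        self.entries.retain(|_, entry| {
            let idle = now.saturating_duration_since(entry.last_accessed);
            idle <= entry.audio_duration * 3 / 2
        });
    }
}
```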

@0xcaff
Member Author

0xcaff commented May 31, 2018

FFmpeg is the state of the art, but how to use its code isn't very discoverable due to the lack of documentation and of any sort of constraints in the type system. I've created a binary which can decode an audio file and encode it into a CBR mp3.

Next up is designing the integration between this code and the endpoints.

@0xcaff 0xcaff self-assigned this Jun 1, 2018
@0xcaff
Member Author

0xcaff commented Jun 1, 2018

AWW YEAH! 2 is done!!! in process on the fly transcoding! 🎉 !

image

0xcaff added a commit that referenced this issue Jun 1, 2018
Primitive eager transcoding of an audio file has been implemented. This
was both difficult and rewarding to implement. There are a couple of
memory management kinks which should be worked out during the next step.

The ffmpeg crate is capable of being statically linked. This dependency
really slows down compilation time.

See #9
@0xcaff
Member Author

0xcaff commented Jun 1, 2018

Unfortunately, the build is broken because of ffmpeg. forte-music/core-build#8

@0xcaff
Member Author

0xcaff commented Jun 7, 2018

Statically linking ffmpeg is kinda difficult. Using the automated configuration works in the docker image, but brings in tons of un-needed dependencies. https://circleci.com/gh/forte-music/core/376

It doesn't work on my local machine and fails silently because of rust-lang/pkg-config-rs#68.

The two options are:

  • give it all the dependencies it needs bloating the binary and increasing complexity
  • build a version of ffmpeg without un-needed features. some modification will need to be made to the rust wrapper because it tries to link every possibly needed extra library statically.

@0xcaff
Member Author

0xcaff commented Jun 9, 2018

The best bet is probably to compile our own version of ffmpeg. It looks like ffmpeg feature flags can be accessed through cargo, so that only the needed dependencies are statically linked in. The other option doesn't give us enough control.

@0xcaff 0xcaff moved this from In Progress to Backlog in Release Jun 10, 2018
@0xcaff
Member Author

0xcaff commented Jun 10, 2018

ffmpeg is mostly statically linked at this point. A couple of dependencies aren't linked though. The build should be updated to fail in this case. The debug binary is a 243 MB file. This should probably be reduced.

@0xcaff
Copy link
Member Author

0xcaff commented Aug 15, 2018

@Mcat12 What are your thoughts on making ffmpeg a runtime dependency? Instead of using the ffmpeg library, we could call the cli in a process to do transcoding and output the transcoded result to STDOUT.

Pros

  1. Builds would be faster and smaller.
  2. Less code. There are about 150 lines of complex ffmpeg code to do transcoding, which transcode an entire file given a path. Making transcoding yield execution after some bytes are output requires using generators or a manually crafted state machine. The yielding transcoder will be more (complex) code.
  3. It's easy to have a yielding transcoder with the CLI, just stop reading and backpressure will cause the process on the other end of the pipe to stop writing.

Cons

  1. Runtime dependency. It's not that hard of a runtime dependency to have though. It is in the package manager for many distros and could be easily bundled in a docker container.
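
To illustrate the third pro, here is a sketch of a pull-based wrapper around the CLI's output. `PullTranscoder` is a hypothetical name; it is generic over `Read` so it can wrap a child process's STDOUT or, in tests, an in-memory buffer. The backpressure itself comes from the OS pipe, not from this code: when nobody calls `next_chunk`, the pipe buffer fills and the writing process blocks.

```rust
use std::io::{self, Read};

/// Pulls transcoded bytes on demand from a readable source.
pub struct PullTranscoder<R: Read> {
    output: R,
}

impl<R: Read> PullTranscoder<R> {
    pub fn new(output: R) -> Self {
        PullTranscoder { output }
    }

    /// Reads at most `max` bytes. For `max > 0`, an empty result means the
    /// source has reached end of stream.
    pub fn next_chunk(&mut self, max: usize) -> io::Result<Vec<u8>> {
        let mut chunk = vec![0u8; max];
        let read = self.output.read(&mut chunk)?;
        chunk.truncate(read);
        Ok(chunk)
    }
}
```

With a child spawned through `std::process::Command` and `Stdio::piped()`, the wrapper could be built from `child.stdout.take()`, since `ChildStdout` implements `Read`.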

@AzureMarker
Member

If it's as simple as apt install ffmpeg or equivalent, then that's good. Just make sure we can handle all possible errors and that it's not fragile.

0xcaff added a commit that referenced this issue Aug 17, 2018
It looks like the streaming solution won't work for reasons outlined in
#9. Switched to just transcoding the file before starting to stream it
using the ffmpeg command line.

It compiles but there are a lot of things to do yet.
@0xcaff 0xcaff moved this from Backlog to In Progress in Release Aug 18, 2018
@0xcaff 0xcaff moved this from In Progress to In Review in Release Aug 18, 2018
@0xcaff
Copy link
Member Author

0xcaff commented Aug 27, 2018

Just tested it playing a v0 mp3 transcoded from FLAC on the Google Home and it works!

@0xcaff 0xcaff moved this from In Review to Done in Release Sep 19, 2018

2 participants