Implement Transcoding #9
Comments
It looks like transcoding is going to be a bit more difficult than I thought. Here's the problem. Given an offset and a length, get There are two parameters to a cache: where to store the cached values and how long to keep them. The cached values should be stored in memory. It is cheap to populate the cache (~1 minute to convert a 3-minute FLAC file to MP3 on an old machine), and storing the cache in memory allows evictions to be handled cheaply. How long an item stays in the cache should be bounded by both time and size. An item should stay in the cache for 1.5x the duration of the audio, measured from the time it was last accessed. The size metric needs more thought. Here's a library for implementing caching. |
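The cache policy described above could be sketched roughly like this. This is only an illustration under assumptions: the names `Entry` and `TranscodeCache` are made up, the key is an opaque string, and the size bound is taken to be total bytes held, which the comment above leaves undecided. The current time is passed in explicitly so the expiry rule is easy to exercise.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One transcoded file held entirely in memory (hypothetical type).
struct Entry {
    bytes: Vec<u8>,
    /// Playback length of the audio; the time-to-live is derived from it.
    audio_duration: Duration,
    last_access: Instant,
}

impl Entry {
    /// Expired once more than 1.5x the audio duration has passed since the last access.
    fn expired(&self, now: Instant) -> bool {
        now.duration_since(self.last_access) > self.audio_duration.mul_f64(1.5)
    }
}

/// Hypothetical in-memory cache bounded by time and by total bytes (an assumption).
struct TranscodeCache {
    entries: HashMap<String, Entry>,
    max_bytes: usize,
}

impl TranscodeCache {
    fn new(max_bytes: usize) -> Self {
        TranscodeCache { entries: HashMap::new(), max_bytes }
    }

    fn total_bytes(&self) -> usize {
        self.entries.values().map(|e| e.bytes.len()).sum()
    }

    fn insert(&mut self, key: String, bytes: Vec<u8>, audio_duration: Duration, now: Instant) {
        self.entries.insert(key, Entry { bytes, audio_duration, last_access: now });
        self.evict(now);
    }

    /// Returns the cached bytes and refreshes the last-access time.
    fn get(&mut self, key: &str, now: Instant) -> Option<&[u8]> {
        self.evict(now);
        let entry = self.entries.get_mut(key)?;
        entry.last_access = now;
        Some(entry.bytes.as_slice())
    }

    /// Drops expired entries, then the least recently used ones until under the size bound.
    fn evict(&mut self, now: Instant) {
        self.entries.retain(|_, e| !e.expired(now));
        while self.total_bytes() > self.max_bytes {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.last_access)
                .map(|(k, _)| k.clone());
            match oldest {
                Some(key) => {
                    self.entries.remove(&key);
                }
                None => break,
            }
        }
    }
}
```

With this shape, a 100-second track survives any gap of up to 150 seconds between accesses, and each `get` restarts that window.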
FFmpeg is the state of the art, but its code isn't very discoverable: documentation is sparse and the type system enforces few constraints. I've created a binary which can decode an audio file and encode it into a CBR MP3. Next up is designing the integration between this code and the endpoints. |
Primitive eager transcoding of an audio file has been implemented. This was both difficult and rewarding to implement. There are a couple of memory management kinks which should be worked out during the next step. The ffmpeg crate is capable of being statically linked. This dependency really slows down compilation time. See #9
Unfortunately, the build is broken because of ffmpeg. forte-music/core-build#8 |
Statically linking ffmpeg is kind of difficult. Using the automated configuration works in the Docker image, but brings in tons of unneeded dependencies. https://circleci.com/gh/forte-music/core/376 It doesn't work on my local machine and fails silently because of rust-lang/pkg-config-rs#68 The two options are:
|
The best bet is probably to compile our own version of ffmpeg. It looks like ffmpeg feature flags can be accessed through cargo so that only the needed dependencies are statically linked in. The other option doesn't give us enough control. |
ffmpeg is mostly statically linked at this point. A couple of dependencies aren't linked though. The build should be updated to fail in this case. The debug binary is a 243 MB file. This should probably be reduced. |
@Mcat12 What are your thoughts on making ffmpeg a runtime dependency? Instead of using the ffmpeg library, we could run the CLI in a child process to do the transcoding and write the transcoded result to STDOUT. Pros
Cons
|
If it's as simple as |
It looks like the streaming solution won't work for reasons outlined in #9. Switched to just transcoding the file with the ffmpeg command line before starting to stream it. It compiles, but there are a lot of things left to do.
Just tested it playing a V0 MP3 transcoded from FLAC on the Google Home and it works! |
Overview
There needs to be something which takes a file on disk and makes transcoded versions available at certain URLs. There should be transcoded versions in a few formats (AAC, MP3) at a few constant bitrates (320 and 190 kbps).
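As a rough illustration of the format and bitrate matrix, the presets could be enumerated and mapped to URL paths as below. Everything here is hypothetical: the `Preset` type, the `/transcoded/...` path layout, and the song id parameter are assumptions for the sketch, not the project's actual routes.

```rust
/// Target codecs named in the overview.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Codec {
    Mp3,
    Aac,
}

/// One transcoding target: a codec at a constant bitrate in kbps.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Preset {
    codec: Codec,
    bitrate_kbps: u32,
}

/// Every codec paired with every bitrate from the overview.
fn presets() -> Vec<Preset> {
    let mut out = Vec::new();
    for codec in [Codec::Mp3, Codec::Aac] {
        for bitrate_kbps in [320, 190] {
            out.push(Preset { codec, bitrate_kbps });
        }
    }
    out
}

/// Hypothetical URL layout: /transcoded/<song id>/<codec>/<bitrate>.
fn url_path(song_id: &str, preset: Preset) -> String {
    let codec = match preset.codec {
        Codec::Mp3 => "mp3",
        Codec::Aac => "aac",
    };
    format!("/transcoded/{}/{}/{}", song_id, codec, preset.bitrate_kbps)
}
```

Keeping the preset list in one place means the route table and any pre-transcoding step can iterate over the same values.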
Resources
Using FFmpeg Binary
The way koel handles transcoding is by calling the ffmpeg binary, telling it to convert to a format and output the result to STDOUT. This works, but it adds a runtime dependency on ffmpeg. It might be a worthy tradeoff considering how complicated using the ffmpeg library is. Using things like ffmpeg-static to include or download a version of ffmpeg, or shipping a Docker image/AppImage/Flatpak, could make this easy.
Upon further investigation, it seems that installing ffmpeg on Fedora is non-trivial. It is available through RPM Fusion.
Also, upon further investigation, it seems that transcoding to stdout and transcoding to a file are two different things. Files have random access, while pipes don't. Practically, this means that when transcoding to stdout the duration doesn't show up in the audio bar at runtime and the file is played as an indefinite stream. This is a problem in koel (koel/koel#306 (comment)).
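The two ways of invoking the ffmpeg binary can be sketched as follows. This is a minimal sketch, assuming an `ffmpeg` executable on `PATH` built with the `libmp3lame` encoder; the `Target` enum and the function names are made up for illustration. The only difference between the modes is the output argument: `pipe:1` writes to stdout, a path writes to a seekable file.

```rust
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};

/// Where ffmpeg should write the transcoded MP3 (hypothetical type).
enum Target {
    /// Write to stdout; the output cannot be seeked.
    Stdout,
    /// Write to a regular file, which has random access.
    File(PathBuf),
}

/// Builds the argument list for a constant-bitrate MP3 transcode.
fn ffmpeg_args(input: &Path, bitrate_kbps: u32, target: &Target) -> Vec<String> {
    let mut args: Vec<String> = Vec::new();
    if let Target::File(_) = target {
        // Overwrite an existing output file without prompting.
        args.push("-y".to_string());
    }
    args.push("-i".to_string());
    args.push(input.display().to_string());
    // Drop any video stream (e.g. cover art) and encode audio with LAME.
    for arg in ["-vn", "-codec:a", "libmp3lame", "-b:a"] {
        args.push(arg.to_string());
    }
    args.push(format!("{}k", bitrate_kbps));
    // The container format must be named explicitly when writing to a pipe.
    args.push("-f".to_string());
    args.push("mp3".to_string());
    match target {
        Target::Stdout => args.push("pipe:1".to_string()),
        Target::File(path) => args.push(path.display().to_string()),
    }
    args
}

/// Prepares (but does not run) the ffmpeg process.
fn ffmpeg_command(input: &Path, bitrate_kbps: u32, target: &Target) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.args(ffmpeg_args(input, bitrate_kbps, target))
        .stdin(Stdio::null());
    cmd
}
```

A caller would then use `.status()` for the file case, or set `.stdout(Stdio::piped())` and `.spawn()` to read the stream for the stdout case.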
Pre-Transcoding Files
For cases where the transcoded files are read very often, it makes sense to transcode the files as soon as they are uploaded. Here, we don't mind trading off space for CPU time. We can make this configurable later with some caching parameters.
Formats
We want to transcode to formats which are one of two things.
We'd like to eventually support the following devices. Here are the formats they support.
Google Assistant (https://developers.google.com/actions/assistant/responses#media_responses)
Says it will accept a properly formatted MP3. Deezer also says it has somehow been able to stream lossless FLAC to Google Assistant devices.
https://support.deezer.com/hc/en-gb/articles/115003803629-Google-Home
Amazon Alexa (https://developer.amazon.com/docs/alexa-voice-service/recommended-media-support.html)
Hmm, it says only MP3 and AAC up to 256 kbps are supported. This is probably a constraint imposed by low-end Alexa devices.
Android (https://developer.android.com/guide/topics/media/media-formats#audio-codecs)
iOS (https://stackoverflow.com/questions/1761460/supported-audio-file-formats-in-iphone) (https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/MultimediaPG/UsingAudio/UsingAudio.html)
Based on these constraints, the following codecs should be supported:
We will consider adding some lower bandwidth options when building the mobile applications.
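One way to apply the device constraints is to clamp the requested bitrate per device. The sketch below encodes only the one limit noted above (256 kbps on Alexa); the `Device` enum and function names are hypothetical, and the other devices are treated as uncapped purely because no cap is recorded here.

```rust
/// Playback targets discussed above (hypothetical enum).
#[derive(Clone, Copy, Debug)]
enum Device {
    GoogleAssistant,
    Alexa,
    Other,
}

/// Maximum supported bitrate in kbps, if a limit is known.
fn max_bitrate_kbps(device: Device) -> Option<u32> {
    match device {
        // Limit taken from the Alexa note above.
        Device::Alexa => Some(256),
        // No limit recorded in this issue; treated as uncapped (an assumption).
        Device::GoogleAssistant | Device::Other => None,
    }
}

/// Lowers the requested bitrate to the device limit when one applies.
fn pick_bitrate_kbps(device: Device, requested_kbps: u32) -> u32 {
    match max_bitrate_kbps(device) {
        Some(cap) => requested_kbps.min(cap),
        None => requested_kbps,
    }
}
```

For example, a 320 kbps request for an Alexa device comes back as 256, while a 190 kbps request is left unchanged.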
Needless Conversion
Conversion shouldn't be allowed in some cases, like going between two lossy formats (AACv5 -> MP3v0), but there is no objective metric for whether a conversion should happen or not. For now, we'll let the user choose at runtime which format to stream their music in, and decide for them in constrained environments like the Google Assistant.
TODO
Test RangeStream (difficult to test, seems to work, low churn, should probably be moved to a library)
Test TranscodingFileHandler (low risk, low reward)
Test Transcoder (can't really test the actual transcoding without a decoder, should probably test because of runtime borrow checking, can't easily replace the transcoding future)