Add more control to the speech to text workflow operation #5842

marwyg · 2024-05-14T10:02:06Z

This PR introduces a straightforward audio check in the subtitle generation method. Previously, the whisper process threw an exception and halted the workflow if no audio was present. With this update, tracks without audio are simply skipped during subtitle generation, so the workflow is no longer interrupted.
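As a rough illustration of the check described above, here is a minimal sketch in Java (16 or newer, for records). The `TrackInfo` record and the `tracksWithAudio` helper are hypothetical stand-ins made up for this example; they are not the classes or method names used in the PR.

```java
import java.util.List;

final class AudioCheckSketch {
  // Hypothetical stand-in for a media package track (assumption, not the real track type).
  record TrackInfo(String flavor, boolean hasAudio) {}

  // Keep only tracks that carry audio, so the transcription engine is
  // never started on a track it cannot process.
  static List<TrackInfo> tracksWithAudio(List<TrackInfo> tracks) {
    return tracks.stream().filter(TrackInfo::hasAudio).toList();
  }

  public static void main(String[] args) {
    List<TrackInfo> tracks = List.of(
        new TrackInfo("presenter/source", true),
        new TrackInfo("presentation/source", false));
    // Only the presenter track remains; the silent track is skipped instead of failing.
    System.out.println(tracksWithAudio(tracks));
  }
}
```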

KatrinIhler · 2024-05-15T15:35:50Z

This is not a blocker, but I think it'd be nice if the operation was marked as Skipped if no track contained audio, ergo no transcription took place. Stuff like this makes workflow debugging easier. But this would require the check to happen before calling createSubtitles(), and keeping track of whether we call it for at least one track.

More thoughts: In most cases, if more than one track has audio, that audio is probably identical for every track. Running transcription more than once is probably overkill and can cost precious time. I would love to see a configuration option, transcribe-first-only or something similar, that transcribes only the first track with audio that is found.

marwyg · 2024-05-16T15:28:35Z

@KatrinIhler These are nice ideas. It's not done yet, but I added some more functionality. With this, you can define a "track selection strategy". Like "Only Presenter" or "Try Presenter First" and similar.

marwyg · 2024-05-17T09:21:25Z

The audio check from earlier is still included. All tracks with no audio will be filtered out first.

I added two workflow configuration keys.
With limit-to-one you can set whether at most one transcription is generated. (Default: no, which is how it currently works.)
With track-selection-strategy you can control which tracks are used for the transcription. (Default: 'everything', which is how it currently works.)
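To make the defaults concrete, here is a minimal sketch of how the two keys could be read from a configuration map. The key names come from the comment above; everything else is an assumption for illustration: that limit-to-one takes "true" or "false", that the strategy value is matched case-insensitively, and the names `SttConfigSketch`, `Settings` and `parse`. Java 16 or newer is assumed.

```java
import java.util.Locale;
import java.util.Map;

final class SttConfigSketch {
  // Strategy names as listed in this PR, mapped to enum constants (the mapping is an assumption).
  enum Strategy {
    PRESENTER_OR_NOTHING, PRESENTATION_OR_NOTHING,
    TRY_PRESENTER_FIRST, TRY_PRESENTATION_FIRST, EVERYTHING
  }

  // Hypothetical holder for the parsed values.
  record Settings(boolean limitToOne, Strategy strategy) {}

  static Settings parse(Map<String, String> config) {
    // A missing key keeps the existing behavior: no limit, transcribe everything.
    boolean limitToOne =
        Boolean.parseBoolean(config.getOrDefault("limit-to-one", "false").trim());
    String raw = config.get("track-selection-strategy");
    // Strategy.valueOf throws IllegalArgumentException for an unknown strategy name.
    Strategy strategy = (raw == null || raw.isBlank())
        ? Strategy.EVERYTHING
        : Strategy.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    return new Settings(limitToOne, strategy);
  }

  public static void main(String[] args) {
    System.out.println(parse(Map.of()));
    System.out.println(parse(Map.of(
        "limit-to-one", "true",
        "track-selection-strategy", "try_presenter_first")));
  }
}
```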

Possible strategies:

presenter_or_nothing: only uses presenter tracks
presentation_or_nothing: only uses presentation tracks
try_presenter_first: looks for presenter tracks first. If none are usable, it tries to transcribe the other tracks
try_presentation_first: looks for presentation tracks first. If none are usable, it tries to transcribe the other tracks
everything: just transcribe everything (this is how it currently works and this is the default if nothing was set in the config)
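The strategies listed above, combined with the audio filter and limit-to-one, can be sketched roughly as follows. This is one reading of the description, not the code from the PR. It assumes that "usable" means "has audio" and that presenter and presentation tracks can be recognized by a flavor prefix such as "presenter/"; `TrackInfo`, `select` and `preferOrFallback` are hypothetical names. Java 16 or newer is assumed.

```java
import java.util.List;
import java.util.function.Predicate;

final class TrackSelectionSketch {
  enum Strategy {
    PRESENTER_OR_NOTHING, PRESENTATION_OR_NOTHING,
    TRY_PRESENTER_FIRST, TRY_PRESENTATION_FIRST, EVERYTHING
  }

  // Hypothetical stand-in for a media package track.
  record TrackInfo(String flavor, boolean hasAudio) {}

  static List<TrackInfo> select(List<TrackInfo> tracks, Strategy strategy, boolean limitToOne) {
    // Tracks without audio are filtered out before any strategy is applied.
    List<TrackInfo> usable = tracks.stream().filter(TrackInfo::hasAudio).toList();
    Predicate<TrackInfo> presenter = t -> t.flavor().startsWith("presenter/");
    Predicate<TrackInfo> presentation = t -> t.flavor().startsWith("presentation/");

    List<TrackInfo> selected = switch (strategy) {
      case PRESENTER_OR_NOTHING -> usable.stream().filter(presenter).toList();
      case PRESENTATION_OR_NOTHING -> usable.stream().filter(presentation).toList();
      case TRY_PRESENTER_FIRST -> preferOrFallback(usable, presenter);
      case TRY_PRESENTATION_FIRST -> preferOrFallback(usable, presentation);
      case EVERYTHING -> usable;
    };
    // With limit-to-one, at most the first selected track is transcribed.
    return limitToOne && selected.size() > 1 ? selected.subList(0, 1) : selected;
  }

  // Use the preferred tracks if any are usable, otherwise fall back to the remaining usable tracks.
  private static List<TrackInfo> preferOrFallback(List<TrackInfo> usable,
                                                  Predicate<TrackInfo> preferred) {
    List<TrackInfo> hits = usable.stream().filter(preferred).toList();
    return hits.isEmpty() ? usable : hits;
  }
}
```

An empty result from `select` is the case where no transcription would run at all.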

If no subtitles were generated, a message will be logged and the workflow operation status is set to "skipped".
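The "skipped" behavior amounts to counting how many transcriptions actually ran. A minimal sketch, assuming a hypothetical `Action` enum and a callback standing in for the real subtitle generation call (neither is taken from the PR):

```java
import java.util.List;
import java.util.function.Consumer;

final class SkipStatusSketch {
  // Hypothetical result states for the operation.
  enum Action { CONTINUE, SKIP }

  static Action run(List<String> selectedTrackIds, Consumer<String> createSubtitles) {
    int generated = 0;
    for (String trackId : selectedTrackIds) {
      createSubtitles.accept(trackId); // stand-in for the real transcription call
      generated++;
    }
    if (generated == 0) {
      // Nothing was transcribed: log it and report the operation as skipped.
      System.out.println("No subtitles were generated, skipping operation");
      return Action.SKIP;
    }
    return Action.CONTINUE;
  }
}
```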

marwyg marked this pull request as draft May 16, 2024 15:29

Add more control to the speech to text workflow

b0be9ec

marwyg force-pushed the bugfix/5818-stt-skip-tracks-with-no-audio branch from 7fe95a0 to b0be9ec May 17, 2024 09:01

marwyg marked this pull request as ready for review May 17, 2024 09:03

marwyg changed the title ~~Add audio check before subtitle generation~~ Add more control to the speech to text workflow operation May 17, 2024

Add documentation and some comments + javadoc

b070ef4
