Add more control to the speech to text workflow operation #5842

Open
wants to merge 2 commits into base: r/15.x

marwyg
Member

@marwyg marwyg commented May 14, 2024

Fixes #5818

This PR introduces a simple audio check in the subtitle generation method. Previously, the Whisper process threw an exception and halted the workflow if a track had no audio. With this update, tracks without audio are skipped during subtitle generation, so the workflow is no longer interrupted.
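
A minimal sketch of such an audio check, for illustration only. `MediaTrack` and `AudioCheckSketch` are hypothetical stand-ins and not the types or method names used in this PR; the assumption is simply that each track can report whether it has an audio stream.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a media package track; not the actual Opencast type.
record MediaTrack(String flavor, boolean hasAudio) {}

class AudioCheckSketch {
  // Keep only tracks that carry audio, so the transcription engine
  // is never invoked on a track it cannot process.
  static List<MediaTrack> tracksWithAudio(List<MediaTrack> tracks) {
    return tracks.stream()
        .filter(MediaTrack::hasAudio)
        .collect(Collectors.toList());
  }
}
```

Filtering up front means the engine call itself needs no special error handling for silent tracks.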

@KatrinIhler
Member

This is not a blocker, but I think it'd be nice if the operation were marked as Skipped when no track contains audio and therefore no transcription takes place. Stuff like this makes workflow debugging easier. It would require doing the check before calling createSubtitles() and keeping track of whether that call was made for at least one track.
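
The bookkeeping suggested here could look roughly like the sketch below. Everything in it is an assumption made for illustration: `CandidateTrack`, `Outcome`, and `SkipStatusSketch` are hypothetical names, and the `createSubtitles` parameter merely stands in for the real subtitle generation call.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins; not the actual Opencast types.
record CandidateTrack(String id, boolean hasAudio) {}

enum Outcome { CONTINUE, SKIP }

class SkipStatusSketch {
  // Check for audio before generating subtitles and remember whether
  // at least one track was actually transcribed.
  static Outcome run(List<CandidateTrack> tracks, Consumer<CandidateTrack> createSubtitles) {
    boolean transcribedAny = false;
    for (CandidateTrack track : tracks) {
      if (!track.hasAudio()) {
        continue; // nothing to transcribe on this track
      }
      createSubtitles.accept(track);
      transcribedAny = true;
    }
    // Report SKIP only if no track was transcribed at all.
    return transcribedAny ? Outcome.CONTINUE : Outcome.SKIP;
  }
}
```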

More thoughts: In most cases, if more than one track has audio, that audio is probably identical for every track. Running transcription more than once is probably overkill and can cost precious time. I would love to see a configuration option, transcribe-first-only or something similar, that transcribes only the first track found with audio.
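
As a rough illustration of the "first track with audio" idea, assuming tracks arrive in a stable order: `AudioTrackInfo` and `FirstAudioTrackSketch` are hypothetical names, not part of Opencast.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a track; not the actual Opencast type.
record AudioTrackInfo(String id, boolean hasAudio) {}

class FirstAudioTrackSketch {
  // Return the first track that has audio, or an empty Optional if none does.
  static Optional<AudioTrackInfo> firstWithAudio(List<AudioTrackInfo> tracks) {
    return tracks.stream()
        .filter(AudioTrackInfo::hasAudio)
        .findFirst();
  }
}
```

An empty result would be the natural point at which to mark the operation as skipped.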

@marwyg
Member Author

marwyg commented May 16, 2024

@KatrinIhler These are nice ideas. It's not done yet, but I added some more functionality. With this, you can define a "track selection strategy". Like "Only Presenter" or "Try Presenter First" and similar.

@marwyg marwyg marked this pull request as draft May 16, 2024 15:29
@marwyg marwyg force-pushed the bugfix/5818-stt-skip-tracks-with-no-audio branch from 7fe95a0 to b0be9ec Compare May 17, 2024 09:01
@marwyg marwyg marked this pull request as ready for review May 17, 2024 09:03
@marwyg
Member Author

marwyg commented May 17, 2024

The audio check from earlier is still included. All tracks with no audio will be filtered out first.

I added two workflow configuration keys:

  • limit-to-one: controls whether at most one transcription is generated. (default: no, which is the current behavior)
  • track-selection-strategy: controls which tracks are used for the transcription. (default: 'everything', which is the current behavior)

Possible strategies:

  • presenter_or_nothing: only uses presenter tracks
  • presentation_or_nothing: only uses presentation tracks
  • try_presenter_first: looks for presenter tracks first. If none of them are usable, tries to transcribe the other tracks
  • try_presentation_first: looks for presentation tracks first. If none of them are usable, tries to transcribe the other tracks
  • everything: just transcribe everything (this is how it currently works and this is the default if nothing was set in the config)
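
One way to read the strategies listed above, as a hedged sketch rather than the code in this PR. It assumes that "usable" means "has audio" and that a track's flavor type is either "presenter", "presentation", or something else. `SelectableTrack`, `SelectionStrategy`, and `TrackSelectionSketch` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins; not the actual Opencast types.
record SelectableTrack(String id, String flavorType, boolean hasAudio) {}

enum SelectionStrategy {
  PRESENTER_OR_NOTHING, PRESENTATION_OR_NOTHING,
  TRY_PRESENTER_FIRST, TRY_PRESENTATION_FIRST, EVERYTHING
}

class TrackSelectionSketch {
  static List<SelectableTrack> select(List<SelectableTrack> tracks, SelectionStrategy strategy) {
    // Tracks without audio are filtered out before any strategy is applied.
    List<SelectableTrack> usable = tracks.stream()
        .filter(SelectableTrack::hasAudio)
        .collect(Collectors.toList());
    switch (strategy) {
      case PRESENTER_OR_NOTHING: return ofType(usable, "presenter");
      case PRESENTATION_OR_NOTHING: return ofType(usable, "presentation");
      case TRY_PRESENTER_FIRST: return preferType(usable, "presenter");
      case TRY_PRESENTATION_FIRST: return preferType(usable, "presentation");
      default: return usable; // EVERYTHING
    }
  }

  private static List<SelectableTrack> ofType(List<SelectableTrack> tracks, String type) {
    return tracks.stream()
        .filter(t -> t.flavorType().equals(type))
        .collect(Collectors.toList());
  }

  // Use the preferred tracks if any are usable, otherwise fall back to the rest.
  private static List<SelectableTrack> preferType(List<SelectableTrack> usable, String type) {
    List<SelectableTrack> preferred = ofType(usable, type);
    return preferred.isEmpty() ? usable : preferred;
  }
}
```

An empty result from `select` corresponds to the case where no subtitles are generated.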

If no subtitles were generated, a message will be logged and the workflow operation status is set to "skipped".

@marwyg marwyg changed the title Add audio check before subtitle generation Add more control to the speech to text workflow operation May 17, 2024