feature/filter-out-invalid-captions #214

Merged
merged 9 commits into CouncilDataProject:main on Sep 27, 2022

Conversation

dphoria
Contributor

@dphoria dphoria commented Sep 23, 2022

Link to Relevant Issue

This pull request resolves #213

Description of Changes

In create_event_gather_flow(), in addition to calling resource_exists() for session.caption_uri, validate the caption file by comparing its length against the length of the video at session.video_uri. If the two lengths differ by more than 20%, reject the caption file and fall back to speech-to-text.
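
A minimal sketch of the length check described above, not the code in this pull request. The names `lengths_match` and `choose_transcript_source` are hypothetical. The sketch also assumes that both lengths are already available as durations in seconds and that the 20% difference is measured relative to the video length; the description does not specify either.

```python
def lengths_match(
    caption_seconds: float,
    video_seconds: float,
    tolerance: float = 0.2,
) -> bool:
    """Return True when the caption length is within `tolerance` of the video length.

    Assumption: the difference is measured relative to the video length.
    """
    if video_seconds <= 0:
        # A video with no usable length cannot validate a caption file.
        return False
    relative_difference = abs(caption_seconds - video_seconds) / video_seconds
    return relative_difference <= tolerance


def choose_transcript_source(
    caption_exists: bool,
    caption_seconds: float,
    video_seconds: float,
) -> str:
    """Pick captions only if the file exists and its length is plausible."""
    if caption_exists and lengths_match(caption_seconds, video_seconds):
        return "captions"
    # Missing or implausible captions: fall back to speech-to-text.
    return "speech-to-text"
```

For example, under these assumptions a 50-minute caption file for a 60-minute video would be accepted, while a 30-minute caption file for the same video would be rejected in favor of speech-to-text.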

@dphoria dphoria added the enhancement New feature or request label Sep 23, 2022
@dphoria dphoria marked this pull request as draft September 23, 2022 06:40
@dphoria dphoria marked this pull request as ready for review September 24, 2022 17:35
Member

@evamaxfield evamaxfield left a comment

Looks great! Thanks for getting to this!!!

One minor question about a weird cast to bytes, but otherwise totally good to merge!

cdp_backend/utils/file_utils.py (resolved)
cdp_backend/tests/utils/test_file_utils.py (outdated, resolved)
@dphoria dphoria merged commit 1f66f12 into CouncilDataProject:main Sep 27, 2022
@dphoria dphoria deleted the feature/check-caption-files branch September 27, 2022 00:22
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Filter out bad caption files
2 participants