Add transcription range fields to database and ingestion models, add … #221
…validator for time duration, add range filter to ffmpeg audio split, update tests
Code-wise, this all looks great. I'd be curious how you found our contributing notes and all of that. Any places for improvement?
Implementation-wise, did you have a chance to test this on a dev deployment? I think this would transcribe the selected portion, but the web UI / session interface may fail because the video doesn't match up with the audio. Interestingly, the best way to do this may be to split the video by timestamp and then do all the normal downstream processing, rather than splitting the audio by timestamp.
Are you on the call this Thursday? If so it might be good to talk about this feature then. / How to dev-deploy
Codecov Report
@@            Coverage Diff             @@
##             main     #221      +/-   ##
==========================================
+ Coverage   72.33%   72.40%   +0.07%
==========================================
  Files          64       64
  Lines        3492     3541      +49
==========================================
+ Hits         2526     2564      +38
- Misses        966      977      +11
I made it this far! If there's anything that I'm doing that makes it appear I missed the documentation, let me know and I might have some feedback.
No, that was the last thing I mentioned to @smai-f. Hopefully, we can get a test process going as we integrate the scraper.
My ideal was to do as you mentioned, and probably integrate with clip functionality. The main reason I didn't was because there seems to be a reliance on existing video hosting, and I thought that would be worth preserving. I didn't realize how the front end functionality worked (oops!), but I think there are some easy ways to preserve both the existing video hosting and accurate timestamps. The simplest might be to send the offset to the transcribe function to alter the timestamps in the transcript:
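The snippet originally attached to this comment is not preserved here. As a rough illustration of the idea only, a minimal sketch of shifting transcript timestamps by a clip offset might look like the following. The `Sentence` class and `shift_timestamps` function are hypothetical stand-ins, not the cdp-backend transcript model or transcribe API.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Sentence:
    # Hypothetical stand-in for a transcript sentence (not the cdp-backend model).
    text: str
    start_time: float  # seconds from the start of the transcribed audio
    end_time: float


def shift_timestamps(sentences: List[Sentence], offset_seconds: float) -> List[Sentence]:
    """Return copies of the sentences with both timestamps moved by the offset."""
    return [
        replace(
            s,
            start_time=s.start_time + offset_seconds,
            end_time=s.end_time + offset_seconds,
        )
        for s in sentences
    ]


# Audio was split starting one hour into the hosted video,
# so every timestamp is moved forward by 3600 seconds.
clip = [Sentence("Call to order.", 0.0, 2.5), Sentence("Roll call.", 2.5, 4.0)]
shifted = shift_timestamps(clip, 3600.0)
```

With the timestamps shifted this way, the transcript would line up with the full-length hosted video rather than with the trimmed audio.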
In addition, we can update the front end so it seeks to the appropriate place on page load: https://github.com/CouncilDataProject/cdp-frontend/blob/77944572b816ed07ef0dcffd772fc19b52aea12e/src/components/Details/EventVideo/EventVideo.tsx#L133
I should be available. Can you add me to the event?
(None, "10"),
("1", None),
("0", "0"),
("1", "0"),
Add tests to see how ffmpeg handles 0 or negative duration output.
Beyond what Eva already mentioned, about the different video and audio stream lengths maybe being an issue, I think my only wish is that Thank you very much @chrisjkhan for this!
…mpeg error logging
We decided at the meeting to do full video clipping and host the video. The decision came down to a combination of reusable functionality, the relatively low priority of original video hosting, simpler downstream handling, and overall UX. I'll change the functionality and the semantics of the whole issue.
You're welcome
Sounds great. Just let me know / hit the "re-request review" button whenever.
… host when limiting video to a range, updated mp4 conversion to allow a range, connected mp4 to clip functionality, updated tests and tried to make testing slightly more consistent, added Session ingestion verification to test out
I may have gone a little overboard with changes trying to slightly increase consistency across method signatures. The only thing I changed that might be controversial is making the hash calculation part of an already large task rather than keeping it separate. If you really want the hash to be performant, I would suggest modifying it to use CRC/MD5/SHA-1 and do a combination of the first x bytes/blocks and file size. I tested negative duration pairs (
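To make the hashing suggestion concrete, here is a minimal sketch of a "first block plus file size" fingerprint using SHA-1 from the standard library. The function name `quick_file_hash` and the 1 MiB default block size are assumptions for illustration, not code from this PR.

```python
import hashlib
import os


def quick_file_hash(path: str, block_size: int = 1 << 20) -> str:
    """Fingerprint a file from its first block and its total size.

    Hypothetical helper: it trades collision resistance for speed, so two
    files with the same size and the same first block get the same digest.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Only the first block is read, so cost does not grow with file size.
        digest.update(f.read(block_size))
    # Mixing in the size distinguishes files that share a leading block.
    digest.update(str(os.path.getsize(path)).encode("ascii"))
    return digest.hexdigest()
```

The trade-off is that this is a cheap identity check, not a content hash: a video re-encoded to the same size with an identical first block would not be detected as changed.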
…ix in cdp_will_host control structure
I think I'm good with this. Thank you, Chris! I'll let Eva do the honors, though. She's a much better reviewer. (I also need to figure out how to start a review, make N comments for the one review, end the review...)
Thanks everybody! Here are some manual tests using Chris' latest commit in the Montana legislature instance:
This all looks good to me. I have one remaining question before I click the approve and merge buttons.
Q: Should we take the trim start time and add that to the session start time / event start times?
Let's say the event starts at noon but you want to trim to 1 hour into the video. Should the session and event start time say "1pm"?
Also, thank you for such great work on this! Really nicely done and thank you @smai-f for helping test!
Ah, I missed this comment. My guess is it should still be noon. There is a The only case I can imagine is a session "starting" at noon, the video recording starting at noon, but then actual activity starting at 1pm. My guess is, if this 1 hour of dead time is in the source data being scraped,
First, great question Eva. 👏 I'm in line with Chris' thoughts (I hope I'm understanding correctly). At the core, there are two fields added to the model, the video start and end times (right?). At least to me, if I look at a model and see that 'session start time' is noon and 'video start time' is 1 PM, that would sound pretty natural. I might be slightly confused at first, but even without any external explanation, I think I would realize that in this case the video begins 1 hour into the session. If I'm looking at what is probably the vast majority of models, where 'session start time' == 'video start time', that wouldn't throw me off either, especially if by then I had seen an example like the one above where they differ.
Maybe there is still some confusion.
@evamaxfield From my part consuming this PR in the Montana legislative scraper (and speaking the time/duration language Chris is pointing out of
I think the way Chris has it now makes the most sense from my narrow consuming perspective. Have the consumer decide whether or not they want to augment the session datetime to include the video start time offset or not. I can imagine reasons to support both behaviors related to the start time of the meeting aligning with a schedule written in another place.
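For a consumer who does want the session datetime to reflect the trimmed video, the adjustment is a single `timedelta` addition. This is only a sketch: `offset_session_start` is a hypothetical helper, the date is arbitrary, and it assumes the video start offset is already available as a whole number of seconds.

```python
from datetime import datetime, timedelta


def offset_session_start(session_start: datetime, video_start_seconds: int) -> datetime:
    """Return the wall-clock time at which the trimmed video begins."""
    return session_start + timedelta(seconds=video_start_seconds)


# Session scheduled for noon, video trimmed to start one hour in.
session_start = datetime(2022, 1, 1, 12, 0)
video_start = offset_session_start(session_start, 3600)
```

Leaving this step to the scraper keeps the stored session datetime aligned with the published schedule, while still letting an instance opt in to the shifted time.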
Thank you for pointing that out. I missed it!
Thanks for all the discussion and clarification. Yes, the only thing I think should be changed prior to merge is adding the tiniest bit of documentation to the ingestion model stating that the video time is relative to the session time. I would just put it in the class docstring, as we still need to find a better overall project documentation structure haha
@evamaxfield Added. Let me know how it looks. I followed the
Perfect. Verbose is more than welcome. Clicking the merge button!
Link to Relevant Issue
This pull request resolves #211
Description of Changes
-Add optional transcription range fields to database model
-Add optional transcription range fields to ingestion model
-Add validator for time duration
-Add range filter to ffmpeg audio split (similar to clip functionality)
-Update tests for valid inputs
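
As a rough illustration of what a time duration validator for an optional start/end range could look like, here is a minimal sketch. The function names, the accepted `SS`, `MM:SS`, and `HH:MM:SS` formats, and the choice to reject zero-length and negative ranges are all assumptions; the validator in this PR may behave differently.

```python
from typing import Optional


def parse_time_duration(value: str) -> int:
    """Convert "SS", "MM:SS", or "HH:MM:SS" into a number of seconds."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"Not a valid time duration: {value!r}")
    seconds = 0
    for part in parts:
        # Each colon-separated field is worth 60x the field to its right.
        seconds = seconds * 60 + int(part)
    return seconds


def is_valid_range(start: Optional[str], end: Optional[str]) -> bool:
    """Assumed rule: an open end is always fine, otherwise end must exceed start."""
    start_seconds = parse_time_duration(start) if start is not None else 0
    if end is None:
        return True
    return parse_time_duration(end) > start_seconds
```

Under these assumed rules, the pairs `(None, "10")` and `("1", None)` pass, while `("0", "0")` and `("1", "0")` are rejected as zero and negative durations respectively.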