
feature/speaker-classifier-apply-function #197

Merged: 13 commits merged into main from feature/apply-speaker-classifier on Jul 19, 2022

Conversation

evamaxfield (Member)

Link to Relevant Issue

WIP #131

Description of Changes


Adds the function to annotate a transcript with a trained speakerbox model!

I am already using this, applying the highest-accuracy model for Seattle to all data from 2021-01-01 to 2022-01-01.
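A minimal usage sketch based on the signature excerpts quoted in the review below (not code from this PR itself; the exact import, the placeholder file paths, and the assumption that `annotate` returns the annotated transcript are illustrative):

```python
from cdp_backend.annotation.speaker_labels import annotate

# Hypothetical call; paths are placeholders and the parameter defaults are
# taken from the function signature discussed in this PR. The model
# parameter is omitted so DEFAULT_MODEL is used.
annotated = annotate(
    transcript="transcript.json",           # str, Path, or Transcript
    audio="session-audio.wav",              # str, Path, or AudioSegment
    min_intra_sentence_chunk_duration=0.5,  # skip chunks shorter than this
    max_intra_sentence_chunk_duration=2.0,  # split sentence audio into <= 2 s chunks
    min_sentence_mean_confidence=0.985,     # only label sentences above this mean confidence
)
```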

evamaxfield added the enhancement (New feature or request) and event gather pipeline (A feature or bugfix relating to event processing) labels on Jun 29, 2022
evamaxfield self-assigned this on Jun 29, 2022
transcript: Union[str, Path, Transcript],
audio: Union[str, Path, AudioSegment],
model: str = DEFAULT_MODEL,
min_intra_sentence_chunk_duration: float = 0.5,
Collaborator


Do you think 0.5 is enough for a minimum? For certain things like roundtable one-word answers like 'yes/no', I could see it taking slightly less than half a second.

Member Author


I think this is one of those things of... "is it valuable to the downstream analysis?"

Anything less than 0.5 seconds seems non-valuable to tag, to me.

Though I am trying to find a better justification for this. Part of the reason is that I only trained on data that ranged from 0.5 to 2 seconds, and that was because anything less than 0.5 seconds worried me in terms of "how much data is there to predict with": the smaller the clip, the less information there is to use.
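As a rough illustration of the chunking rule under discussion (a simplified sketch with a hypothetical helper, not the PR's actual implementation): each sentence's audio span is cut into chunks of at most the max duration, and any leftover piece shorter than the min duration is skipped rather than classified.

```python
from typing import List, Tuple


def chunk_sentence_audio(
    start: float,
    end: float,
    min_chunk: float = 0.5,
    max_chunk: float = 2.0,
) -> List[Tuple[float, float]]:
    """Split a sentence's [start, end) audio span into classifiable chunks.

    Chunks are at most max_chunk seconds long; any piece shorter than
    min_chunk is dropped because it carries too little signal to predict
    a speaker from.
    """
    chunks: List[Tuple[float, float]] = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + max_chunk, end)
        if chunk_end - chunk_start >= min_chunk:
            chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


# A 1.2 s one-word answer yields one chunk; a 0.4 s one yields none.
print(chunk_sentence_audio(10.0, 11.2))  # [(10.0, 11.2)]
print(chunk_sentence_audio(10.0, 10.4))  # []
```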

model: str = DEFAULT_MODEL,
min_intra_sentence_chunk_duration: float = 0.5,
max_intra_sentence_chunk_duration: float = 2.0,
min_sentence_mean_confidence: float = 0.985,
Collaborator


Is this based on the confidence you've been seeing with the existing model you trained?

Member Author


Hah. Great question and one that I am struggling with. Right now, the 0.985 is based on "I tried a bunch of different thresholds and this one seemed good," but I would love to know if there is a formula from confidence -> p-value??
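As a concrete illustration of how such a mean-confidence gate can work (a simplified sketch with a hypothetical helper, not the PR's implementation): average the classifier's confidence over a sentence's chunks and only write a speaker label when that mean clears the threshold.

```python
from statistics import mean
from typing import List, Optional, Tuple


def sentence_speaker(
    chunk_predictions: List[Tuple[str, float]],
    min_mean_confidence: float = 0.985,
) -> Optional[str]:
    """Return the majority speaker label for a sentence's chunks, or None
    (leave the sentence unlabeled) when the mean chunk confidence falls
    below the threshold."""
    if not chunk_predictions:
        return None
    if mean(conf for _, conf in chunk_predictions) < min_mean_confidence:
        return None
    labels = [label for label, _ in chunk_predictions]
    return max(set(labels), key=labels.count)


print(sentence_speaker([("speaker-a", 0.990), ("speaker-a", 0.992)]))  # speaker-a
print(sentence_speaker([("speaker-a", 0.990), ("speaker-b", 0.900)]))  # None (mean 0.945)
```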

isaacna (Collaborator) left a comment


Looks good to me! Just had a few questions about some of the default values we picked for sentence length and confidence.

tohuynh (Collaborator) left a comment


annotate looks good to me. Just have a couple of questions.

Comment on lines +53 to +56
The maximum duration for a sentence's audio to split into. This should match
whatever was used during model training
(i.e. trained on 2 second audio chunks, apply on 2 second audio chunks).
Default: 2 seconds
Collaborator


What's the reasoning behind this?

Member Author


Originally I couldn't fit more than 2 seconds of audio into GPU memory during training.


codecov bot commented Jul 6, 2022

Codecov Report

Merging #197 (9930149) into main (1e3b7cb) will decrease coverage by 2.14%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
- Coverage   93.38%   91.24%   -2.15%     
==========================================
  Files          51       53       +2     
  Lines        2677     2740      +63     
==========================================
  Hits         2500     2500              
- Misses        177      240      +63     
Impacted Files Coverage Δ
cdp_backend/annotation/__init__.py 0.00% <0.00%> (ø)
cdp_backend/annotation/speaker_labels.py 0.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 1e3b7cb...9930149.

evamaxfield merged commit f4d40c6 into main on Jul 19, 2022
evamaxfield deleted the feature/apply-speaker-classifier branch on July 19, 2022 17:51