feature/speaker-classifier-apply-function #197
Conversation
```python
transcript: Union[str, Path, Transcript],
audio: Union[str, Path, AudioSegment],
model: str = DEFAULT_MODEL,
min_intra_sentence_chunk_duration: float = 0.5,
```
Do you think 0.5 is enough for a minimum? For certain things like roundtable one-word answers like 'yes/no', I could see them taking slightly less than half a second.
I think this comes down to one of those questions of... "is it valuable to the downstream analysis?"
Anything less than 0.5 seconds seems non-valuable to tag, to me.
Though I am trying to find a better justification for this. Part of the reason is that I only trained on data that ranged from 0.5 to 2 seconds, and that was because anything less than 0.5 seconds worried me in terms of "how much data to predict with": the smaller the clip, the less information there is to use.
```python
model: str = DEFAULT_MODEL,
min_intra_sentence_chunk_duration: float = 0.5,
max_intra_sentence_chunk_duration: float = 2.0,
min_sentence_mean_confidence: float = 0.985,
```
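For context, a minimal sketch of what the call site for these defaults might look like. The function name `apply` and the import path are inferred from the PR title ("speaker-classifier-apply-function"), not confirmed here; only the parameters and their defaults come from the diff above.

```python
from pathlib import Path

# Hypothetical call site: the function name `apply` is inferred from the
# PR title; the parameters and defaults are taken from the diff above.
from speakerbox import apply

annotated = apply(
    transcript=Path("transcript.json"),        # str, Path, or Transcript
    audio=Path("meeting-audio.wav"),           # str, Path, or AudioSegment
    model="path/to/trained-speakerbox-model",  # hypothetical model path
    min_intra_sentence_chunk_duration=0.5,     # skip chunks under 0.5 s
    max_intra_sentence_chunk_duration=2.0,     # match the 2 s training chunks
    min_sentence_mean_confidence=0.985,        # drop low-confidence sentences
)
```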
Is this based on the confidence you've been seeing with the existing model you trained?
Hah. Great question, and one that I am struggling with. Right now, the 0.985 is based on "I tried a bunch of different thresholds and this one seemed good", but I would love to know if there is a formula from confidence -> p-value??
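For what it's worth, a rough sketch (not the PR's actual implementation) of what a mean-confidence gate like `min_sentence_mean_confidence` does: average the classifier's per-chunk softmax confidences over a sentence and keep the predicted speaker only if the mean clears the threshold.

```python
from statistics import fmean

def mean_confidence_gate(
    chunk_confidences: list[float],
    threshold: float = 0.985,
) -> bool:
    """Keep a sentence's predicted speaker only if the mean of its
    per-chunk softmax confidences clears the threshold."""
    return fmean(chunk_confidences) >= threshold

# e.g. three ~2-second chunks from one sentence
print(mean_confidence_gate([0.99, 0.98, 0.991]))  # True  (mean ~0.987)
print(mean_confidence_gate([0.99, 0.90, 0.95]))   # False (mean ~0.947)
```

On the confidence -> p-value question: raw softmax confidences are generally not calibrated probabilities, so there is no direct formula without a calibration step (e.g. temperature scaling, or an empirical reliability curve on held-out data).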
Looks good to me! Just had a few questions about some of the default values we picked for sentence length and confidence.
Looks good to me. Just have a couple of questions.
```
The maximum duration for a sentence's audio to split to. This should match
whatever was used during model training
(i.e. trained on 2 second audio chunks, apply on 2 second audio chunks).
Default: 2 seconds
```
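As a rough illustration (again, not the PR's actual code) of the windowing that `min_intra_sentence_chunk_duration` and `max_intra_sentence_chunk_duration` together describe:

```python
def split_sentence_audio(
    start: float,
    end: float,
    min_chunk: float = 0.5,
    max_chunk: float = 2.0,
) -> list[tuple[float, float]]:
    """Split a sentence's [start, end] audio span into chunks of at most
    `max_chunk` seconds, dropping any trailing chunk shorter than `min_chunk`."""
    chunks = []
    t = start
    while t < end:
        chunk_end = min(t + max_chunk, end)
        if chunk_end - t >= min_chunk:
            chunks.append((t, chunk_end))
        t = chunk_end
    return chunks

# A 4.7 s sentence -> two 2 s chunks plus a 0.7 s remainder (kept, >= 0.5 s)
print(split_sentence_audio(10.0, 14.7))
# [(10.0, 12.0), (12.0, 14.0), (14.0, 14.7)]
```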
What's the reasoning behind this?
Originally I couldn't fit more than 2 seconds of audio into GPU memory during training.
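For a sense of scale, a back-of-the-envelope on why chunk duration drives GPU memory. The 16 kHz sample rate and batch size here are assumptions for illustration, not values from the PR:

```python
# Raw input size for one training batch, assuming a 16 kHz waveform model
# (assumption -- the PR does not state the sample rate).
sample_rate = 16_000   # samples per second (assumed)
chunk_seconds = 2.0    # max_intra_sentence_chunk_duration
batch_size = 32        # hypothetical

samples_per_chunk = int(sample_rate * chunk_seconds)  # 32,000 samples
floats_per_batch = samples_per_chunk * batch_size     # 1,024,000 float32s
print(f"{floats_per_batch * 4 / 1e6:.1f} MB of raw input per batch")  # ~4.1 MB
```

The raw input grows linearly with chunk duration, but for transformer-based audio models the self-attention activations grow roughly quadratically with sequence length, so longer chunks cost considerably more than the raw-input numbers suggest.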
Codecov Report
```
@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
- Coverage   93.38%   91.24%   -2.15%
==========================================
  Files          51       53       +2
  Lines        2677     2740      +63
==========================================
  Hits         2500     2500
- Misses        177      240      +63
```
Continue to review full report at Codecov.
Link to Relevant Issue
WIP #131
Description of Changes
Adds the function to annotate a transcript with a trained speakerbox model!
I am already using this, applying the highest-accuracy model for Seattle to all data from 2021-01-01 to 2022-01-01: