FSD50K Speech Model Fine-tuning Tutorial #201

Open
wants to merge 19 commits into base: main

Conversation


@FlorentMeyer commented Oct 22, 2022

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Add FSD50K Speech Model Fine-tuning Tutorial.

PR review

Did you have fun?

A lot 🙃

@codecov

codecov bot commented Oct 22, 2022

Codecov Report

Merging #201 (9794cfc) into main (89b94ba) will not change coverage.
Report is 1 commit behind head on main.
The diff coverage is n/a.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #201   +/-   ##
===================================
  Coverage    73%    73%           
===================================
  Files         2      2           
  Lines       382    382           
===================================
  Hits        280    280           
  Misses      102    102           

@rohitgr7
Contributor

hey @FlorentMeyer, mind checking the file you uploaded? It looks like it's too big and there might be some redundant stuff in it. Could you clean it up?

@Borda
Member

Borda commented Oct 25, 2022

not sure what happened, but GH does not want to show me the diff :/

@FlorentMeyer
Author

FlorentMeyer commented Oct 26, 2022

Good evening,

I should mention that the code in the converted notebook was exactly the same as in this Colab notebook (with the !pip installs removed). I had also kept the outputs, but from reading other people's examples I gather that the outputs printed inside the docs are the ones obtained by running the converted .py notebooks on your side.

My last commit therefore makes these changes to the linked Colab notebook:

  • remove all cell outputs
  • remove the conditions around bash instructions (a single bash command as the only body of an if was causing a syntax error due to the absence of Python code; see the sketch at the end of this comment)
  • remove the %% magics
  • comment out the tensorboard cell (which was responsible for creating such a large file; I am sorry I hadn't checked it before)

Changes to the .yaml file:

  • add gdown as a requirement
  • remove the brackets around my name (I didn't know they were special characters)

I'm just not sure whether the Pandas dataframes with the audio players will get rendered.
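
For reference, the problematic pattern looked roughly like this (a sketch only; the directory name and file ID are placeholders, not the actual ones from the notebook):

# Original Colab cell (illustration):
#     if not os.path.exists("FSD50K.dev_audio"):
#         !gdown <file_id>
# The `!` shell escape is not valid Python in the jupytext-converted .py,
# which leaves the `if` without a body and triggers the syntax error.

# A plain-Python equivalent that survives the conversion:
import os
import subprocess

if not os.path.exists("FSD50K.dev_audio"):
    subprocess.run(["gdown", "<file_id>"], check=True)  # placeholder ID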

# ## Compute metrics

# %% id="zlTooqqp8FWk"
mAP_micro = average_precision_score(

Author

At the time I wrote the code, torchmetrics.functional.average_precision's target took "integer labels" and therefore did not accept multi-hot labels. Just let me check whether this has been fixed and whether I get the same results as with scikit-learn!

Author

OK, the new implementations of multilabel_average_precision give the same results as scikit-learn
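
For the record, a quick check along these lines (a sketch; shapes and values are made up, only to illustrate the comparison):

import torch
from sklearn.metrics import average_precision_score
from torchmetrics.functional.classification import multilabel_average_precision

num_classes = 5
preds = torch.rand(8, num_classes)              # predicted scores
target = torch.randint(0, 2, (8, num_classes))  # multi-hot labels

sk_map = average_precision_score(target.numpy(), preds.numpy(), average="micro")
tm_map = multilabel_average_precision(preds, target, num_labels=num_classes, average="micro")
print(sk_map, tm_map.item())  # the two values should agree up to floating-point precision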

Contributor

let's use that :)

# ## Compute metrics

# %% id="zlTooqqp8FWk"
mAP_micro = average_precision_score(
Contributor

also, it looks like you have true values for preds; I'd recommend using test_step instead to show the metrics.

Author

What do you mean? Something like this?

With on_step=False, on_epoch=True to only log at the end of the epoch, according to https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html#logging-from-a-lightningmodule:

The above config for validation applies for test hooks as well.

Suggested change
mAP_micro = average_precision_score(
# In __init__ (num_classes is the number of FSD50K labels):
self.mAP = torchmetrics.classification.MultilabelAveragePrecision(num_labels=num_classes)
# In test_step:
self.mAP(preds, y)
self.log('mAP', self.mAP, on_step=False, on_epoch=True)
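
Spelled out a bit more, it could look like this (a sketch; the class name, num_classes, and the backbone are placeholders, not taken from the notebook):

import torch
from pytorch_lightning import LightningModule
from torchmetrics.classification import MultilabelAveragePrecision

class FSD50KClassifier(LightningModule):
    def __init__(self, backbone, num_classes=200):
        super().__init__()
        self.backbone = backbone  # the fine-tuned speech model (placeholder)
        # The multilabel metric needs the number of labels up front.
        self.mAP = MultilabelAveragePrecision(num_labels=num_classes, average="micro")

    def test_step(self, batch, batch_idx):
        x, y = batch
        preds = torch.sigmoid(self.backbone(x))
        self.mAP(preds, y)
        # on_step=False / on_epoch=True: accumulate over the whole test set
        # and log a single epoch-level value.
        self.log("mAP", self.mAP, on_step=False, on_epoch=True)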

Author

Then the class version of the torchmetrics metric should be preferred over the functional one, I'd say?

Contributor

okay, it's fine; let's use functional metrics here since you already have all the targets and predictions.
Modular metrics are useful when you are aggregating the metrics, say at the step level.
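
A minimal sketch of the two styles (illustrative names and shapes, not from the notebook):

import torch
from torchmetrics.functional.classification import multilabel_average_precision
from torchmetrics.classification import MultilabelAveragePrecision

num_classes = 200  # FSD50K label count
all_preds = torch.rand(1000, num_classes)
all_targets = torch.randint(0, 2, (1000, num_classes))

# Functional: all targets and predictions are already materialised, compute once.
map_once = multilabel_average_precision(all_preds, all_targets, num_labels=num_classes, average="micro")

# Modular: accumulate batch by batch (e.g. inside test_step) and read out at the end.
metric = MultilabelAveragePrecision(num_labels=num_classes, average="micro")
for preds, targets in zip(all_preds.split(100), all_targets.split(100)):
    metric.update(preds, targets)
map_accumulated = metric.compute()  # matches map_once, since the metric stores all preds/targets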

@akihironitta added the Example Example / Demo / Tutorial label Oct 27, 2022
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
@Borda changed the title from "Upload whole project." to "FSD50K Speech Model Fine-tuning Tutorial" Nov 4, 2022
@FlorentMeyer
Author

FlorentMeyer commented Nov 7, 2022

I see there are still problems with:

  1. the git+<my_repo> in requirements.txt
  2. the docs build saying some cells are missing IDs, even though I used jupytext as requested

@FlorentMeyer
Author

OK, I also saw that there were bizarre things happening in the notebook: it looks like the pre-commit hooks are moving stuff around, causing duplication every time I pull their changes into my own code before being able to push again (example), and it's easy to miss things when reading a notebook as a .py file.

Anyway, I read the whole file carefully and this should be fixed now. Also, all cells have an ID, so I'm not sure where the "cells are missing IDs" error comes from :/

@FlorentMeyer removed the request for review from kaushikb11 November 12, 2022 09:53
@FlorentMeyer
Author

A small bump!

# name: python3
# ---

# %% [markdown] id="CI0JECKA9AnY"
Member

let's remove all the IDs

@Borda assigned ethanwharris and unassigned rohitgr7 Dec 16, 2022
@mergify bot requested a review from Borda April 6, 2024 19:30
Labels
Example Example / Demo / Tutorial

5 participants