How to extract labels from unannotated waves? #20
Comments
As described in this article (p. 3, limitations section), the labelling method included in If you set
This is more of a research question / general methods problem, and I'm afraid I don't have the time to think about this just now. Having said that, I would be happy to try to answer more concrete questions that you might have either now or as you work on your project.
In brief, pykanto uses UMAP and HDBSCAN (see e.g. Sainburg et al., 2020) to conservatively find clusters that can then be manually reviewed using an interactive app. If you were to annotate a subset of your data in this way, you could then train a classifier to label the rest of the data. This is what NNs like tweetynet can do.
Finally,
Reason: see the docs for the
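The cluster-then-classify recipe described above (embed, conservatively cluster, manually review, then train a classifier on the reviewed subset) can be sketched roughly as follows. This is not pykanto's actual API: the sketch uses scikit-learn only, with PCA standing in for UMAP and DBSCAN standing in for HDBSCAN so it runs without extra dependencies, and the feature matrix is synthetic placeholder data.

```python
import numpy as np
from sklearn.decomposition import PCA        # stand-in for UMAP in this sketch
from sklearn.cluster import DBSCAN           # stand-in for HDBSCAN in this sketch
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder "spectrogram features": 300 syllables x 64 features,
# drawn from three well-separated synthetic clusters.
X = np.concatenate(
    [rng.normal(loc=c, scale=0.3, size=(100, 64)) for c in (0.0, 2.0, 4.0)]
)

# 1. Embed into a low-dimensional space (pykanto uses UMAP here).
emb = PCA(n_components=2).fit_transform(X)

# 2. Conservative density-based clustering; label -1 means "noise",
#    i.e. syllables left unlabelled for manual review.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(emb)

# 3. After reviewing/correcting the clustered subset (in pykanto, via the
#    interactive app), train a classifier to label the remaining data.
reviewed = labels != -1
clf = RandomForestClassifier(random_state=0).fit(X[reviewed], labels[reviewed])
predicted = clf.predict(X[~reviewed]) if (~reviewed).any() else np.array([])
```

The same final step is where a network like tweetynet could slot in instead of the random forest, trained on the manually reviewed annotations.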
Hi Nilo, first of all, thanks for your help and availability, should I need some debriefing on my journey :) Insightful to keep in mind "continuity" vs. more discrete sounds; I will split this thread and move discussion-type questions into other threads. Aiming to replicate the tutorials, could you please clarify whether the following will automatically download the datasets for storm-petrel or bengalese finch, as it does for great tit?
My confusion is that I see the folder, but the error refers to segmented data.
Instead:
will work.
https://nilomr.github.io/pykanto/_build/html/contents/basic-workflow.html
Hi Luigi,
That is accurate. The package includes some sample data: some of it is already segmented (in this sense: DOCS), some isn't. This is intentional, both for testing purposes and so that users can see how to work with different types of datasets.
As to your question "_How to use the tutorials for preinstalled datasets, storm-petrel or bengalese finch?_", please see my message above:
Hope that helps!
Hi! Now I figured it out. The missing step was in the third hidden cell. Feedback: if useful, in my opinion you may want to explain already in the first tutorial that raw data can / must be segmented: it was not clear to me what was going on simply by looking at the classes and parameters. When running, the error was related to tqdm. In the official However, I was able to solve it by checking the install again with conda:
Conda said it was correctly installed, but (mysteriously) the issue was solved.
My 2 cents: I suspect tqdm may have previously been installed via pip, and it may be useful to ensure packages are installed via conda, not pip. But I'm not sure about this... Thanks again Nilo!
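One way to check the suspicion above (whether a package was installed by pip rather than conda) from Python itself: pip records an `INSTALLER` file in each distribution's metadata, readable with the standard library. A minimal sketch; the helper name `installer_of` is mine, not part of pykanto or conda.

```python
from importlib import metadata


def installer_of(dist_name: str):
    """Return the tool recorded as the installer of a distribution
    ('pip', 'conda', ...), or None if the package or the INSTALLER
    record is missing."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


# Example: installer_of("tqdm") returning "pip" inside a conda env
# would support the mixed-install theory.
```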
Thanks, Luigi. Would you mind opening a separate issue about this
I have a db that is annotated with the starts and ends of samples in vocalizations, and the units of samples are probably syllables, not single notes (they are bats, not birds).
Below are some tentative answers; I'd love to hear your perspective and guidance (I am an MSc student):
--
Looking at:
It seems pykanto supports automatic label annotation: the labels are None, only the intervals are known. For exploratory purposes, I tried:
but an error was raised:
For some reason, it is not looking at the "raw" folder, unlike the example for "GREAT-TIT".
Considering that I have data of vocalizations with the following annotations:
Some possible approaches could be:
a. to compare the spectrograms of syllables from the same emitter, in the same context, and label them accordingly. Expected result: classify similar syllables, constrained by emitter and context.
b. to project onto UMAP, and manually label those groups of syllables that appear close together.
With this option, I wonder if I could project all of the vocalizations directly, for all emitters and contexts. Expected challenge, by way of example: if I say "apple" and "pineapple" when I am hungry, and you also say "apple" and "pineapple" when you are hungry, my two utterances may appear closer to each other than to yours if the emitter features weigh more than the context, or the two "apple"s may appear closer if the context weighs more than the emitter features. How do I address this issue and extract the utterances "apple" and "pineapple" as a single shared "vocabulary" for both of us?
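One way to diagnose the "apple/pineapple" confound in option (b) is to compare the cluster assignments against the emitter and context annotations separately: if the clustering shares much more information with emitter identity than with context, the embedding is dominated by who called rather than what was said. A hedged sketch with synthetic labels; adjusted mutual information (AMI) is just one of several possible scores.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

rng = np.random.default_rng(1)
n = 200

# Placeholder annotations for n segmented syllables (bat calls here, not birds).
emitter = rng.integers(0, 4, size=n)   # which bat produced the call
context = rng.integers(0, 2, size=n)   # e.g. feeding vs. social

# Suppose clustering (UMAP + HDBSCAN or similar) returned these labels,
# here simulated to track the emitter almost perfectly:
clusters = emitter.copy()
flip = rng.random(n) < 0.05
clusters[flip] = rng.integers(0, 4, size=int(flip.sum()))

ami_emitter = adjusted_mutual_info_score(emitter, clusters)
ami_context = adjusted_mutual_info_score(context, clusters)

# If ami_emitter >> ami_context, the clusters encode "who called", not
# "what was said": consider embedding each emitter separately, or removing
# emitter-specific features, before reading clusters as a shared vocabulary.
```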
Reading that pykanto may be somewhat similar to works like
vak
, can you explain how it would handle this problem and, if so, the types of models / neural networks adopted to label the syllables?