Add AudioFolder packaged loader #4530
Merged
polinaeterna merged 81 commits into huggingface:main from polinaeterna:add-audio-folder-new on Aug 22, 2022
cc87b0e add audiofolder loader (almost identical to imagefolder except for in… (polinaeterna)
0adcd56 check codestyle (polinaeterna)
e46eecb add tests (polinaeterna)
7cc4ab9 remove unused imports (polinaeterna)
a648530 add dummy data (polinaeterna)
5cbbad1 add instruction on how to obtain list of audio extensions (polinaeterna)
53e9ce3 fix comment (polinaeterna)
60760ea add audiofolder dummy files in tests (polinaeterna)
e4bb688 Merge branch 'master' into add-audio-folder-new (polinaeterna)
d0b2592 check if two separate files fix test error (i guess not but just in c… (polinaeterna)
15ca3cf remove unused imports (polinaeterna)
420dd2b Revert "check if two separate files fix test error (i guess not but j… (polinaeterna)
68b7f5a add uppercased formats, modify test for zip archive (check that array… (polinaeterna)
7c75e81 Merge branch 'huggingface:master' into add-audio-folder-new (polinaeterna)
eecc449 add contributors (polinaeterna)
a081364 Merge branch 'add-audio-folder-new' of github.com:polinaeterna/datase… (polinaeterna)
93c6afa Merge branch 'huggingface:master' into add-audio-folder-new (polinaeterna)
f5d9841 Merge branch 'add-audio-folder-new' of github.com:polinaeterna/datase… (polinaeterna)
91afd92 Merge branch 'huggingface:master' into add-audio-folder-new (polinaeterna)
cce1ebf Merge branch 'master' into add-audio-folder-new (polinaeterna)
3c6c56a add a generic loader (polinaeterna)
6840eab change name of get_patterns (polinaeterna)
ffa6d14 update autofolder, align imagefolder and audiofolder with it (polinaeterna)
aa2f246 align audiofolder (polinaeterna)
d27266c move autofolder (polinaeterna)
24a65fd fix bug with incorrect itaration over archives (incorrect copypaste -_-) (polinaeterna)
f9ee90d get back comment (polinaeterna)
c905c1b patch autofolder for streaming manually (polinaeterna)
f0ddbef Merge branch 'huggingface:main' into add-audio-folder-new (polinaeterna)
6c7a1f9 check fro AutoFolder class specifically in patching, not its string n… (polinaeterna)
9f9551c Merge branch 'huggingface:main' into add-audio-folder-new (polinaeterna)
96189c2 Merge branch 'add-audio-folder-new' of github.com:polinaeterna/datase… (polinaeterna)
66d3877 pass missing use_auth_token for AutoFolder patching (polinaeterna)
ba9d059 fix docstrings (polinaeterna)
1346488 Merge branch 'main' into add-audio-folder-new (polinaeterna)
a0b4093 align autofolder with the latest imagefolder implementation (polinaeterna)
86fbb99 Merge branch 'main' into add-audio-folder-new (polinaeterna)
d1e4a64 update tests (polinaeterna)
b9eace0 add test for duplicate label col (polinaeterna)
6a841df copy test for dir names (polinaeterna)
eabece2 add tests for autofolder (+copied from imagefolder) (polinaeterna)
d250bfd Merge branch 'add-audio-folder-new' of github.com:polinaeterna/datase… (polinaeterna)
edd4803 Merge branch 'huggingface:main' into add-audio-folder-new (polinaeterna)
aab4746 Merge branch 'add-audio-folder-new' of github.com:polinaeterna/datase… (polinaeterna)
997a01b add missed audio_file fixture (polinaeterna)
56d35aa add __name__ to audio/image features to avoid documentation building … (polinaeterna)
42627ba fix CI on windows (polinaeterna)
76c319f check for __name__ attr too when creating base_feature_name (polinaeterna)
dce047e add documentation (polinaeterna)
74474fd make base_feature a private attr to be excluded from docs (polinaeterna)
91c130b fix docs (polinaeterna)
0c33f73 fix comment (rename base_feature) (polinaeterna)
7a8e384 Merge branch 'huggingface:main' into add-audio-folder-new (polinaeterna)
75ac1f4 fix typos (from code review) (polinaeterna)
3ab6136 fix typo (from code review) (polinaeterna)
0b60893 remove boilerplate, make base feature builder's class arg instead of … (polinaeterna)
bc1fb3d patch relative imports from parent folder too (polinaeterna)
b4c8a2d Merge remote-tracking branch 'upstream/main' into add-audio-folder-new (polinaeterna)
676e6f3 Merge remote-tracking branch 'upstream/main' into add-audio-folder-new (polinaeterna)
724782e remove self.config.label_name, use hardcoded 'label' (polinaeterna)
bfecab4 patch parents that inherit from DatasetBuilder, revert get_imports (polinaeterna)
90dc043 rename autofolder -> folder_builder (polinaeterna)
292a8c5 remove autofolder dir (polinaeterna)
3e32181 remove axtending for streaming from tests, it should work without man… (polinaeterna)
fe80766 make base column name an abstract attr of FolderBuilder instead of co… (polinaeterna)
f74922c Update src/datasets/streaming.py (polinaeterna)
227ce04 rename FolderBuilder -> FolderBasedBuilder (polinaeterna)
034b88c set drop_labels to None by default for AudioFolder (polinaeterna)
54c6cf2 remove dataclass decorator from audio/image folder configs as they do… (polinaeterna)
7f6719b remove ABC from FolderBasedBuilder as it does nothing (polinaeterna)
748576b update documentation (polinaeterna)
615a839 fix docs (polinaeterna)
02f8f57 SORRY another small fix in docs (polinaeterna)
fc41118 get back abc and dataclasses just because of the magical thinking ¯\_… (polinaeterna)
9ee04ed Revert "get back abc and dataclasses just because of the magical thin… (polinaeterna)
accb8cd Merge remote-tracking branch 'upstream/main' into add-audio-folder-new (polinaeterna)
adccfd8 Merge remote-tracking branch 'upstream/main' into add-audio-folder-new (polinaeterna)
6a79a5f check if builder extending for streaming is not in datasets.builder m… (polinaeterna)
189e98b Merge branch 'add-audio-folder-new' of github.com:polinaeterna/datase… (polinaeterna)
89e298c fix linters (polinaeterna)
fbef2b0 add comment to the patching thing (polinaeterna)
@@ -0,0 +1,4 @@

### Contributions

Thanks to [@polinaeterna](https://github.com/polinaeterna), [@nateraw](https://github.com/nateraw), [@lhoestq](https://github.com/lhoestq) and [@mariosasko](https://github.com/mariosasko) for adding this dataset.
Binary file not shown.
Empty file.
@@ -0,0 +1,66 @@
from typing import List

import datasets

from ..folder_based_builder import folder_based_builder


logger = datasets.utils.logging.get_logger(__name__)


class AudioFolderConfig(folder_based_builder.FolderBasedBuilderConfig):
    """Builder Config for AudioFolder."""

    drop_labels: bool = None
    drop_metadata: bool = None


class AudioFolder(folder_based_builder.FolderBasedBuilder):
    BASE_FEATURE = datasets.Audio()
    BASE_COLUMN_NAME = "audio"
    BUILDER_CONFIG_CLASS = AudioFolderConfig
    EXTENSIONS: List[str]  # definition at the bottom of the script


# Obtained with:
# ```
# import soundfile as sf
#
# AUDIO_EXTENSIONS = [f".{format.lower()}" for format in sf.available_formats().keys()]
#
# # .mp3 is currently decoded via `torchaudio`, .opus decoding is supported if version of `libsndfile` >= 1.0.30:
# AUDIO_EXTENSIONS.extend([".mp3", ".opus"])
# ```
# We intentionally do not run this code on launch because:
# (1) Soundfile is an optional dependency, so importing it in global namespace is not allowed
# (2) To ensure the list of supported extensions is deterministic
AUDIO_EXTENSIONS = [
    ".aiff",
    ".au",
    ".avr",
    ".caf",
    ".flac",
    ".htk",
    ".svx",
    ".mat4",
    ".mat5",
    ".mpc2k",
    ".ogg",
    ".paf",
    ".pvf",
    ".raw",
    ".rf64",
    ".sd2",
    ".sds",
    ".ircam",
    ".voc",
    ".w64",
    ".wav",
    ".nist",
    ".wavex",
    ".wve",
    ".xi",
    ".mp3",
    ".opus",
]
AudioFolder.EXTENSIONS = AUDIO_EXTENSIONS
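
For readers skimming the diff, the pattern above (a class that only annotates `EXTENSIONS` and has the concrete list attached after the class body) can be illustrated with a small standalone sketch. Everything here is hypothetical and stdlib-only: `ToyFolderBuilder`, `ToyAudioFolder`, `TOY_AUDIO_EXTENSIONS` and `examples` are illustration names, not part of `datasets`. The case-insensitive suffix match and the label taken from the parent directory name are assumptions about how a folder-based loader might behave, not a copy of the `FolderBasedBuilder` implementation.

```python
from pathlib import PurePosixPath
from typing import Dict, List


class ToyFolderBuilder:
    """Hypothetical stand-in for a folder-based builder (not the datasets class)."""

    BASE_COLUMN_NAME: str
    EXTENSIONS: List[str]  # annotated only; assigned on the subclass further down

    @classmethod
    def examples(cls, paths: List[str]) -> List[Dict[str, str]]:
        """Keep files with a supported extension and label them by parent directory."""
        rows = []
        for raw in paths:
            path = PurePosixPath(raw)
            # Assumption: extensions are matched case-insensitively.
            if path.suffix.lower() not in cls.EXTENSIONS:
                continue
            # Assumption: the label is the name of the directory holding the file.
            rows.append({cls.BASE_COLUMN_NAME: raw, "label": path.parent.name})
        return rows


class ToyAudioFolder(ToyFolderBuilder):
    BASE_COLUMN_NAME = "audio"


# Shortened, hard-coded list, mirroring the "defined at the bottom" layout of the diff.
TOY_AUDIO_EXTENSIONS = [".wav", ".flac", ".mp3"]
ToyAudioFolder.EXTENSIONS = TOY_AUDIO_EXTENSIONS

rows = ToyAudioFolder.examples(
    ["train/dog/bark.WAV", "train/cat/meow.flac", "train/metadata.csv"]
)
```

In the sketch, `train/metadata.csv` is skipped because `.csv` is not in the extension list. In the library itself, the packaged loader is reached by name, for example `load_dataset("audiofolder", data_dir=...)`.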
Empty file.
I think I would put this section first, since this is the main use case anyway.