Add test-datasets CI job #4952

Closed
wants to merge 2 commits into from

@lhoestq lhoestq commented Sep 8, 2022

To avoid too many conflicts between the dependencies of the dataset scripts and the metric scripts, I split the CI into two jobs: test and test-catalog.

test covers the core of the datasets library, while test-catalog covers the dataset scripts and metric scripts.

This also makes pip install -e .[dev] much lighter for developers.
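As a rough illustration of the idea, here is a minimal sketch of how the extras in setup.py could be split so that the dev extra no longer pulls in script-only dependencies. The extra names (tests, tests_catalog) and the package lists are assumptions made up for this example, not the actual contents of this PR.

```python
# Minimal sketch, NOT the real setup.py from this PR.
# All group names and package lists below are hypothetical.

# Packages needed to test the core of the library (illustrative).
CORE_TESTS_REQUIRE = ["pytest", "pytest-xdist"]

# Extra packages only needed by dataset and metric scripts (illustrative).
CATALOG_TESTS_REQUIRE = ["nltk", "rouge_score", "sacrebleu"]

EXTRAS_REQUIRE = {
    # Installed by the core "test" CI job.
    "tests": CORE_TESTS_REQUIRE,
    # Installed by the "test-catalog" CI job: core plus script dependencies.
    "tests_catalog": CORE_TESTS_REQUIRE + CATALOG_TESTS_REQUIRE,
    # Developers only get the core test dependencies.
    "dev": CORE_TESTS_REQUIRE,
}

# In a real setup.py this dict would be passed as
# setuptools.setup(..., extras_require=EXTRAS_REQUIRE).
```

With a layout like this, the catalog job would install .[tests_catalog] while pip install -e .[dev] stays limited to the core list.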

WDYT @albertvillanova?

HuggingFaceDocBuilderDev commented Sep 8, 2022

The documentation is not available anymore as the PR was closed or merged.

@lhoestq lhoestq marked this pull request as ready for review September 9, 2022 16:17

lhoestq commented Sep 16, 2022

Closing this one, since the dataset scripts will be removed in #4974.

@lhoestq lhoestq closed this Sep 16, 2022
polinaeterna added a commit to stevhliu/datasets that referenced this pull request Sep 20, 2022
lhoestq added a commit that referenced this pull request Sep 21, 2022
* 📝 add docs for creating audio dataset

* 🖍 small edits, encourage TAR archives more

* 🖍 apply polina feedbacks

* audiofolder and metadata first

* oops metadata first also in audio load

* replace vivos with librivox indonesia, describe streaming in more detail

* taking over the PR

* check if i can push to other's fork don't look at this

* git back vivos as main example, simplify instructions. add librivox-indonesia as an advanced example

* Apply some suggestions from code review

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* Update docs/source/audio_dataset_repo.mdx

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

* fix something i don't remember what, integrate changes from #4925

* integrate #4952 to image docs too

* rename audio and image datasets guides consistently (to audio/image_dataset.mdx)

* remove outdated doc

* fix audio guide name

* fix link + minor changes

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: polinaeterna <polina@huggingface.co>
@albertvillanova albertvillanova deleted the split-ci branch September 24, 2023 10:05

2 participants