Docs for creating an audio dataset (#4872)
* 📝 add docs for creating audio dataset
* 🖍 small edits, encourage TAR archives more
* 🖍 apply polina feedbacks
* audiofolder and metadata first
* oops metadata first also in audio load
* replace vivos with librivox indonesia, describe streaming in more detail
* taking over the PR
* check if i can push to other's fork don't look at this
* git back vivos as main example, simplify instructions. add librivox-indonesia as an advanced example
* Apply some suggestions from code review

  Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update docs/source/audio_dataset_repo.mdx

  Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* fix something i don't remember what, integrate changes from #4925
* integrate #4952 to image docs too
* rename audio and image datasets guides consistently (to audio/image_dataset.mdx)
* remove outdated doc
* fix audio guide name
* fix link + minor changes

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: polinaeterna <polina@huggingface.co>
733e499
Updated benchmarks (PyArrow==6.0.0):
* benchmark_array_xd.json
* benchmark_getitem_100B.json
* benchmark_indices_mapping.json
* benchmark_iterating.json
* benchmark_map_filter.json