Allow `list_datasets` to include private datasets #4745

ola13 · 2022-07-26T10:16:08Z

I am working with a large collection of private datasets, and it would be convenient to be able to list them.

I would envision extending the convention of using the use_auth_token keyword argument to the list_datasets function, so that calling:

list_datasets(use_auth_token="my_token")

would return the list of all datasets I have permission to view, including private ones. The only current alternative I see is to use the Hub website to obtain the list of dataset names manually. This is in the context of BigScience, where the respective private spaces contain hundreds of datasets, so listing them by hand is not very convenient.


lhoestq · 2022-07-26T10:32:31Z

Thanks for opening this issue :)

If it can help, I think you can already use huggingface_hub to achieve this:

>>> from huggingface_hub import HfApi
>>> [ds_info.id for ds_info in HfApi().list_datasets(use_auth_token=token) if ds_info.private]
['bigscience/xxxx', 'bigscience-catalogue-data/xxxxxxx', ... ]

However, the latest versions of huggingface_hub that contain this feature are not available on Python 3.6, so maybe we should first drop support for Python 3.6 (see #4460) before updating list_datasets in datasets as well (otherwise we would have to copy/paste some huggingface_hub code).

ola13 · 2022-07-26T10:44:52Z

Great, thanks @lhoestq, the workaround works! I think it would be intuitive to have the support directly in datasets, but it makes sense to wait given that the workaround exists :)

julien-c · 2022-07-26T11:59:25Z

I also think that going forward we should replace more and more implementations inside datasets with the corresponding ones from huggingface_hub (same as we're doing in transformers).

mariosasko · 2023-07-25T15:01:48Z

datasets.list_datasets is now deprecated in favor of huggingface_hub.list_datasets (returns private datasets when token is present), so I'm closing this issue.
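For anyone landing here later, a minimal sketch of what that could look like with the top-level huggingface_hub.list_datasets, assuming a recent huggingface_hub release where the keyword is `token` (older releases used `use_auth_token`, as in the workaround above). The helper names `private_dataset_ids` and `list_private_datasets` are made up for illustration and are not part of either library:

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def private_dataset_ids(infos: Iterable) -> List[str]:
    """Return the ids of entries whose `private` attribute is truthy."""
    return [info.id for info in infos if getattr(info, "private", False)]


def list_private_datasets(
    token: Optional[str] = None, author: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: list private datasets visible to `token`.

    Assumes a huggingface_hub version whose list_datasets accepts `token`
    and `author`, and yields objects with `id` and `private` attributes.
    """
    # Imported lazily so the filtering helper works without the library installed.
    from huggingface_hub import list_datasets

    return private_dataset_ids(list_datasets(author=author, token=token))


if __name__ == "__main__":
    # Offline demo with stand-in objects instead of real Hub results.
    fake_infos = [
        SimpleNamespace(id="org/private-ds", private=True),
        SimpleNamespace(id="org/public-ds", private=False),
    ]
    print(private_dataset_ids(fake_infos))
```

Against the real Hub this would be called as, for example, list_private_datasets(token="my_token", author="bigscience"); restricting by author should keep the listing much smaller than iterating over every dataset on the Hub.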

ola13 added the enhancement New feature or request label Jul 26, 2022

ola13 assigned lhoestq Jul 26, 2022

lhoestq removed their assignment Jul 26, 2022

mariosasko closed this as completed Jul 25, 2023
