Allow list_datasets to include private datasets #4745

Closed
ola13 opened this issue Jul 26, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@ola13

ola13 commented Jul 26, 2022

I am working with a large collection of private datasets, and it would be convenient to be able to list them.

I would envision extending the existing use_auth_token keyword-argument convention to the list_datasets function, so that calling:

list_datasets(use_auth_token="my_token")

would return all datasets I have permission to view, including private ones. The only alternative I currently see is to collect the dataset names manually from the Hub website. This is in the context of BigScience, where the respective private spaces contain hundreds of datasets, so listing them manually is not very convenient.

ola13 added the enhancement label Jul 26, 2022
@lhoestq
Member

lhoestq commented Jul 26, 2022

Thanks for opening this issue :)

If it can help, I think you can already use huggingface_hub to achieve this:

>>> from huggingface_hub import HfApi
>>> [ds_info.id for ds_info in HfApi().list_datasets(use_auth_token=token) if ds_info.private]
['bigscience/xxxx', 'bigscience-catalogue-data/xxxxxxx', ... ]

However, the latest versions of huggingface_hub that contain this feature are not available on Python 3.6, so we should probably first drop support for Python 3.6 (see #4460) before updating list_datasets in datasets as well; otherwise we would have to copy/paste some huggingface_hub code.
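
For readers who want to reuse the filtering step, the list comprehension above can be wrapped in a small helper. This is only a sketch: the function name private_dataset_ids and the sample ids are hypothetical, and the stand-in objects merely mimic the id and private attributes that the workaround relies on, so the snippet runs without network access or a token.

```python
from types import SimpleNamespace


def private_dataset_ids(dataset_infos):
    """Return the ids of entries whose `private` attribute is truthy."""
    return [info.id for info in dataset_infos if getattr(info, "private", False)]


# Stand-ins for the objects yielded by HfApi().list_datasets(...); ids are made up.
sample_infos = [
    SimpleNamespace(id="some-org/public-ds", private=False),
    SimpleNamespace(id="some-org/private-ds", private=True),
]

private_ids = private_dataset_ids(sample_infos)
print(private_ids)
```

Against the real Hub, the argument would be the iterable returned by HfApi().list_datasets(use_auth_token=token), as in the comment above.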

@ola13
Author

ola13 commented Jul 26, 2022

Great, thanks @lhoestq, the workaround works! I think it would be intuitive to have this supported directly in datasets, but it makes sense to wait given that the workaround exists :)

lhoestq removed their assignment Jul 26, 2022
@julien-c
Member

I also think that, going forward, we should replace more and more implementations inside datasets with the corresponding ones from huggingface_hub (same as we're doing in transformers).

@mariosasko
Collaborator

datasets.list_datasets is now deprecated in favor of huggingface_hub.list_datasets (returns private datasets when token is present), so I'm closing this issue.
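
A migration along these lines might look like the sketch below. It assumes a recent huggingface_hub in which list_datasets accepts token and author keyword arguments (older releases used use_auth_token, as earlier in this thread). The wrapper name list_visible_dataset_ids, the list_fn injection point, the fake lister, and the ids are all hypothetical and exist only so the example can run offline.

```python
from types import SimpleNamespace


def list_visible_dataset_ids(token=None, author=None, list_fn=None):
    """Return ids of datasets visible to `token`, optionally limited to one author.

    `list_fn` is a hypothetical injection point for offline use; when it is
    omitted, huggingface_hub.list_datasets is imported and called instead.
    """
    if list_fn is None:
        from huggingface_hub import list_datasets as list_fn  # real Hub call
    return [info.id for info in list_fn(author=author, token=token)]


def fake_list_datasets(author=None, token=None):
    """Offline stand-in: returns private entries only when a token is given.

    The `author` argument is accepted but ignored in this stand-in.
    """
    catalog = [
        SimpleNamespace(id="some-org/public-ds", private=False),
        SimpleNamespace(id="some-org/private-ds", private=True),
    ]
    return [d for d in catalog if token or not d.private]


anonymous_ids = list_visible_dataset_ids(list_fn=fake_list_datasets)
authed_ids = list_visible_dataset_ids(token="hf_xxx", list_fn=fake_list_datasets)
print(anonymous_ids, authed_ids)
```

Dropping the list_fn argument and passing a real token would exercise the Hub itself; restricting by author is one way to keep the listing to a single organization rather than iterating over every dataset.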
