Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hfh 0.10 implicit auth #5031

Merged
merged 9 commits into from Sep 30, 2022
Merged

Support hfh 0.10 implicit auth #5031

merged 9 commits into from Sep 30, 2022

Conversation

lhoestq
Copy link
Member

@lhoestq lhoestq commented Sep 27, 2022

In huggingface-hub 0.10 the token parameter is deprecated for dataset_info and list_repo_files in favor of use_auth_token.
Moreover if use_auth_token=None then the user's token is used implicitly.

I took those two changes into account

Close #4990

TODO:

  • fix tests

We should wait hfh 0.10 to be relased first to make sure it works correctly before merging

@lhoestq lhoestq marked this pull request as draft September 27, 2022 18:39
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Sep 27, 2022

The documentation is not available anymore as the PR was closed or merged.

@Wauplin
Copy link
Contributor

Wauplin commented Sep 28, 2022

@lhoestq it is now released so you can move forward with it :)

Comment on lines 113 to 131
"""
Get info on one specific dataset on huggingface.co.
Dataset can be private if you pass an acceptable token.
Args:
hf_api (`huggingface_hub.HfApi`): Hub client
repo_id (`str`):
A namespace (user or an organization) and a repo name separated
by a `/`.
revision (`str`, *optional*):
The revision of the dataset repository from which to get the
information.
timeout (`float`, *optional*):
Whether to set a timeout for the request to the Hub.
use_auth_token (`bool` or `str`, *optional*):
Whether to use the `auth_token` provided from the
`huggingface_hub` cli. If not logged in, a valid `auth_token`
can be passed in as a string.
Returns:
[`hf_api.DatasetInfo`]: The dataset repository information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a copy-paste from the huggingface_hub docstring ? I would mention it and refer to https://huggingface.co/docs/huggingface_hub/v0.10.0/en/package_reference/hf_api#huggingface_hub.HfApi.dataset_info in the description to that a dataset user knows.

Comment on lines 222 to +225
def get_datasets_user_agent(user_agent: Optional[Union[str, dict]] = None) -> str:
ua = f"datasets/{__version__}; python/{config.PY_VERSION}"
ua = f"datasets/{__version__}"
ua += f"; python/{config.PY_VERSION}"
ua += f"; huggingface_hub/{huggingface_hub.__version__}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to hear if you have ideas on how to make this easy to configure. We have the same logic in huggingface_hub and transformers as well but nothing very convenient.

(not a suggestion to change something here, more as a general question)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's only a few lines and pretty easy to modify it, I don't think we can make something significantly better than that

@lhoestq lhoestq marked this pull request as ready for review September 29, 2022 12:31
@lhoestq
Copy link
Member Author

lhoestq commented Sep 30, 2022

I took your comments into account @Wauplin :)
I also bumped the requirement to 0.2.0 because we're using set_access_token

cc @albertvillanova WDYT ? I edited the CI job to also check for our minimum supported version of hfh at the same time as the minimum pyarrow version

@Wauplin
Copy link
Contributor

Wauplin commented Sep 30, 2022

@lhoestq great, thanks ! :)

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tweaks. Just a suggestion and a comment below.

Comment on lines +56 to +66
def skip_if_metric_requires_transformers(test_case):
@wraps(test_case)
def wrapper(self, metric_name):
if not _has_transformers and metric_name in REQUIRE_TRANSFORMERS:
self.skipTest('"test requires transformers"')
else:
test_case(self, metric_name)

return wrapper


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have you added this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an env without transformers and I had failing tests x) so I did the same as for other optional deps: excluding the tests when it's not installed

@@ -1112,14 +1110,11 @@ def dataset_module_factory(
_raise_if_offline_mode_is_enabled()
hf_api = HfApi(config.HF_ENDPOINT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove this line because hf_api is no longer used in this method?

hf_api = HfApi(config.HF_ENDPOINT)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still used in hf_api_dataset_info(hf_api, ...) a few lines later ;)

@lhoestq lhoestq merged commit 365884d into main Sep 30, 2022
@lhoestq lhoestq deleted the support-hfh-0.10-implicit-auth branch September 30, 2022 09:16
lhoestq added a commit that referenced this pull request Oct 5, 2022
* support hfh 0.10 implicit auth

* update tests

* Bump minimum hfh to 0.2.0 and test minimum version

* style

* fix test

* fix tests

* again

* lucain's comment

* fix ci
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"no-token" is passed to huggingface_hub when token is None
4 participants