Error when pushing to the CI hub #5390

Closed
severo opened this issue Dec 23, 2022 · 5 comments

severo commented Dec 23, 2022

Describe the bug

Note that this is a special case: the error occurs only when the Hub URL is "https://hub-ci.huggingface.co". It does not appear when doing the same on the production Hub (https://huggingface.co).

The call to dataset.push_to_hub() fails:

Pushing dataset shards to the dataset hub: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.93s/it]
Traceback (most recent call last):
  File "reproduce_hubci.py", line 16, in <module>
    dataset.push_to_hub(repo_id=repo_id, private=False, token=USER_TOKEN, embed_external_files=True)
  File "/home/slesage/hf/datasets/src/datasets/arrow_dataset.py", line 5025, in push_to_hub
    HfApi(endpoint=config.HF_ENDPOINT).upload_file(
  File "/home/slesage/.pyenv/versions/datasets/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1346, in upload_file
    raise err
  File "/home/slesage/.pyenv/versions/datasets/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 1337, in upload_file
    r.raise_for_status()
  File "/home/slesage/.pyenv/versions/datasets/lib/python3.8/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://hub-ci.huggingface.co/api/datasets/__DUMMY_DATASETS_SERVER_USER__/bug-16718047265472/upload/main/README.md
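
The traceback above shows only the status line of the 400 response, not the reason the server gave. Here is a minimal sketch of how one might surface the response body when debugging a failure like this. It assumes the exception carries a `response` attribute exposing `status_code`, `url`, and `text` (as `requests.exceptions.HTTPError` does). The helper name `summarize_http_error` is hypothetical and is not part of datasets or huggingface_hub.

```python
def summarize_http_error(err, max_body_chars=500):
    """Return a one-line summary of an HTTP error, including the response body.

    Hypothetical helper: `err` is expected to look like
    requests.exceptions.HTTPError, i.e. to expose a `response` attribute
    with `status_code`, `url`, and `text`.
    """
    response = getattr(err, "response", None)
    if response is None:
        # No response attached (e.g. a connection error): fall back to str().
        return f"no HTTP response attached: {err}"
    body = (response.text or "").strip()
    if len(body) > max_body_chars:
        # Truncate long bodies so the summary stays readable in logs.
        body = body[:max_body_chars] + "..."
    return f"HTTP {response.status_code} for {response.url}: {body or '<empty body>'}"
```

Wrapping the `push_to_hub` call in `try` / `except requests.exceptions.HTTPError as err` and printing `summarize_http_error(err)` would show whatever explanation the server attached to the 400, if any.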

Steps to reproduce the bug

# reproduce.py
import time

from datasets import Dataset

USER = "__DUMMY_DATASETS_SERVER_USER__"
USER_TOKEN = "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"

dataset = Dataset.from_dict({"a": [1, 2, 3]})
# Timestamp suffix (10e3 == 10_000, so units of 0.1 ms) keeps the repo name unique per run.
repo_id = f"{USER}/bug-{int(time.time() * 10e3)}"
dataset.push_to_hub(repo_id=repo_id, private=False, token=USER_TOKEN, embed_external_files=True)

$ HF_ENDPOINT="https://hub-ci.huggingface.co" python reproduce.py
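
For reference, here is a minimal sketch of how the endpoint override feeds into the URL seen in the traceback. It assumes the endpoint is read from the `HF_ENDPOINT` environment variable with https://huggingface.co as the fallback, and that the upload URL follows the pattern visible in the error message. `resolve_endpoint` and `upload_url` are hypothetical names used only for illustration, not functions from datasets or huggingface_hub.

```python
import os

DEFAULT_ENDPOINT = "https://huggingface.co"  # production Hub


def resolve_endpoint(environ=None):
    """Pick the Hub endpoint from HF_ENDPOINT, falling back to the production Hub."""
    environ = os.environ if environ is None else environ
    return environ.get("HF_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")


def upload_url(endpoint, repo_id, path_in_repo, revision="main"):
    """Build a dataset upload URL following the pattern shown in the traceback."""
    return f"{endpoint}/api/datasets/{repo_id}/upload/{revision}/{path_in_repo}"


if __name__ == "__main__":
    endpoint = resolve_endpoint()
    print(upload_url(endpoint, "some-user/some-dataset", "README.md"))
```

With `HF_ENDPOINT="https://hub-ci.huggingface.co"` set, the same script targets the CI Hub instead of the production Hub, which is the only difference between the failing and the working runs described here.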

Expected behavior

No error: the dataset should be uploaded to the Hub together with the README file (the README upload is what currently triggers the error).

Environment info

  • datasets version: 2.8.0
  • Platform: Linux-5.15.0-1026-aws-x86_64-with-glibc2.35
  • Python version: 3.9.15
  • PyArrow version: 7.0.0
  • Pandas version: 1.5.2

severo commented Dec 23, 2022

Hmmm, git bisect tells me the behavior has been the same since commit 67e65c9 (3 Oct), i.e. #4926.


severo commented Dec 23, 2022

Maybe related to the discussions in #5196

severo changed the title from "Error when pushing to hub (CI)" to "Error when pushing to the CI hub" on Dec 23, 2022

severo commented Dec 23, 2022

Maybe the current version of moonlanding in Hub CI is the issue.

I relaunched tests that were passing two days ago, and they now fail; see huggingface/dataset-viewer@7464144 for example.

cc @huggingface/moon-landing

mariosasko commented Dec 23, 2022

Hi! I don't think this has anything to do with datasets. Hub CI seems to be the culprit - the identical failure can be found in this PR (with unrelated changes) opened today.


severo commented Dec 23, 2022

OK! Thanks for looking at it. Closing then.

severo closed this as completed on Dec 23, 2022