Link to datasets YAML configuration page #1222

Open
severo opened this issue Feb 21, 2024 · 11 comments
Labels
documentation Improvements or additions to documentation

Comments

@severo
Contributor

severo commented Feb 21, 2024

Link to https://huggingface.co/docs/datasets/v2.7.1/en/dataset_card#more-yaml-tags from https://huggingface.co/docs/hub/datasets-manual-configuration, so that the page is complemented with all the possible values in the README's YAML.

severo added the "documentation" (Improvements or additions to documentation) and "good first issue" (Good for newcomers) labels Feb 21, 2024
@severo
Contributor Author

severo commented Feb 23, 2024

And give an example of each supported feature type in the YAML config. See https://discuss.huggingface.co/t/appropriate-yaml-for-dataset-info-list-float/74418 for example: I think we currently have no reference to share with users.
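
To illustrate the kind of reference being asked for, here is a minimal sketch of a `dataset_info` block covering a few feature types, including the list-of-floats case raised in the forum thread. It assumes the `name` / `dtype` / `sequence` layout that the `datasets` library writes into README front matter; the column names and label names are hypothetical, and this should be checked against the library before being treated as a reference.

```yaml
dataset_info:
  features:
  - name: text          # hypothetical column: plain string
    dtype: string
  - name: score         # hypothetical column: scalar float
    dtype: float32
  - name: embedding     # hypothetical column: list of floats (assumed `sequence` syntax)
    sequence: float32
  - name: label         # hypothetical column: categorical label
    dtype:
      class_label:
        names:
          '0': negative
          '1': positive
```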

@lappemic
Contributor

Hey @severo, I just had a look into this. As far as I can see, there is no longer a "More YAML tags" section in the Datasets docs. Is this correct? If so, is this issue outdated, or am I missing something?

@severo
Contributor Author

severo commented May 14, 2024

@severo
Contributor Author

severo commented May 14, 2024

Somewhat related: discussion about the spec: huggingface/dataset-viewer#2639

Also: should we just redirect to the spec (https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1), or should we create a dedicated doc page for this? Adding the link would already be a good step forward.

@severo
Contributor Author

severo commented May 14, 2024

Also: https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1 is outdated:

configs:  # Optional for datasets with multiple configurations like glue.
- {config_0}  # Example for glue: sst2
- {config_1}  # Example for glue: cola

It does not respect the current format: https://huggingface.co/docs/hub/datasets-manual-configuration.
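
For comparison, a minimal sketch of the `configs` structure described on the manual configuration page, as opposed to the bare list of names shown in the spec. The file names are placeholders and not taken from any real dataset.

```yaml
configs:
- config_name: default    # one entry per configuration, identified by name
  data_files:
  - split: train          # each split points to its own file(s)
    path: "data.csv"
  - split: test
    path: "holdout.csv"
```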

Ideally, it should be the reference, with more details than https://huggingface.co/docs/hub/datasets-manual-configuration, not the other way.

cc @polinaeterna for example if you want to look at it

severo removed the "good first issue" (Good for newcomers) label May 14, 2024
@lappemic
Contributor

> Adding the link would already be a good step forward.

Shall I start with this and see where it leads us, @severo? Or would you suggest a different approach?

@severo
Contributor Author

severo commented May 16, 2024

Hmmm, I think we have to improve the spec first. Then, link to it from the docs page, otherwise the link would not bring much value.

@lappemic
Contributor

Let me know if I can help out somehow! Would be down for it. 😄

@severo
Contributor Author

severo commented May 16, 2024

Do you want to work on a PR to improve the spec https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1 ?
The idea is to add the structure of the configs: field, to match https://huggingface.co/docs/hub/datasets-manual-configuration at least (config_name, data_files, etc). Some more fields can be passed, if I'm not wrong (it's defined in https://github.com/huggingface/datasets, but @polinaeterna knows these details better than I)
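
As a rough sketch of what the spec could document for the `configs:` field, the fragment below shows two configurations with `config_name` and `data_files`, plus two extra fields. The extra fields are assumptions here: `default` is assumed to mark the configuration loaded by default, and `sep` is assumed to be forwarded to the CSV builder. The exact set of accepted fields is defined in the `datasets` library and should be confirmed there; file names are placeholders.

```yaml
configs:
- config_name: main_data            # name of this configuration
  data_files: "main_data.csv"       # file(s) loaded for it
  default: true                     # assumed: configuration loaded when none is specified
  sep: ","                          # assumed: builder parameter passed to the CSV loader
- config_name: additional_data
  data_files: "additional_data.csv"
  sep: "\t"                         # assumed: tab-separated variant of the same parameter
```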

@lappemic
Contributor

I would love to! Will open a PR for discussion.

@lappemic
Contributor

Now that the spec has been improved, shall I open a PR to link the YAML configuration page?
