Restore saved format state in load_from_disk #5073

Merged
merged 3 commits into huggingface:main on Oct 11, 2022

Conversation

asofiaoliveira
Contributor

Hello! @mariosasko

This pull request addresses issue #5050: it restores the saved format on datasets loaded from disk.
The only change is a set_format call in Dataset.load_from_disk; DatasetDict.load_from_disk relies on Dataset.load_from_disk, so it needs no separate change.

I don't know whether I should add a test, or where it would go, so let me know and I can work on that as well!
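
A minimal sketch of the idea behind the change, assuming the format is stored alongside the data when a dataset is saved. `ToyDataset`, `data.json`, and `state.json` are hypothetical stand-ins invented for illustration; this is not the actual `datasets` code, only the pattern of reapplying a saved format inside `load_from_disk`.

```python
import json
import tempfile
from pathlib import Path


class ToyDataset:
    """Hypothetical stand-in for a dataset with a format; not the real datasets.Dataset."""

    def __init__(self, rows):
        self.rows = rows
        self.format_type = None     # e.g. "numpy"; None means plain Python objects
        self.format_columns = None  # None means all columns

    def set_format(self, type=None, columns=None):
        # Mutates the dataset in place.
        self.format_type = type
        self.format_columns = columns

    def save_to_disk(self, path):
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "data.json").write_text(json.dumps(self.rows))
        # The format is written next to the data as part of the saved state.
        state = {"format_type": self.format_type, "format_columns": self.format_columns}
        (path / "state.json").write_text(json.dumps(state))

    @classmethod
    def load_from_disk(cls, path):
        path = Path(path)
        dataset = cls(json.loads((path / "data.json").read_text()))
        state = json.loads((path / "state.json").read_text())
        # The fix in miniature: reapply the saved format instead of dropping it.
        dataset.set_format(type=state["format_type"], columns=state["format_columns"])
        return dataset


with tempfile.TemporaryDirectory() as tmp:
    original = ToyDataset([{"a": 1, "b": "x"}, {"a": 2, "b": "y"}])
    original.set_format(type="numpy", columns=["a"])
    original.save_to_disk(tmp)
    reloaded = ToyDataset.load_from_disk(tmp)
    print(reloaded.format_type, reloaded.format_columns)  # numpy ['a']
```

Without the `set_format` call in `load_from_disk`, the reloaded toy dataset would come back with `format_type` set to `None`, which mirrors the behavior the linked issue describes.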

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 5, 2022

The documentation is not available anymore as the PR was closed or merged.

Contributor

@mariosasko mariosasko left a comment

Thanks!

Yes, we need to test this. You can add a test as a method of BaseDatasetTest in tests/test_arrow_dataset.py. In that test, you can save a formatted dataset with save_to_disk, reload it with load_from_disk, and then check that the format attributes of the original and the loaded dataset are the same.
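
The round-trip test described above could look roughly like the sketch below. It runs against a hypothetical `FakeDataset` that exposes only what the test touches, so the example is self-contained; the class, the `state.json` file name, and the column name `col_1` are assumptions for illustration. In the real suite the test would be a method of BaseDatasetTest and would use an actual dataset.

```python
import json
import tempfile
import unittest
from pathlib import Path


class FakeDataset:
    """Hypothetical stand-in exposing only the pieces the test needs."""

    def __init__(self, fmt=None):
        self.format = fmt or {"type": None, "columns": None}

    def with_format(self, type=None, columns=None):
        # Returns a new object carrying the requested format.
        return FakeDataset({"type": type, "columns": columns})

    def save_to_disk(self, path):
        Path(path, "state.json").write_text(json.dumps(self.format))

    @classmethod
    def load_from_disk(cls, path):
        return cls(json.loads(Path(path, "state.json").read_text()))


class FormatRoundTripTest(unittest.TestCase):
    def test_load_from_disk_restores_format(self):
        dset = FakeDataset().with_format(type="numpy", columns=["col_1"])
        with tempfile.TemporaryDirectory() as tmp_dir:
            dset.save_to_disk(tmp_dir)
            reloaded = FakeDataset.load_from_disk(tmp_dir)
        # The format of the original and the reloaded dataset must match.
        self.assertDictEqual(dset.format, reloaded.format)


result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(FormatRoundTripTest)
)
```

Comparing the whole format dictionary, rather than one field at a time, keeps the assertion valid if more format attributes are added later.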

@mariosasko mariosasko linked an issue Oct 6, 2022 that may be closed by this pull request
Contributor

@mariosasko mariosasko left a comment

I replaced set_format with with_format because I slightly prefer the latter. All looks good now, thanks!
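
For context on the two methods: in the `datasets` library, set_format changes the dataset in place, while with_format returns a new dataset object and leaves the original untouched. The comment above does not say whether this is the reason for the preference. The sketch below models only that distinction with a hypothetical `ToyDataset` class; it is not the library's implementation.

```python
import copy


class ToyDataset:
    """Hypothetical stand-in; models only the in-place vs. copy distinction."""

    def __init__(self):
        self.format_type = None

    def set_format(self, type=None):
        # Changes this object and returns nothing.
        self.format_type = type

    def with_format(self, type=None):
        # Leaves this object untouched and returns a formatted copy.
        formatted = copy.copy(self)
        formatted.set_format(type=type)
        return formatted


base = ToyDataset()
formatted = base.with_format(type="numpy")
print(base.format_type, formatted.format_type)  # None numpy

returned = base.set_format(type="torch")
print(returned, base.format_type)  # None torch
```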

Member

@lhoestq lhoestq left a comment

Thanks for the fix and for the test! :)

@lhoestq lhoestq merged commit 3d4a2c7 into huggingface:main Oct 11, 2022
@asofiaoliveira asofiaoliveira deleted the add-format-to-load_from_disk branch October 11, 2022 16:55
Development

Successfully merging this pull request may close these issues.

Restore saved format state in load_from_disk
4 participants