Add support for parsing JSON files in array form #4997

Merged
mariosasko merged 6 commits into main from parse-json-list on Sep 20, 2022

mariosasko
Contributor

Support parsing JSON files in array form (the top-level value is an array). For simplicity, json.load is used for decoding, which means the entire file is loaded into memory. If requested, we can optimize this by introducing a parameter similar to lines in pandas.read_json, which, if set to True, would allow us to read the file in chunks.
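To illustrate the approach described above, here is a minimal sketch of decoding an array-form JSON file with json.load and pivoting the records into columns. It is not the code merged in this PR: the helper name load_array_json and the column-pivot step are assumptions made for illustration only.

```python
import io
import json


def load_array_json(fp):
    """Decode a JSON document whose top-level value is an array of records.

    Hypothetical helper for illustration only. json.load reads the whole
    document, so the entire file ends up in memory.
    """
    records = json.load(fp)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")

    # Collect every key first so that records with missing keys still
    # produce columns of equal length.
    columns = {}
    for record in records:
        for key in record:
            columns.setdefault(key, [])

    # Pivot the list of row dicts into a dict of columns, filling gaps with None.
    for record in records:
        for key in columns:
            columns[key].append(record.get(key))
    return columns


array_form = io.StringIO('[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]')
print(load_array_json(array_form))  # {'a': [1, 2], 'b': ['x', 'y']}
```

The trade-off is the one noted in the description: an array-form document has to be decoded as a whole, whereas a line-delimited file (one JSON value per line) can be decoded one line at a time, which is what makes chunked reading possible.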

Fixes #4963

HuggingFaceDocBuilderDev commented Sep 20, 2022

The documentation is not available anymore as the PR was closed or merged.

Member

lhoestq left a comment

Cool! Loading the file into memory is fine for now.

mariosasko merged commit 1a9385d into main on Sep 20, 2022
mariosasko deleted the parse-json-list branch on September 20, 2022 15:40
Development

Successfully merging this pull request may close these issues.

Dataset without script does not support regular JSON data file
3 participants