Improve CSV infer_field_schema to be more restrictive #802

Open
domoritz opened this issue Sep 24, 2021 · 1 comment · May be fixed by #5406
Labels
enhancement Any new improvement worthy of an entry in the changelog

Comments

@domoritz
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

`infer_field_schema` infers any integer to be an int64 (`DataType::Int64`). This means that data parsed from CSV with the automatically inferred schema is going to be much larger than necessary.

Describe the solution you'd like

Integers should only be as long as necessary to represent the numbers. Similarly, floats should only be as large as needed to accurately represent the numbers in the data.
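To make the integer half of the request concrete, here is a minimal sketch of width selection over one column of raw CSV strings. It is not the library's implementation: `IntWidth` and `narrowest_int_type` are hypothetical names invented for this illustration, and the sketch assumes the whole column can be scanned before a type is chosen. Floats are not covered.

```rust
/// Hypothetical stand-in for the signed integer variants of a data type enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IntWidth {
    Int8,
    Int16,
    Int32,
    Int64,
}

/// Returns the narrowest signed width that can hold every value in `column`.
/// Returns `None` if the column is empty or any value fails to parse as an i64.
fn narrowest_int_type(column: &[&str]) -> Option<IntWidth> {
    if column.is_empty() {
        return None;
    }

    // Track the value range seen so far; the width only depends on min and max.
    let mut min = i64::MAX;
    let mut max = i64::MIN;
    for raw in column {
        let value: i64 = raw.trim().parse().ok()?;
        min = min.min(value);
        max = max.max(value);
    }

    // A width fits when the whole observed range lies inside its bounds.
    let fits = |lo: i64, hi: i64| min >= lo && max <= hi;

    Some(if fits(i64::from(i8::MIN), i64::from(i8::MAX)) {
        IntWidth::Int8
    } else if fits(i64::from(i16::MIN), i64::from(i16::MAX)) {
        IntWidth::Int16
    } else if fits(i64::from(i32::MIN), i64::from(i32::MAX)) {
        IntWidth::Int32
    } else {
        IntWidth::Int64
    })
}
```

Under these assumptions, a column such as `["12", "-7", "300"]` would be assigned `IntWidth::Int16`, because 300 exceeds the `i8` range but fits in an `i16`. One trade-off is that inference from a sample of rows could pick a width that later rows overflow, so a real implementation would need a policy for that case.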

Describe alternatives you've considered

The current behavior works, but it stores every integer column at 64 bits regardless of the values, so it doesn't use the format efficiently.

@domoritz domoritz added the enhancement Any new improvement worthy of an entry in the changelog label Sep 24, 2021
@jondo2010
Contributor

I just ran into this from the other side: my UInt64 data doesn't fit in an Int64, so this method actually hard-fails to parse my CSV. I'll take a look at a fix.
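The failure mode can be reproduced with the standard library alone: a value above `i64::MAX` fails to parse as an `i64` but parses fine as a `u64`. The sketch below shows one possible fallback. It is not the fix referenced in this thread: `InferredInt` and `parse_int_with_unsigned_fallback` are hypothetical names used only for illustration.

```rust
/// Hypothetical result type: which integer family a raw value landed in.
#[derive(Debug, PartialEq)]
enum InferredInt {
    Signed(i64),
    Unsigned(u64),
}

/// Tries a signed 64-bit parse first, then falls back to unsigned 64-bit.
/// Returns `None` when the text is not an integer in either range.
fn parse_int_with_unsigned_fallback(raw: &str) -> Option<InferredInt> {
    let raw = raw.trim();

    // Most values fit in an i64, so try that first.
    if let Ok(value) = raw.parse::<i64>() {
        return Some(InferredInt::Signed(value));
    }

    // Values in (i64::MAX, u64::MAX] only fit in a u64.
    raw.parse::<u64>().ok().map(InferredInt::Unsigned)
}
```

With this approach, `"18446744073709551615"` (the largest `u64`) yields `InferredInt::Unsigned` instead of a parse error, while negative values still come back as `InferredInt::Signed`. A column mixing negative values with values above `i64::MAX` would fit neither type, so it would still need a separate decision, such as falling back to a float or string column.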
