Add coerce_types flag to parquet ArrowWriter #1938

Open
tustvold opened this issue Jun 24, 2022 · 0 comments · May be fixed by #5640
Labels
enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers help wanted

Comments


tustvold commented Jun 24, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

As discussed in #1666, not all arrow types can be represented within a parquet schema.

Describe the solution you'd like

The consensus appears to be to:

  • By default faithfully round-trip the source data, performing no potentially lossy type conversion
  • Add a coerce_types flag that will use the arrow cast kernels to coerce incompatible types prior to writing them

In particular

Date64

If not coerce_types, write as Int64 and embed the logical type in the arrow schema only. Otherwise cast to Date32.
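
To illustrate why the Date64 coercion is potentially lossy, here is a minimal sketch of the value-level conversion using only the standard library. Date64 stores milliseconds since the UNIX epoch and Date32 stores whole days, so any time-of-day component is dropped. The helper name `date64_to_date32` is hypothetical and is not an arrow-rs API, and the use of floor division for pre-epoch values is an assumption of this sketch; the actual cast kernel may behave differently.

```rust
use std::convert::TryFrom;

/// Milliseconds in one day. Date64 counts milliseconds since the UNIX epoch,
/// Date32 counts whole days since the UNIX epoch.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper (not part of arrow-rs): coerce one Date64 value to Date32.
/// Assumption: floor division, so pre-epoch values map to the earlier day.
/// Returns None when the day count does not fit in an i32.
fn date64_to_date32(millis: i64) -> Option<i32> {
    i32::try_from(millis.div_euclid(MILLIS_PER_DAY)).ok()
}
```

Under these assumptions, `date64_to_date32(86_399_999)` gives `Some(0)`: the sub-day part of the value cannot be recovered after the write, which is why this path would sit behind the flag.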

Timestamp

If not coerce_types, write as is, setting LogicalType / ConvertedType only where appropriate.

If coerce_types, cast to a UTC timestamp with the closest supported time unit, likely needing #1936.
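
A rough sketch of what "closest supported time unit" could mean in practice: the parquet TIMESTAMP logical type has MILLIS, MICROS and NANOS units but no seconds unit, so a seconds timestamp would be rescaled to milliseconds while the other units pass through. The `TimeUnit` enum below only mirrors the arrow unit names, and `coerced_unit` and `coerce_timestamp` are hypothetical helpers, not arrow-rs functions. Time zone handling is left out of this sketch entirely.

```rust
/// Local stand-in for the four arrow timestamp units.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical helper: pick the closest unit that parquet can represent.
/// Only seconds need to change; the other three units map directly.
fn coerced_unit(unit: TimeUnit) -> TimeUnit {
    match unit {
        TimeUnit::Second => TimeUnit::Millisecond,
        other => other,
    }
}

/// Hypothetical helper: rescale one timestamp value to the coerced unit.
/// Returns None if the multiplication overflows i64.
fn coerce_timestamp(value: i64, unit: TimeUnit) -> Option<(i64, TimeUnit)> {
    let target = coerced_unit(unit);
    let scaled = if unit == TimeUnit::Second {
        value.checked_mul(1_000)?
    } else {
        value
    };
    Some((scaled, target))
}
```

In this sketch the seconds case is the only one that can fail, since very large second counts overflow when scaled to milliseconds; a real implementation would need to decide whether to error or saturate.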

Interval

If not coerce_types, write as FixedSizeBinaryArray matching the arrow representation and store logical type in arrow schema.

If coerce_types, convert to the relevant parquet representation.
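
For reference, the parquet INTERVAL type is a 12-byte fixed-length array holding three little-endian unsigned 32-bit integers: months, days and milliseconds. The sketch below encodes an arrow-style (months, days, nanoseconds) interval into that layout. The function name `interval_to_parquet` is hypothetical, and rejecting negative or sub-millisecond values (rather than truncating them) is an assumption made here to keep the lossy cases visible.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not part of arrow-rs): encode a months/days/nanoseconds
/// interval into the 12-byte parquet INTERVAL layout
/// (three little-endian u32 values: months, days, milliseconds).
/// Assumption: returns None for values parquet cannot hold exactly, i.e.
/// negative components, sub-millisecond precision, or millisecond overflow.
fn interval_to_parquet(months: i32, days: i32, nanos: i64) -> Option<[u8; 12]> {
    if nanos % 1_000_000 != 0 {
        return None; // would lose sub-millisecond precision
    }
    let months = u32::try_from(months).ok()?;
    let days = u32::try_from(days).ok()?;
    let millis = u32::try_from(nanos / 1_000_000).ok()?;

    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&months.to_le_bytes());
    out[4..8].copy_from_slice(&days.to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    Some(out)
}
```

The unsigned fields are the main reason a faithful round trip is not possible through the parquet representation alone, which motivates writing the raw arrow bytes when coerce_types is off.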

Describe alternatives you've considered

See #1666

@tustvold tustvold added good first issue Good for newcomers enhancement Any new improvement worthy of a entry in the changelog help wanted labels Jun 24, 2022
getChan added a commit to getChan/arrow-rs that referenced this issue Apr 10, 2024
getChan added a commit to getChan/arrow-rs that referenced this issue Apr 13, 2024
getChan added a commit to getChan/arrow-rs that referenced this issue Apr 13, 2024