Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flip interval field ordering (#5654) #5656

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Apr 16, 2024

Which issue does this PR close?

Closes #5654

Rationale for this change

There have been reports that the definition we use is backwards, and whilst the specification is kind of ambiguous, we should probably follow what arrow-cpp does.

This is a draft as I think we should look to get integration test coverage in first to make sure that this is actually correct.

What changes are included in this PR?

Are there any user-facing changes?

Aside from the obvious there are a couple of potentially problematic changes that result from this:

  • The ordering of intervals changes, as this is an arbitrary order driven by the bit representation
  • Data in IPC files will be interpreted differently

The order was already arbitrary so the first may not be too problematic, but the latter is potentially. I do, however, suspect that few people are encoding intervals in IPC files else we would have had reports of this before

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 16, 2024
@@ -1714,18 +1714,17 @@ mod tests {
IntervalDayTimeType::make_value(1, 3000),
// 90M milliseconds
IntervalDayTimeType::make_value(0, 90_000_000),
IntervalDayTimeType::make_value(4, 10),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interval ordering is now even more arbitrary 😅

I suspect this is the most user-visible change, as I suspect not many people are manually bit-twiddling

@tustvold tustvold added the api-change Changes to the arrow API label Apr 16, 2024
@tustvold
Copy link
Contributor Author

Looks like the cast kernels will also need updating, but I'd like to get feedback on if this is even the right approach to take first.

@avantgardnerio
Copy link
Contributor

I think we should look to get integration test coverage in first to make sure that this is actually correct

💯

@pitrou
Copy link
Member

pitrou commented Apr 17, 2024

You can probably add some PyArrow integration test(s) in https://github.com/apache/arrow-rs/blob/master/arrow-pyarrow-integration-testing/tests/test_sql.py ? Both to _supported_pyarrow_types and perhaps also a type-specific test to check that the values are interpreted correctly.

@david542542
Copy link

Thanks for opening this PR. Any idea on ETA for when this would be merged in?

@tustvold
Copy link
Contributor Author

tustvold commented Apr 20, 2024

As this is a breaking change with potentially major downstream implications, I suspect it will likely be a while before we incorporate this PR, if even at all. The major potential pain point concerns IPC data files that have been written or are being sent over the network. We need to be very careful here.

I suspect this will be a timeline in months I am afraid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rust Interval definition incorrect
4 participants