New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to arrow and parquet 27.0.0 #4199
Conversation
db063d2
to
462a2e2
Compare
462a2e2
to
97983f4
Compare
97983f4
to
a5a61b4
Compare
a5a61b4
to
f572159
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- @andygrove or @Dandandan if you have a moment to review the TPCH q6 plan changes I think they are better but would appreciate a second opinion.
Thank you @tustvold
@@ -1,742 +1,755 @@ | |||
#[derive(Clone, PartialEq, ::prost::Message)] | |||
pub struct ColumnRelation { | |||
#[prost(string, tag="1")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these changes due to some updated version of prost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we updated prost-build as part of this
Filter: lineitem.l_shipdate >= Date32("8766") AND lineitem.l_shipdate < Date32("9131") AND lineitem.l_discount >= Decimal128(Some(5),15,2) AND lineitem.l_discount <= Decimal128(Some(7),15,2) AND lineitem.l_quantity < Decimal128(Some(2400),15,2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This plan has improved -- the projections are gone and the exprs are simpler
I believe the improvement stem from better casting support in Arrow, but I can't honestly find the specific improvement in https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md#2700-2022-11-11
As a reminder the query is
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < date '1995-01-01'
and l_discount between 0.06 - 0.01 and 0.06 + 0.01
and l_quantity < 24;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much better to me
@@ -572,7 +572,7 @@ mod tests { | |||
Some(15000), | |||
Some(25000), | |||
Some(30000), | |||
Some(11234), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is caused by apache/arrow-rs#3021 (cc @liukun4515 ) where now arrow will round 1.123_456_8 to the nearest integer rather than simply truncating it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was actually apache/arrow-rs#3000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is caused by apache/arrow-rs#3021 (cc @liukun4515 ) where now arrow will round 1.123_456_8 to the nearest integer rather than simply truncating it
this changes is reasonable for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have filled a new issue to talk about the rounding
when cast decimal to a decimal with smaller scale in
apache/arrow-rs#3139 cc @tustvold @alamb
I plan to merge this once I get it green -- there is a huge PR backlog that I would like to clear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Benchmark runs are scheduled for baseline = 4dcf985 and contender = ebb24c5. ebb24c5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
🎉 |
Which issue does this PR close?
re apache/arrow-rs#3045
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?