Fix ignored limit on `lexsort_to_indices` #2991

alamb · 2022-10-31T11:21:33Z

Which issue does this PR close?

Rationale for this change

The regression was introduced in #2929 by https://github.com/apache/arrow-rs/pull/2929/files#r1005140128 and there was no test coverage 😭

What changes are included in this PR?

Fix bug
Add test coverage

Are there any user-facing changes?

Not really, as we haven't released this code yet

cc @isidentical

alamb · 2022-10-31T11:21:44Z

arrow/src/compute/kernels/sort.rs

@@ -950,7 +950,7 @@ pub fn lexsort_to_indices(
    });

    Ok(UInt32Array::from_iter_values(
-        value_indices.iter().map(|i| *i as u32),
+        value_indices.iter().take(len).map(|i| *i as u32),


This is the bug fix: without `take(len)`, every index was emitted and the limit was ignored.
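To illustrate why the `take(len)` matters, here is a minimal standard-library sketch, not the arrow-rs implementation. The function `lexsort_indices` and its plain `Vec<i64>` columns are hypothetical stand-ins, and the partial sort via `select_nth_unstable_by` is an assumption about how a limited sort might be done. The point it shows is that the index buffer always holds every row, so the limit only takes effect if the output is truncated to `len`.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a lexicographic sort over several columns.
/// Returns the row indices in sorted order, truncated to `limit` if given.
fn lexsort_indices(columns: &[Vec<i64>], limit: Option<usize>) -> Vec<u32> {
    let row_count = columns.first().map_or(0, |c| c.len());
    // The effective output length: the limit, capped at the number of rows.
    let len = limit.unwrap_or(row_count).min(row_count);

    // Compare two rows column by column; the first non-equal column decides.
    let compare = |a: &usize, b: &usize| -> Ordering {
        for col in columns {
            match col[*a].cmp(&col[*b]) {
                Ordering::Equal => continue,
                other => return other,
            }
        }
        Ordering::Equal
    };

    // This buffer always contains ALL row indices, limit or not.
    let mut value_indices: Vec<usize> = (0..row_count).collect();
    if len > 0 && len < row_count {
        // Partial sort: move the `len` smallest rows to the front, then order them.
        value_indices.select_nth_unstable_by(len - 1, compare);
        value_indices[..len].sort_unstable_by(compare);
    } else {
        value_indices.sort_unstable_by(compare);
    }

    // Without `take(len)` the whole buffer would be returned and the
    // limit silently ignored, which mirrors the bug fixed in the diff.
    value_indices.iter().take(len).map(|i| *i as u32).collect()
}
```

With the columns `[2, 1, 2, 0]` and `[5, 9, 3, 7]`, calling this sketch with `Some(2)` yields `[3, 1]`, while `None` yields all four indices `[3, 1, 2, 0]`.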

alamb · 2022-10-31T11:23:18Z

arrow/src/compute/kernels/sort.rs

@@ -3439,7 +3451,8 @@ mod tests {
            Some(2),
            Some(17),
        ])) as ArrayRef];
-        test_lex_sort_arrays(input.clone(), expected, None);
+        test_lex_sort_arrays(input.clone(), expected.clone(), None);
+        test_lex_sort_arrays(input.clone(), slice_arrays(expected, 0, 2), Some(2));


The only place on master where the tests pass a limit to lexsort_to_indices is immediately below here. However, very sadly, single arrays take a special-case code path that doesn't hit the buggy path:

        let expected = vec![Arc::new(PrimitiveArray::<Int64Type>::from(vec![
            Some(-1),
            Some(0),
            Some(2),
        ])) as ArrayRef];
        test_lex_sort_arrays(input, expected, Some(3));

This addition is strictly unnecessary from a coverage perspective (it was already covered), but I wanted all of the test_lex_sort_arrays based tests to follow the same pattern so that coverage is easier to reason about
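The `slice_arrays(expected, 0, 2)` call in the diff derives the limited expectation from the full one. Its definition is not shown here, so the following is only a sketch of the idea over plain vectors; `slice_columns` is a hypothetical name, and the real helper operates on Arrow arrays.

```rust
/// Hypothetical analogue of the `slice_arrays` test helper: keep `length`
/// rows starting at `offset` from every column.
fn slice_columns<T: Clone>(columns: &[Vec<T>], offset: usize, length: usize) -> Vec<Vec<T>> {
    columns
        .iter()
        .map(|col| col.iter().skip(offset).take(length).cloned().collect())
        .collect()
}
```

Deriving the `Some(2)` expectation from the unlimited one keeps the two assertions consistent: if the full expected output is right, the sliced one is right by construction.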

alamb · 2022-10-31T11:24:50Z

arrow/src/compute/kernels/sort.rs

@@ -3519,7 +3532,8 @@ mod tests {
                Some(-2),
            ])) as ArrayRef,
        ];
-        test_lex_sort_arrays(input, expected, None);
+        test_lex_sort_arrays(input.clone(), expected.clone(), None);
+        test_lex_sort_arrays(input, slice_arrays(expected, 0, 2), Some(2));


This test fails immediately without the fix: the output is too big!

arrow/src/compute/kernels/sort.rs

isidentical

What a fast fix ❤️ Looks great to me!

Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>

alamb · 2022-10-31T17:26:30Z

I plan to create 26.0.0 RC2 with this fix

* Fix ignored limit on lexsort_to_indices
* Update comments
* Update arrow/src/compute/kernels/sort.rs

Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>

ursabot · 2022-10-31T17:32:19Z

Benchmark runs are scheduled for baseline = 40d61ec and contender = 66c9636. 66c9636 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

alamb added 2 commits October 31, 2022 07:18

Fix ignored limit on lexsort_to_indices

a2205af

Update comments

8afe667

alamb commented Oct 31, 2022

alamb requested review from tustvold and Dandandan October 31, 2022 11:26

alamb mentioned this pull request Oct 31, 2022

Update to arrow 26, change timezones apache/datafusion#4039

Merged

isidentical reviewed Oct 31, 2022

isidentical approved these changes Oct 31, 2022

Dandandan approved these changes Oct 31, 2022

github-actions bot added the arrow label Oct 31, 2022

viirya approved these changes Oct 31, 2022

alamb merged commit 66c9636 into apache:master Oct 31, 2022

alamb deleted the alamb/lexsort_to_indices_limit branch October 31, 2022 17:26
