Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DynComparator/LexicographicalComparator Incorrectly Handles Dictionary Nulls #2687

Closed
tustvold opened this issue Sep 8, 2022 · 2 comments · Fixed by #4691
Closed

DynComparator/LexicographicalComparator Incorrectly Handles Dictionary Nulls #2687

tustvold opened this issue Sep 8, 2022 · 2 comments · Fixed by #4691
Labels

Comments

@tustvold
Copy link
Contributor

tustvold commented Sep 8, 2022

Describe the bug

DynComparator does not inspect null masks. This is by design as it isn't provided a SortOptions specifying how to order them.

Unfortunately the values array of a dictionary is permitted to contain nulls, which causes the comparator to break.

To Reproduce

The builders will never put a null in a dictionary, but if you manually construct the DictionaryArray from ArrayData it is possible.

let values = Int32Array::from_iter([None, Some(1)]).into_data();
let keys =
    Int32Array::from_iter([Some(0), None])
        .into_data();

let data = keys
    .into_builder()
    .data_type(DataType::Dictionary(
        Box::new(DataType::Int32),
        Box::new(DataType::Int32),
    ))
    .child_data(vec![values])
    .build()
    .unwrap();
let dictionary = DictionaryArray::<Int32Type>::from(data);

If you then construct a comparator on this dictionary it will say that the two rows are not equal, when they are technically both null. If you remove the Some(1) it will panic.

Expected behavior

This should be handled correctly

Additional context

Longer term I intend for these to be deprecated and replaced with the row format #2677, and so perhaps we just defer on fixing this.

@alamb
Copy link
Contributor

alamb commented Oct 27, 2022

Maybe we could simply start by generating an error

@tustvold
Copy link
Contributor Author

This same issue also impacts the ability to add support for nested types, such as structs #3187

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants