Add sql-compliant feature for enabling sql-compliant kernel behavior #2568

viirya · 2022-08-24T05:39:21Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Some kernels behave differently from SQL semantics. For example, by definition, NaN is not equal to itself but NaN is equal to NaN with SQL semantics. Using the current comparison kernels in a SQL system leads to different behavior and generates incorrect results.

Describe the solution you'd like

We should provide SQL-compliant kernels which can be enabled by a feature flag.

Describe alternatives you've considered

Additional context

tustvold · 2022-08-29T11:38:17Z

There is an IEEE standard for total ordering of floats; I wonder if we could use that? Aside from being a standard (I've not managed to find an authoritative SQL standard for floats), it can be implemented with relatively cheap bit manipulation instead of more expensive branching.

There is a built-in implementation here, which also does a good job explaining its behaviour. Theoretically the performance may be good enough that we can just always have this behaviour without needing a feature flag?

What do you think?

Also tagging @alamb
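The bit-manipulation approach mentioned above can be sketched as follows. This is a minimal illustration of the usual technique for IEEE 754 total ordering, not the code of the arrow kernels; the names `total_order_key` and `total_cmp_sketch` are hypothetical. It reinterprets the float's bits as a signed integer and, for negative values, flips every bit except the sign bit so that plain integer comparison yields the total order.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: map an f64 to an i64 key whose integer order
/// matches the IEEE 754 total order of the original floats.
fn total_order_key(x: f64) -> i64 {
    let bits = x.to_bits() as i64;
    // The arithmetic shift yields all ones for sign-bit-set values and zero
    // otherwise. Dropping the top bit of that gives a mask that flips every
    // bit except the sign bit, which reverses the order of negative values.
    let mask = (((bits >> 63) as u64) >> 1) as i64;
    bits ^ mask
}

/// Hypothetical helper: branch-free total ordering comparison of two f64 values.
fn total_cmp_sketch(a: f64, b: f64) -> Ordering {
    total_order_key(a).cmp(&total_order_key(b))
}
```

On the values I have checked, this agrees with the standard library's `f64::total_cmp`, which is probably the simpler thing to call in practice.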

viirya · 2022-08-29T16:59:11Z

Hmm, the IEEE total ordering treats NaN as larger than any other number. It looks consistent with the SQL-compliant ordering we need. I can rewrite these kernels with total_cmp. Without a flag, this somewhat breaks existing behavior, but I think the current behavior can be considered wrong or not useful.

viirya · 2022-08-29T17:12:28Z

I think total_cmp looks promising as it treats NaN ordering the same way as Spark/PostgreSQL. cc @sunchao to take a look too in case I miss anything here. 😄
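To make the ordering concrete, here is a small sketch using the standard library's `f64::total_cmp` directly. The helper names `sort_total` and `max_total` are hypothetical and are not part of arrow. One caveat worth hedging: a NaN sorts above every other value only when its sign bit is clear; a NaN with the sign bit set sorts below negative infinity under the total order.

```rust
/// Hypothetical helper: sort floats in place using IEEE 754 total ordering.
fn sort_total(values: &mut [f64]) {
    values.sort_by(|a, b| a.total_cmp(b));
}

/// Hypothetical helper: largest value under total ordering, or None if empty.
fn max_total(values: &[f64]) -> Option<f64> {
    values.iter().copied().max_by(|a, b| a.total_cmp(b))
}
```

With a positive-sign NaN in the input, `sort_total` places it last and `max_total` returns it, which matches the "NaN is larger than any other number" behavior described above.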

NGA-TRAN · 2022-08-29T17:26:05Z

@viirya

> For example, by definition, NaN is not equal to itself but NaN is equal to NaN with SQL semantics

Can you provide a more specific example where NaN is equal to NaN with SQL semantics? This document says NaN is not equal to itself in SQL: https://www.vertica.com/blog/vertica-quick-tip-query-nan-values/

viirya · 2022-08-29T17:54:39Z

We've discussed this in PRs. It depends on the SQL implementation. As far as I know, Spark and PostgreSQL treat NaNs as equal.
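The two notions of equality can be contrasted with a short sketch. The helper names `eq_ieee` and `eq_total` are hypothetical, and deriving equality from `f64::total_cmp` is only one way to get "NaN equals NaN"; whether a given kernel does exactly this is an assumption.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: IEEE 754 comparison, where NaN never equals anything.
fn eq_ieee(a: f64, b: f64) -> bool {
    a == b
}

/// Hypothetical helper: equality derived from total ordering, where two NaNs
/// with the same bit pattern compare equal.
fn eq_total(a: f64, b: f64) -> bool {
    a.total_cmp(&b) == Ordering::Equal
}
```

Two side effects of the total-order version are worth keeping in mind: `0.0` and `-0.0` are distinct values under it, and NaNs with different sign or payload bits do not compare equal to each other.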

sunchao · 2022-08-29T17:57:15Z

> I think total_cmp looks promising as it treats NaN ordering the same way as Spark/PostgreSQL. cc @sunchao to take a look too in case I miss anything here. 😄

I took a look too, and yes, it looks like total_cmp exhibits the same behavior as Spark/Postgres/Snowflake etc., so I think it could be a good idea to replace the existing code with it.

viirya added the enhancement label Aug 24, 2022

This was referenced Aug 24, 2022

Support SQL-compliant behavior on eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #2569

Closed

Support SQL-compliant NaN ordering between for DictionaryArray and non-DictionaryArray #2599

Closed

viirya mentioned this issue Aug 30, 2022

Use total_cmp for floating value ordering and remove nan_ordering feature flag #2613

Closed

viirya closed this as completed Aug 30, 2022

tustvold added the arrow label Sep 8, 2022
