Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] A collection of Interval arithmetic (not Intervals of time) improvements #7882

Open
1 of 3 tasks
alamb opened this issue Oct 20, 2023 · 6 comments
Open
1 of 3 tasks
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Oct 20, 2023

Is your feature request related to a problem or challenge?

DataFusion contains a powerful general purpose interval arithmetic library, in intervals, first introduced in #5322

This library is currently used for statistic and selectivity estimation, and to provide additional guarantees during expression simplification.

This ticket tracks additional items needed to improve the sophistication of DataFusion in this regard

@berkaysynnada
Copy link
Contributor

I am working on some new supports and improvements in interval arithmetic. I also plan to move the code into a more accessible place (possibly datafusion_expr). I will complete my work next week and link them here.

@alamb
Copy link
Contributor Author

alamb commented Oct 20, 2023

Thanks @berkaysynnada -- note I just proposed three relatively small changes . Perhaps you have some time to look at them:

I also plan to move the code into a more accessible place (possibly datafusion_expr)

What would you think about making a datafusion-interval crate to hold the interval analysis library ? that would keep it nicely isolated and likely improve the development cycle time (I found it quite painful to work on in datafusion-physical-expr)

@alamb
Copy link
Contributor Author

alamb commented Oct 20, 2023

I also filed #7887

@berkaysynnada
Copy link
Contributor

berkaysynnada commented Oct 21, 2023

Thanks @berkaysynnada -- note I just proposed three relatively small changes . Perhaps you have some time to look at them:

I also plan to move the code into a more accessible place (possibly datafusion_expr)

What would you think about making a datafusion-interval crate to hold the interval analysis library ? that would keep it nicely isolated and likely improve the development cycle time (I found it quite painful to work on in datafusion-physical-expr)

Thank you for the care and effort you have put into this issue 😌. I tried to answer the current behavior and how it should actually be in the reviews. However, as I mentioned in them, I am currently working on a comprehensive interval library update. It will eliminate many of the bugs you noticed and provide a more open, error-free API for users. I don't want to keep you waiting, but I think it will be easier to solve these issues after I quickly finalize my PR and present it to you.

As for making the new interval crate, it would be better to answer after discussing it with my teammates. Thanks again.

@alamb
Copy link
Contributor Author

alamb commented Oct 21, 2023

. I don't want to keep you waiting, but I think it will be easier to solve these issues after I quickly finalize my PR and present it to you.

There is no particular time pressure from my end. If you are working to revamp the code, I'll wait until that lands (and try to help review it promptly) and then we can pick up the PRs.

BTW is there a ticket somewhere that tracks what you plan to do (mostly I am thinking of any 'end user visible bugs/ behaviors' that might result)?

As for making the new interval crate, it would be better to answer after discussing it with my teammates. Thanks again.

Sounds good -- we can break it into a new crate at some later time too, if that is appropriate. There is no reason to do it as part of your in-filght refactoring

@berkaysynnada
Copy link
Contributor

berkaysynnada commented Oct 23, 2023

BTW is there a ticket somewhere that tracks what you plan to do (mostly I am thinking of any 'end user visible bugs/ behaviors' that might result)?

There is no ticket here but I can summarize them as follows:

  • Relocating the code.
  • Propagate arithmetic API change (the a-b<0 assumption will no longer be used for a<b propagation, which will also be useful for easy integration of supports such as OR).
  • Mul and Div support
  • Interval construct API (including checks)
  • The relationship of the next_value() function with infinite bounds.

Sounds good -- we can break it into a new crate at some later time too, if that is appropriate. There is no reason to do it as part of your in-filght refactoring

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants