Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Round-tripping and operations on decimals #2580

Merged
merged 1 commit into from Jun 20, 2023
Merged

Conversation

FiV0
Copy link
Contributor

@FiV0 FiV0 commented Jun 20, 2023

Short circuiting #2044. Handling of decimals generally in a dynamic engine seems to be quite a rabbit hole.
This PR essentially allows operation on decimals for precision 38 and scale 19. Anything outside of this range will currently throw an exception. Anything coming out of the engine will also have that precision and scale set.

Some useful information:

  • Decimal Precision Validation apache/arrow-rs#2387 - only useful link I could find of why arrow actually stores precision
  • IEEE-754 2008 - expression evaluation between different types (doesn't actually say much)
  • SQL standard 6.27 - if both operands are exact numeric (decimal, ints) -> (implementation dependent) exact numeric, if one type is approximate numeric (float, double) -> (implementation dependent) approximate numeric
  • Postgres (http://sqlfiddle.com/#!17/8756f8/2) and Clojure ((type (+' (double 1.0) (bigdec 1.0)))) seem to favor Double over Decimal

Stuff still todo/decide:

  • correctly support operations between other types and decimal this currently follows the same behavior as Clojure (fewest changes). One could think about promoting an operation on :f32 and :decimal to :f64, but the SQL spec does not require it.
  • How should we deal with decimals that still fit into an Arrow Decimal (128bit) but fall outside the above range (1E+35M , 1E-35M)
  • How to deal with things that don't fit into an Arrow Decimal ? Over- and Underflow?

Copy link
Member

@jarohen jarohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one @FiV0 👏

One question - when you (e.g.) (.setScale 1M 10) you get back 1.0000000000M, so are you in fact getting back bigdecs with extra zeroes? (would be hidden by the equality check) - and, if so, does it matter?

I guess the alternative would be to parameterise the scale on our decimal col-type (like we do for the timestamps), but I guess this has other consequences, like needing separate vectors/casts for bigdecs that happen to differ in scale...

Happy to go with your decision on this one - but regardless of what you choose, mind writing up your decision and rationale on the PR so that we remember it when we come back to future work on bigdecs?

Cheers! 🙏

core/src/main/clojure/xtdb/vector/writer.clj Outdated Show resolved Hide resolved
core/src/main/clojure/xtdb/types.clj Show resolved Hide resolved
@FiV0 FiV0 added the 2.x label Jun 20, 2023
@FiV0 FiV0 force-pushed the decimal-vectors branch 3 times, most recently from f8b4091 to 944d9ad Compare June 20, 2023 12:07
@FiV0
Copy link
Contributor Author

FiV0 commented Jun 20, 2023

The liberal widening of int -> decimal and decimal -> float in the widening-hierarchy is only possible because we are mainly using polymorphic clojure functions in the expression engine. The cases where Math/* is used it just works out of the box because Math/pow and Math/log correctly coerce the arguments to double. The cases of greatest/least (Math/min / Math/max under the hood) only work because 2-arugments invocations get replaced by calls to < and >. So this part definitely needs to change when expanding these cases.

@FiV0 FiV0 merged commit b132997 into xtdb:2.x Jun 20, 2023
5 checks passed
@jarohen jarohen linked an issue Jun 20, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Handling BigDecimal-like vectors in the expression engine
2 participants