
Separate maximum lengths of integer types (BigInteger) and floating-point (BigDecimal) in `StreamReadConstraints` #924

Closed
cowtowncoder opened this issue Feb 21, 2023 · 6 comments


@cowtowncoder
Member

I know we have gone back and forth on this question, but I think it would make sense to allow specifying different maximum lengths for integer numbers and floating-point numbers. This is mostly because:

  1. The cost of processing floating-point numbers can be significantly higher than that of integers (at least wrt BigInteger handling), and it degrades quickly once the length grows beyond a certain point, AND
  2. There are conceivable cases where lengths for BigInteger may need to exceed anything that makes sense for BigDecimal -- for example, cryptographic use cases.

Put another way: someone might want to cap FPs to, say, 100 digits, but allow BigIntegers with, say, over 1000 digits.

Now: I think that we might want to keep a convenience method that sets both limits -- Builder.maxNumberLength() -- while introducing new ones. We already have separate validation methods, but we would need separate accessors.

And finally, we could consider making the defaults different, although I think the current default of 1000 is not a bad choice.
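
For illustration, a minimal sketch of how this could look from the caller's side -- `maxNumberLength()` is the existing convenience setting (assuming the 2.15 `StreamReadConstraints` builder), while the two split methods in the commented-out part are hypothetical names for this proposal, not an existing API:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;

public class NumberLengthLimitExample {
    public static void main(String[] args) {
        // Existing (2.15) convenience setting: one cap shared by all number tokens
        StreamReadConstraints shared = StreamReadConstraints.builder()
                .maxNumberLength(1000)
                .build();
        JsonFactory factory = JsonFactory.builder()
                .streamReadConstraints(shared)
                .build();

        // Hypothetical split proposed above -- these two builder methods do NOT
        // exist; the names are only illustrative:
        //
        // StreamReadConstraints split = StreamReadConstraints.builder()
        //         .maxIntegerLength(2000)       // e.g. crypto-sized BigIntegers
        //         .maxFloatingPointLength(100)  // tighter cap for BigDecimal/double
        //         .build();
    }
}
```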

@cowtowncoder
Member Author

/cc @pjfanning -- Just hoping to nail this one before we do our 2.15 release candidates, sometime in early March (I hope).

@pjfanning
Member

I am not convinced. BigInteger parsing is just about as slow as BigDecimal parsing.

@cowtowncoder
Member Author

> I am not convinced. BigInteger parsing is just about as slow as BigDecimal parsing.

Really? Is Double parsing then significantly slower (wrt the base-2 vs base-10 conversion)?

@pjfanning
Member

BigInteger and BigDecimal are by far the most dangerous regarding parse times. FasterXML/jackson-module-scala#609 was caused by BigInteger parsing.

@cowtowncoder
Member Author

Ok. My understanding of floating-point parsing performance was incorrect then.

I'll close this issue.

@cowtowncoder
Member Author

cowtowncoder commented Feb 22, 2023

Ok, I decided to have a look and you are right: the performance difference between BigInteger and BigDecimal is negligible. I guess their internal representations are similar (the latter just adds a scale for "moving" the decimal point, etc.).
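
(To make the "similar representation" point concrete, a small standalone JDK snippet -- nothing Jackson-specific -- showing that a BigDecimal is basically an unscaled BigInteger plus an int scale:)

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigDecimalLayout {
    public static void main(String[] args) {
        // A BigDecimal is an unscaled BigInteger value plus an int scale:
        // 123.45 == 12345 * 10^-2
        BigDecimal d = new BigDecimal("123.45");
        BigInteger unscaled = d.unscaledValue();  // 12345
        int scale = d.scale();                    // 2
        System.out.println(unscaled + " / scale=" + scale);
    }
}
```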

About the only interesting finding is that java.lang.Double decoding is slightly slower still; in my test, parsing 1000-digit numbers took about 50% more time.
The test I'll add in https://github.com/cowtowncoder/misc-micro-benchmarks gives:

Benchmark                               Mode  Cnt    Score   Error  Units
LongNumberParsing.perfParseBigDecimal  thrpt    5  662.874 ± 5.964  ops/s
LongNumberParsing.perfParseBigInteger  thrpt    5  671.848 ± 4.698  ops/s
LongNumberParsing.perfParseDouble      thrpt    5  434.849 ± 1.793  ops/s

(but with 200-digit numbers the results are about the same... odd)

So your point stands: performance of plain decoding (never mind use) is very similar for these "big number" cases.
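
For reference, a stripped-down sketch of the kind of JMH comparison described above; the class and method names are illustrative rather than the actual benchmark from misc-micro-benchmarks, and the plain run-of-digits input is an assumption about the test data:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class LongNumberParsingSketch {
    private String digits;

    @Setup
    public void setup() {
        // Build a 1000-digit decimal string; the real benchmark's inputs may differ
        StringBuilder sb = new StringBuilder(1000);
        for (int i = 0; i < 100; ++i) {
            sb.append("1234567890");
        }
        digits = sb.toString();
    }

    @Benchmark
    public BigInteger parseBigInteger() {
        return new BigInteger(digits);
    }

    @Benchmark
    public BigDecimal parseBigDecimal() {
        return new BigDecimal(digits);
    }

    @Benchmark
    public double parseDouble() {
        // A 1000-digit integer overflows double's range (result is Infinity),
        // but the full digit sequence still has to be scanned
        return Double.parseDouble(digits);
    }
}
```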
