New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Denial of service when parsing a JSON object with an unexpected field that has a big number #609
Comments
A similar issue can be reproduced using weePickle: rallyhealth/weePickle#118 |
@plokhotnyuk jackson-module-scala only really adds the ability to process Scala classes to jackson-databind/jackson-core. I think any changes needed to help here will end up going into jackson-databind or jackson-core. @cowtowncoder is there any existing support in jackson-databind to skip JSON subtrees that are not needed? The scenario is that a Java or Scala class can have certain fields but the JSON has extra fields that are not needed because they do not appear in the class. |
I get these results on my laptop and the results don't seem sub-quadratic (default branch as opposed to git checkout jackson-DoS-by-a-big-number branch). They are not great, given the amount of processing time spent on parsing the unexpected field. Zulu 11.0.14 JRE
|
The default branch tests for vulnerabilities of hash code collision handling only. Need to uncomment some subsequent line for setup of input data to test other types of vulnerabilities: |
Thanks @plokhotnyuk - I see the sub-quadratic performance in the jackson-DoS-by-a-big-number branch. From your flame graph, it looks like the |
I think that an acceptable way to fix is replacing all this kind of calls by some more efficient algorithm of parsing Also, I'm thinking about detection of number types without parsing of big numbers. The best option would be skipping of number values without parsing even small values like here. |
@plokhotnyuk jackson 2.14.0 full release is about to get released and I don't want to delay it. jackson-core already has a BigDecimalParser that has some handling for very large numbers. This code is not nearly as good as yours but a small change to jackson-core to make it use BigDecimalParser to parse BigIntegers when the string is big does help quite a lot. My initial testing shows that So a quick and dirty solution for v2.14 could be to use I will look to add the FastBigDecimalParser (that is a Java port of your code) in Jackson 2.15. There is currently a bug in my FastBigDecimalParser that stops it parsing very large integers. |
One minor comment for one of questions ("does jackson-databind have a way to skip sub-trees"): no and yes -- nothing explicit, but any content that is not bound will be skipped. So fully typed POJOs/Scala objects define values that are extracted; anything else will be skipped (... or report exception depending). This would not apply to use cases where the whole content is bound to |
@cowtowncoder in the benchmarks highlighted by @plokhotnyuk, a Scala class is used that does not have the field that contains the big number. In theory, that field can be skipped and not parsed. The evidence from the benchmark is that the field is not skipped. If the data in the ignorable field is made larger, the benchmark slows down. It appears that JsonParser code in jackson-core is parsing the number using |
@pjfanning Would it be possible to have a snippet of class definition here? There are possibilities of things that could explain it; for example scala module or some specific deserializer reading content into interemediate representation -- or, maybe, requiring buffering due to use of Constructor/Static factory method. Buffering is probably the likeliest case; currently that would force reading of content into value, here, |
The class (basically Scala equivalent of a JavaBean with
The JSON is generated using this (where size is a large number):
generates something like:
|
Ok I am guessing that case classes must be using constructor so that would often require buffering, depending on order of properties being found from input. It could of course also be that case classes use custom deserializer which could do whatever it wants (and not be using |
I'll try to produce a Java benchmark. |
Ok. In Java equivalent test, I get an exception:
I'm not sure why this does not kick in when Scala is involved. |
Could it be due to ignoring of unknown fields? |
@cowtowncoder I set pjfanning/jackson-number-parse-bench#1 The slowdown is subquadratic. I haven't debugged this to see where the time is spent.
|
Yes @plokhotnyuk, when accepting json from untrusted sources, you should not use |
@cowtowncoder since this only happens if |
That is great that default settings are safe, but it should be stated somewhere in docs and tutorials that disabling of Also, in Scala almost all JSON parsers allow unknown fields by default so users tend to skip unknown field with jackson-module-scala too and in Java some frameworks override defaults like Spring. |
@pjfanning Right, I am mostly interested in figuring out why skipping won't work, as a potential optimization. But agreed not a real blocker especially if and when it's not something new. |
Is the idea that this might allow easier DoS somehow since -- in some cases.... --- skipping may not work as efficiently as we'd like? Otherwise I am not sure calling out disabling But if anyone has an idea of good wording I would of course be open to addition to Javadocs for annotation ( |
@cowtowncoder I added benchmarks to https://github.com/pjfanning/jackson-number-parse-bench and I was found that |
@pjfanning Thank you for digging into this. Could you file a separate issue in One thing to note on skipping unknown properties: this setting just prevents throwing of exception. |
@pjfanning Ok looking at the test, I wonder if this is only due to required buffering of the number value itself -- integer with 1M digits is 1000x length of one with 1k digits, and performance difference seems to be factory of 2000x. So longer value is 50% slower than might be expected, given the document content in both cases is 99.9%+ just one biggish integer. Now the overhead for 1M digit number is due to Of course it'd be possible to implement fully lazy decoding (Woodstox does this, I think)... but that adds complexity. EDIT: actually lazy decoding is bit tricky since we can have both rather long |
The lazy parsing of BigInt/BigDec doesn't appear to help this jackson-module-scala use case but should help in the weePickle use case. I had expected that the Investigating the performance is not my priority because supporting unexpected fields is not the default config. The next item I would like to get done is to limit the size of numbers. The bad perf only happens for giant numbers (10000+ chars) and most users would be better off banning JSON inputs like that. |
Good investigation @pjfanning. Some notes: First: buffering in this case is done if "property-based" constructor is used. This is the case for (no pun intended) case classes. For "dumb" value classes with just fields and/or setters/getters no buffering would be needed. So anyone wanting maximum performance may consider use of such dumb value classes (and just for those). Second, as you mentioned, there's earlier check:
which should skip handling for some cases and it sounds like annotation does cause this to be true. Features that would necessitate buffering (aside from matching one of Constructor parameters) include:
|
@pjfanning Ah ha! So that flag is ONLY set if
returns setting to indicate it. It uses:
but as far as I can see, logic does NOT (unfortunately!) consider global Now... I think I'll file an issue for this, but I am not sure it can be resolved for 2.14.0: it seems slightly risky to add short-cut. There also needs to be a way to test this, somehow, to verify logic of detection. Still, it is kind of odd that even when seemingly skipping content there is no real difference in performance. |
@plokhotnyuk @cowtowncoder if you add this annotation to ExtractFields class, then the BigInteger is ignored and not parsed.
jackson-databind skips the unexpected fields in vanilla mode but the Scala classes are not treated as vanilla because there is not a no param constructor to use. In non-vanilla mode, jackson-databind insists on parsing the unexpected fields unless you set the annotation above. |
Unfortunately, it doesn't compile in the jsoniter-scala benchmark codebase: Scala 3 compiler fails to generate byte code with the following error (and, definitely, should be reported here https://github.com/lampepfl/dotty/issues):
Scala 2 compiler starts to entangle magnolia macros of zio-json that leads to generation of invalid codecs that fail tests (that is why I don't use Java annotations for benchmarked data types at all, probably, worth to be reported here: https://github.com/scala/bug/issues). |
@plokhotnyuk I have https://github.com/pjfanning/jackson-scala-bench and this seems to run with Scala 2.13 and Scala 3.2.1. |
@cowtowncoder is there another way to ignore unknown properties for cases where users disable failure on unknown properties? |
closing in favour of FasterXML/jackson-databind#3721 |
Sub-quadratic decreasing of throughput when length of the JSON object is increasing
On contemporary CPUs parsing of such JSON object with an additional field that has of 1000000 decimal digits (~1Mb) can took ~13 seconds:
Probably the root cause is in the jackson core library, but reporting hear hoping that a hot fix to just skip unwanted values can be applied on the jackson-module-scala level.
Flame graph for CPU cycles at size=1000000
jackson-module-scala version
2.14.0-rc2
Scala version
2.13.10
JDK version
17
Steps to reproduce
To run that benchmarks on your JDK:
sbt
and/or ensure that it already installed properly:jsoniter-scala
repo:The text was updated successfully, but these errors were encountered: