
READY - Stop DOS attacks by making the lexer stop early on evil input. #2892

Merged
merged 11 commits on Jul 26, 2022

Conversation

bbakerman
Member

@bbakerman bbakerman commented Jul 21, 2022

This is related to #2888

The bug was that we do indeed have a max-token counting mechanism in graphql-java, BUT it was being enacted too late.

Testing showed that the max-token code was indeed being hit, BUT the ANTLR lexing and parsing code was taking proportionally longer to reach the max-token state as the input size increased.

This is caused by the greedy nature of the ANTLR lexer - under certain grammar conditions it will look ahead and store tokens in memory, and back-to-back directives like @lol@lol are one of those conditions. This meant that the lexer, not the parser, was the code consuming the CPU time - BUT the max-token check was in the parser.

This PR puts the same max-token checks in the lexer as in the parser. In fact it is debatable whether the parser checks should be retained (since the lexer will be the main place the limit is hit), but the logic is common so I left it in place.

The existing billion-@lols tests still parse with this change - the difference is where the cancel-parse exception is thrown.

Using the lexer as the place to count means the check is done in constant time per token. As soon as max tokens is exceeded, the parse is stopped.

graphql-java also uses three channels for tokens (0 for the grammar, and 2 and 3 for comments and whitespace). Lexing whitespace and comments also takes CPU time, so counters have been put in place to watch tokens on those channels as well.

This will stop a "mostly whitespace" attack, even though whitespace costs less to aggregate in practice.
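The shape of the fix can be sketched independently of ANTLR (the class and names below are hypothetical, not graphql-java's actual implementation): count tokens as the lexer emits them and abort immediately once the budget is exhausted, so the check costs constant time per token.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch of a token-counting lexer wrapper. The IntSupplier
 * stands in for the real ANTLR lexer and yields the channel of each token.
 */
class MaxTokenGuard {
    static final int GRAMMAR_CHANNEL = 0;

    private final IntSupplier lexer;
    private final int maxTokens;
    private int grammarTokens = 0;

    MaxTokenGuard(IntSupplier lexer, int maxTokens) {
        this.lexer = lexer;
        this.maxTokens = maxTokens;
    }

    /** Pulls the next token's channel, failing fast once the budget is spent. */
    int nextToken() {
        int channel = lexer.getAsInt();
        if (channel == GRAMMAR_CHANNEL && ++grammarTokens > maxTokens) {
            // The real fix throws its cancel-parse exception here, inside the
            // lexer, long before the parser would have noticed the flood.
            throw new IllegalStateException(
                    "more than " + maxTokens + " grammar tokens encountered");
        }
        return channel;
    }
}
```

Because the count is incremented per emitted token, an evil input is rejected after at most maxTokens + 1 tokens, regardless of how large the raw input is.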

@bbakerman bbakerman changed the title Stop DOS attacks by making the lexer stop early on evil input. WIP - Stop DOS attacks by making the lexer stop early on evil input. Jul 21, 2022
@bbakerman
Member Author

bbakerman commented Jul 21, 2022

The question to be answered here is whether to track the whitespace and line-comment channels or not. They are fast - they still take some time, BUT nothing like the grammar channel does.

When we run the ParserBadSituations program with unlimited tokens, the numbers for whitespace and comments are something like:


Whitespace Bad Payloads(run #2)(1 of 15) - | query length 50020 | bad payloads 5000 | duration 4ms 
Whitespace Bad Payloads(run #2)(2 of 15) - | query length 100020 | bad payloads 10000 | duration 9ms 
Whitespace Bad Payloads(run #2)(3 of 15) - | query length 150020 | bad payloads 15000 | duration 16ms 
Whitespace Bad Payloads(run #2)(4 of 15) - | query length 200020 | bad payloads 20000 | duration 19ms 
Whitespace Bad Payloads(run #2)(5 of 15) - | query length 250020 | bad payloads 25000 | duration 28ms 
Whitespace Bad Payloads(run #2)(6 of 15) - | query length 300020 | bad payloads 30000 | duration 28ms 
Whitespace Bad Payloads(run #2)(7 of 15) - | query length 350020 | bad payloads 35000 | duration 50ms 
Whitespace Bad Payloads(run #2)(8 of 15) - | query length 400020 | bad payloads 40000 | duration 41ms 
Whitespace Bad Payloads(run #2)(9 of 15) - | query length 450020 | bad payloads 45000 | duration 46ms 
Whitespace Bad Payloads(run #2)(10 of 15) - | query length 500020 | bad payloads 50000 | duration 48ms 
Whitespace Bad Payloads(run #2)(11 of 15) - | query length 550020 | bad payloads 55000 | duration 55ms 
Whitespace Bad Payloads(run #2)(12 of 15) - | query length 600020 | bad payloads 60000 | duration 52ms 
Whitespace Bad Payloads(run #2)(13 of 15) - | query length 650020 | bad payloads 65000 | duration 87ms 
Whitespace Bad Payloads(run #2)(14 of 15) - | query length 700020 | bad payloads 70000 | duration 84ms 
Whitespace Bad Payloads(run #2) - finished | max time was 87 ms 
=======================

Comment Bad Payloads(run #2)(1 of 15) - | query length 75022 | bad payloads 5000 | duration 14ms 
Comment Bad Payloads(run #2)(2 of 15) - | query length 150022 | bad payloads 10000 | duration 25ms 
Comment Bad Payloads(run #2)(3 of 15) - | query length 225022 | bad payloads 15000 | duration 31ms 
Comment Bad Payloads(run #2)(4 of 15) - | query length 300022 | bad payloads 20000 | duration 34ms 
Comment Bad Payloads(run #2)(5 of 15) - | query length 375022 | bad payloads 25000 | duration 26ms 
Comment Bad Payloads(run #2)(6 of 15) - | query length 450022 | bad payloads 30000 | duration 25ms 
Comment Bad Payloads(run #2)(7 of 15) - | query length 525022 | bad payloads 35000 | duration 32ms 
Comment Bad Payloads(run #2)(8 of 15) - | query length 600022 | bad payloads 40000 | duration 37ms 
Comment Bad Payloads(run #2)(9 of 15) - | query length 675022 | bad payloads 45000 | duration 48ms 
Comment Bad Payloads(run #2)(10 of 15) - | query length 750022 | bad payloads 50000 | duration 51ms 
Comment Bad Payloads(run #2)(11 of 15) - | query length 825022 | bad payloads 55000 | duration 48ms 
Comment Bad Payloads(run #2)(12 of 15) - | query length 900022 | bad payloads 60000 | duration 48ms 
Comment Bad Payloads(run #2)(13 of 15) - | query length 975022 | bad payloads 65000 | duration 52ms 
Comment Bad Payloads(run #2)(14 of 15) - | query length 1050022 | bad payloads 70000 | duration 70ms 
Comment Bad Payloads(run #2) - finished | max time was 70 ms 
=======================

So it starts to creep up, BUT nothing like the grammar channel, which is two orders of magnitude slower!

This is what was discovered while debugging. Channel 0 (grammar) tokens get put into the parse tree, and this is quite costly when you have tens of thousands of them. The cost is in the lexing AND in the parse-tree building, so it makes total sense for them to be strictly limited.

However, there are two other channels - whitespace and line comments (# comments). It turns out that they are NOT anywhere near as costly. Whitespace (while lexed as one token per whitespace character) is accumulated (burning some CPU and memory, since each character becomes a Token) but is never placed into the parse tree. As you can see above, this happens relatively fast.

And then, when the parser is called back, this code means we throw them away:

    private void addIgnoredChars(ParserRuleContext ctx, NodeBuilder nodeBuilder) {
        if (!parserOptions.isCaptureIgnoredChars()) {
            return;
        }

So they accumulate quickly and are then never used.

Line comments are also accumulated - not as quickly, because they actually are used in the parse tree:

    protected List<Comment> getComments(ParserRuleContext ctx) {
        if (!parserOptions.isCaptureLineComments()) {
            return NO_COMMENTS;
        }

By default (even for queries) we keep the line comments. So an attack vector is to send in:

# lots of comments
# lots of comments
# lots of comments
# lots of comments
query x { f }

However, as the numbers above show, even this is fast-ish.
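For illustration, a mostly-comments payload like the one above is trivial to generate (this helper is hypothetical, not part of the PR):

```java
/** Builds a small query padded with N line comments, as in the attack above. */
class CommentPaddedQuery {
    static String build(int commentLines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < commentLines; i++) {
            sb.append("# lots of comments\n");
        }
        // A tiny, perfectly valid operation at the end of all the padding.
        sb.append("query x { f }");
        return sb.toString();
    }
}
```

The query itself is two grammar tokens deep, so only a limit on the comment channel can catch this shape of input.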

My fear is around whitespace: a 15,000-token limit applied to whitespace characters will catch some legitimate queries out. A query might be, say, 512KB in size - it will likely be below 15,000 grammar tokens, BUT could it be one-third whitespace (say 170,000 whitespace characters)? Yeah, why not.

So this PR as it stands (counting whitespace the same as grammar tokens or line comments) would stop someone sending in such a query.

I think this leads us towards having two max values in ParserOptions - a max token count and a max whitespace count.
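That two-budget split could look something like this (a hypothetical sketch only; the names do not claim to match graphql-java's eventual ParserOptions API): grammar tokens keep a tight limit while whitespace, which is far cheaper to lex, gets a much larger one, so a legitimately whitespace-heavy query is not rejected by the grammar budget.

```java
/** Hypothetical options object with separate grammar and whitespace budgets. */
class TokenBudgets {
    final int maxTokens;           // channel 0: grammar tokens, tightly limited
    final int maxWhitespaceTokens; // whitespace tokens: cheap, so a looser limit

    TokenBudgets(int maxTokens, int maxWhitespaceTokens) {
        this.maxTokens = maxTokens;
        this.maxWhitespaceTokens = maxWhitespaceTokens;
    }

    /** True while the running count on a channel is still within its budget. */
    boolean withinBudget(int channel, int count) {
        int limit = (channel == 0) ? maxTokens : maxWhitespaceTokens;
        return count <= limit;
    }
}
```

With budgets of, say, 15,000 grammar tokens and 200,000 whitespace tokens, the 512KB one-third-whitespace query above (roughly 170,000 whitespace characters) would still parse.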

@bbakerman bbakerman changed the title WIP - Stop DOS attacks by making the lexer stop early on evil input. READY - Stop DOS attacks by making the lexer stop early on evil input. Jul 22, 2022
@bbakerman
Member Author

After chats with Andi and Donna, we decided to

  • add a separate whitespace max token count - this is set at 200_000 whitespace tokens
  • add a separate set of default parser options for "operation" parsing versus DSL
    • We don't want comment lines captured by default on operation AST elements

Member

@andimarek andimarek left a comment

1: Small naming change

2: With the introduction of defaultOperationParserOptions, I think we should rethink SchemaParser.parseImpl, where we overwrite maxTokens. I think we don't need that anymore if we have two different ParserOptions.

@@ -46,6 +49,11 @@
@PublicApi
public class Parser {

@Internal
public static final int CHANNEL_COMMENTS = 2;
@Internal
Member

Let's also call this the whitespace channel, rather than "ignored" ones, to keep it consistent with the options.

* If you want to allow more, then {@link #setDefaultParserOptions(ParserOptions)} allows you to change this
* JVM wide.
*/
public static final int MAX_WHITESPACE_TOKENS = 200_000;

Member

I think we should name this "SDL parser options" or similar, to make clear it will be used for schema parsing. See also my general review comment.

@andimarek andimarek added this to the 19.0 milestone Jul 26, 2022
@yeikel
Contributor

yeikel commented Jul 28, 2022

Do you have (or are you planning to create) a CVE for this?

@bbakerman
Member Author

Do you have (or are you planning to create) a CVE for this?

We don't have concrete plans - mainly because we are unsure of the process and the work involved.

If you know more about how this works and could coach us on the process, that would be great - we aren't against the idea, we just know very little about the how of it.

@act1on3
Contributor

act1on3 commented Aug 3, 2022

UPD. CVE-2022-37734 is assigned.

Hi @bbakerman ,
I've requested a CVE with the form: https://cveform.mitre.org/

@dondonz
Member

dondonz commented Sep 14, 2022

@act1on3 I'd like to update the CVE with all versions containing the fix: v19.0/19.1/19.2, v18.3, and v17.4.

I've never had to update a CVE before - do you have any way to edit the text from your end?

@act1on3
Contributor

act1on3 commented Sep 14, 2022

Hi @dondonz ,
I believe an analyst from MITRE has updated it already. Thanks for the info. I also have no idea how to update the description :)

@yeikel
Contributor

yeikel commented Sep 14, 2022

Considering the different backports, is the range correct? I.e., are users on 17.x versions that contain the backport still considered vulnerable?

@dondonz
Member

dondonz commented Sep 14, 2022

Thanks @act1on3, I ended up contacting MITRE to fix it.

@yeikel Yes version v17.4 contains the backport, see the release notes https://github.com/graphql-java/graphql-java/releases/tag/v17.4

@yeikel
Contributor

yeikel commented Sep 14, 2022

Thanks @act1on3, I ended up contacting MITRE to fix it.

@yeikel Yes version v17.4 contains the backport, see the release notes https://github.com/graphql-java/graphql-java/releases/tag/v17.4

If that's the case, then we should explicitly state that in the CVE. Otherwise, scanning systems will flag versions that are safe.

@dondonz
Member

dondonz commented Sep 15, 2022

@yeikel I agree, I'm getting automated tickets myself at work. The CVE text has been updated to reference the backported versions.

graphql-java before 19.0 is vulnerable to Denial of Service. An attacker can send a malicious GraphQL query that consumes CPU resources. The fixed versions are 19.0 and later, 18.3, and 17.4.

https://www.cve.org/CVERecord?id=CVE-2022-37734

I am also speaking to Snyk to fix their recommendation.
