[C++] querying was stuck on Call.getArgument without detailed log #16068

Open
iiins0mn1a opened this issue Mar 27, 2024 · 5 comments
Labels
C++ question Further information is requested

Comments

@iiins0mn1a

related log:

[2024-03-26 13:08:51] (664s)  >>> Created relation gadgets#0b9c9d51::getParaPointerIndex#1#ff/2@0e72064q with 5120 rows and digest 8c17e92ufpma1sptlsm3ibgk848.
[2024-03-26 13:08:51] (664s) No need to promote strings for predicate gadgets#0b9c9d51::getParaPointerIndex#1#ff  as it does not contain computed strings.
[2024-03-26 13:08:51] (664s)  >>> Created relation gadgets#0b9c9d51::getParaPointerIndex#1#ff/2@31944318 with 5120 rows and digest 8c17e92ufpma1sptlsm3ibgk848.
[2024-03-26 13:08:51] (664s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@77f45a6s
[2024-03-26 13:08:51] (664s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@6366f098
[2024-03-26 13:08:56] (669s) Tuple counts for _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@6366f098 after 5s:
                      4234450 ~3%     {2} r1 = SCAN __Call#39248e3c::FunctionCall::getTarget#0#dispred#ff_10#join_rhs_Enclosing#c50c5fbf::stmtEnclosingE__#shared OUTPUT In.0 'arg1', In.1 'arg0'
                      4234450 ~3%     {2} r2 = STREAM DEDUP r1
                      9083004 ~0%     {3} r3 = JOIN r2 WITH Call#39248e3c::Call::getArgument#1#dispred#fff ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1', Rhs.1 'arg2'
                                      return r3
[2024-03-26 13:08:56] (669s) Tuple counts for _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@77f45a6s after 5s:
                      4214836 ~3%     {2} r1 = SCAN __Call#39248e3c::FunctionCall::getTarget#0#dispred#ff_10#join_rhs_Enclosing#c50c5fbf::stmtEnclosingE__#shared OUTPUT In.0 'arg1', In.1 'arg0'
                      4214836 ~3%     {2} r2 = STREAM DEDUP r1
                      9045526 ~0%     {3} r3 = JOIN r2 WITH Call#39248e3c::Call::getArgument#1#dispred#fff ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1', Rhs.1 'arg2'
                                      return r3
[2024-03-26 13:08:56] (669s) Pausing evaluation to evict 1.20GiB ARRAYS at sequence stamp o+5440836
[2024-03-26 13:08:56] (669s) Unpausing evaluation: 1.23GiB forgotten: 1.23GiB UNREACHABLE (1989 items up to o+5440829)
[2024-03-26 13:08:56] (669s)  >>> Created relation _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@6366f098 with 9083004 rows and digest 32582d05tbfpmf64m28a66ehuh0.
[2024-03-26 13:08:56] (669s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff__Call#39248e3c::Call::getArgument#1#dispred#fff___Ca__#join_rhs/2@f5edcbe0
[2024-03-26 13:08:56] (669s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff__Call#39248e3c::Call::getArgument#1#dispred#fff___Ca__#join_rhs#1/2@96b050l9
[2024-03-26 13:08:56] (669s)  >>> Created relation _Call#39248e3c::Call::getArgument#1#dispred#fff___Call#39248e3c::FunctionCall::getTarget#0#dispred#f__#shared/3@77f45a6s with 9045526 rows and digest 328843tueune55pdvlb29cmkcc8.
[2024-03-26 13:08:56] (669s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff__Call#39248e3c::Call::getArgument#1#dispred#fff___Ca__#join_rhs/2@173330kq
[2024-03-26 13:08:57] (669s) Starting to evaluate predicate _Call#39248e3c::Call::getArgument#1#dispred#fff__Call#39248e3c::Call::getArgument#1#dispred#fff___Ca__#join_rhs#1/2@67ea54jp

My query has been running for far longer than 669s, but the log shows no further output, which makes it hard for me to debug.

Related query:

        exists(
            ReturnStmt ret, Expr retexpr, Function func, Expr argexpr, int paraindex |
            func = getFunctionDefinition(fc.getTarget()) and
            ret.getEnclosingFunction() = func and
            retexpr = ret.getExpr() and
            exists(fc.getArgument(paraindex)) |
            (
                if isFromParaPointer(ret) // local taint
                then (
                    paraindex = getParaPointerIndex(ret) and
                    argexpr = fc.getArgument(paraindex) and
                    result = isTarget(argexpr, res, depth)
                )
                else result = isTarget(retexpr, res, depth - 1)
            )
        )

These lines check whether the Expr returned by a FunctionCall derives (via local taint) from one of its arguments, and decide the next step of the recursive back-tracing in isTarget().

I'm using an outdated version of the CodeQL CLI, so I may update my toolchain first, but I would still appreciate any help.

@iiins0mn1a iiins0mn1a added the question Further information is requested label Mar 27, 2024
@ginsbach
Contributor

Thank you for reaching out with this performance issue. Can you please share the entire log file with us (the one you already posted a snippet of)?

In general, here are some guidelines for optimising CodeQL queries that the team has written up in the CodeQL documentation:

@iiins0mn1a
Author

iiins0mn1a commented Mar 29, 2024

Hi @ginsbach, thanks for your reply. I've invited you to my private repo so you can check the entire log file. Thanks also for the reference. I look forward to your reply. Thank you.

@iiins0mn1a
Author

Hi @ginsbach, thanks for your reply. I've invited you to my private repo so you can check the entire log file. Thanks also for the reference.

Besides, I've updated my toolchain to codeql-cli-v2.16.6 (with ql-lib on tag v2.16.6 too). While the same query works fine with the VS Code extension, it reports a lot of ERRORs when I use the CLI directly. These ERRORs seem to be internal errors. Related log:

[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to FlowState, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:408,46-55)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to FlowState, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:414,47-56)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to FlowState, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:426,19-28)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to FlowState, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:426,49-58)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to isAdditionalFlowStep, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:442,7-27)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to isAdditionalFlowStep, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:443,7-27)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to isAdditionalFlowStep, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:444,7-27)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to isAdditionalFlowStep, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:445,7-27)
[2024-03-29 02:10:20] [ERROR] execute queries> ERROR: Predicate signature default may not refer to DataFlowCall, which is another member of the module signature. (/home/insomnia/codeql-new/query/Next-Starter/ql/shared/dataflow/codeql/dataflow/DataFlow.qll:80,47-59)

The entire log containing these ERRORs has also been uploaded to the repo I invited you to. I look forward to your reply. Thank you.

I will open a separate issue for this question.

@adityasharad
Collaborator

Could you tell us more about the pattern you are trying to detect with this query? Rather than writing the logic yourself for matching function return values to function call expressions, the CodeQL dataflow library for C/C++ may be able to handle this for you already. I don't know your definition of isFromParaPointer/getParaPointerIndex, but that sounds like it would be your definition of a tainted source, and you are looking for flow from such a source to a function call expression, or some other downstream sink.
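
A minimal sketch of what the dataflow-based approach suggested above might look like, assuming the new C/C++ dataflow library (`semmle.code.cpp.dataflow.new.TaintTracking`) that ships with recent releases. The names `ParamToReturnConfig` and `ParamToReturnFlow` are hypothetical, and the source and sink definitions are only a guess at what `isFromParaPointer` is meant to capture (a pointer-typed parameter whose value reaches a returned expression). They are not taken from the original query.

```ql
/**
 * @name Returned value derived from a pointer parameter (sketch)
 * @kind problem
 * @id cpp/example/param-to-return
 */

import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

// Hypothetical configuration: sources are pointer-typed parameters,
// sinks are expressions that appear in a return statement.
module ParamToReturnConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asParameter().getUnspecifiedType() instanceof PointerType
  }

  predicate isSink(DataFlow::Node sink) {
    exists(ReturnStmt ret | sink.asExpr() = ret.getExpr())
  }
}

// Taint tracking (rather than pure value flow) over the configuration above.
module ParamToReturnFlow = TaintTracking::Global<ParamToReturnConfig>;

from DataFlow::Node source, DataFlow::Node sink
where ParamToReturnFlow::flow(source, sink)
select sink, "Returned value may derive from parameter $@.", source, source.toString()
```

Global flow is interprocedural, so this may report more than a purely local check would. If only flow within a single function matters, `TaintTracking::localTaint(source, sink)` from the same library may be a closer fit.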

@iiins0mn1a
Author

Hi @adityasharad, thanks for your reply, and sorry for the delay.

As you can see in the query I posted, it is part of the recursive back-tracing predicate isTarget(). This specific part handles a FunctionCall fc and checks where its return value comes from.

In some cases, the returned Expr retexpr may come from the parameters of the Function func. As you pointed out, we use a local taint procedure provided by the library to check whether the ReturnStmt ret derives from a parameter. If so, we recurse on the argument Expr argexpr of the FunctionCall fc; otherwise we recurse on retexpr.

Feel free to contact me if anything is unclear. Thanks again.
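
The parameter-to-argument step described above could be sketched as two small helper predicates. This is an illustration under assumptions, not the original code: `returnsTaintedParam` and `getTaintingArgument` are hypothetical names standing in for `isFromParaPointer` / `getParaPointerIndex`, whose definitions are not shown in this thread, and the sketch matches on `fc.getTarget()` directly rather than on the `getFunctionDefinition` helper. It makes no claim about fixing the slowdown reported in the log.

```ql
import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

/**
 * Holds if `ret` returns a value that is locally tainted by parameter
 * number `index` of its enclosing function. (Hypothetical helper.)
 */
pragma[nomagic]
predicate returnsTaintedParam(ReturnStmt ret, int index) {
  exists(Parameter p, DataFlow::Node src, DataFlow::Node snk |
    p = ret.getEnclosingFunction().getParameter(index) and
    src.asParameter() = p and
    snk.asExpr() = ret.getExpr() and
    TaintTracking::localTaint(src, snk)
  )
}

/**
 * Gets an argument of `fc` whose matching parameter taints a value
 * returned by the call target. (Hypothetical helper.)
 */
Expr getTaintingArgument(FunctionCall fc) {
  exists(ReturnStmt ret, int index |
    ret.getEnclosingFunction() = fc.getTarget() and
    returnsTaintedParam(ret, index) and
    result = fc.getArgument(index)
  )
}

from FunctionCall fc
select fc, getTaintingArgument(fc)
```

Computing the per-function relation first keeps the argument index bound before `getArgument` is consulted. Whether that changes the join order the evaluator picks would need to be confirmed from tuple counts in the log.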

@sidshank sidshank added the C++ label Apr 30, 2024