The Wayback Machine - http://web.archive.org/web/20250623002710/https://github.com/github/codeql/pull/8596
Skip to content

C++: IR data flow through global variables #8596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Aug 5, 2022

Conversation

rdmarsh2
Copy link
Contributor

@rdmarsh2 rdmarsh2 commented Mar 29, 2022

No description provided.

@rdmarsh2
Copy link
Contributor Author

rdmarsh2 commented Mar 29, 2022

Opened as draft since it builds on #8515. Only 45a29cb is new at the moment.

@rdmarsh2 rdmarsh2 force-pushed the rdmarsh2/dataflow-global-vars branch from 45a29cb to 5b4a5cf Compare April 19, 2022 17:42
@rdmarsh2 rdmarsh2 force-pushed the rdmarsh2/dataflow-global-vars branch from 5b4a5cf to 680d4e8 Compare April 29, 2022 19:56
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataflow changes LGTM! Before we merge this we should probably rebase main onto this PR now that #8912 is merged, and then run a DCA experiment.

@rdmarsh2 rdmarsh2 force-pushed the rdmarsh2/dataflow-global-vars branch from d5016d9 to d28c39c Compare June 20, 2022 19:56
@rdmarsh2
Copy link
Contributor Author

rdmarsh2 commented Jun 23, 2022

@geoffw0 there's some new false positives in the CWE-611 tests, and I'm not sure whether they're an issue with this PR or just exposing a known issue with the CWE-611 query. Could you take a look at those?

@rdmarsh2 rdmarsh2 marked this pull request as ready for review June 23, 2022 18:01
@rdmarsh2 rdmarsh2 requested review from a team as code owners June 23, 2022 18:01
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the changes in CPP security query test results only.

@MathiasVP
Copy link
Contributor

MathiasVP commented Jun 30, 2022

The performance on vim/vim looks like something we should investigate before we merge this.

@jketema jketema self-assigned this Jul 12, 2022
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two naive questions to start, just to further my understanding.

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next batch of comments. I'm slowly working my way through this. Feel free to go ahead and fixing things even though I'm not yet done (whatever your preferred way of working is).

@@ -98,6 +98,9 @@ thisArgumentIsNonPointer
| pmcallexpr.cpp:8:2:8:15 | Call: call to expression | Call instruction 'Call: call to expression' has a `this` argument operand that is not an address, in function '$@'. | array_delete.cpp:5:6:5:6 | void f() | void f() |
| pointer_to_member.cpp:23:5:23:54 | Call: call to expression | Call instruction 'Call: call to expression' has a `this` argument operand that is not an address, in function '$@'. | pointer_to_member.cpp:14:5:14:9 | int usePM(int PM::*) | int usePM(int PM::*) |
| pointer_to_member.cpp:24:5:24:49 | Call: call to expression | Call instruction 'Call: call to expression' has a `this` argument operand that is not an address, in function '$@'. | pointer_to_member.cpp:14:5:14:9 | int usePM(int PM::*) | int usePM(int PM::*) |
nonUniqueIRVariable
| misc.c:178:22:178:40 | VariableAddress: __PRETTY_FUNCTION__ | Variable address instruction 'VariableAddress: __PRETTY_FUNCTION__' has no associated variable, in function '$@'. | misc.c:177:6:177:14 | void magicvars() | void magicvars() |
| misc.c:179:27:179:34 | VariableAddress: __func__ | Variable address instruction 'VariableAddress: __func__' has no associated variable, in function '$@'. | misc.c:177:6:177:14 | void magicvars() | void magicvars() |
Copy link
Contributor

@jketema jketema Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables are quite "special". Is this something we may need to file an issue for and look at later, or is this as expected?

Copy link
Contributor Author

@rdmarsh2 rdmarsh2 Aug 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably is something we need to look at eventually. The consistency check is new, and I think these don't have to do with the rest of the PR.

Copy link
Contributor

@jketema jketema Aug 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this doesn't have anything to do with this PR.

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final batch of comments. I assumed that changes to the CWE test outputs are ok. Let me know if you want me to have a close look at those.

@jketema
Copy link
Contributor

jketema commented Jul 14, 2022

Looking at the DCA experiment: did you look whether the alert changes made sense?

@jketema
Copy link
Contributor

jketema commented Aug 1, 2022

@rdmarsh2 I think there's still the following to resolve:

@rdmarsh2 rdmarsh2 force-pushed the rdmarsh2/dataflow-global-vars branch from a8b441d to 6dbaae6 Compare August 1, 2022 18:57
@github-actions github-actions bot removed the C# label Aug 1, 2022
@jketema
Copy link
Contributor

jketema commented Aug 1, 2022

There's also still: #8596 (comment) Is this something that needs to be looked at on the extractor side?

@rdmarsh2
Copy link
Contributor Author

rdmarsh2 commented Aug 1, 2022

The DCA alert changes made sense, yes. Responded to that last comment - I'm not sure if they need work on the extractor side or if they need special handling in IR generation.

@jketema
Copy link
Contributor

jketema commented Aug 1, 2022

LGTM. I've kicked off a final DCA experiment, as we changed quite a few things related to (global) variables on the extractor side since this PR was opened. I'll approve once the experiment is done (unless we find some serious problem that is new compared to the previous DCA experiment, which probably isn't very likely).

@jketema
Copy link
Contributor

jketema commented Aug 2, 2022

New DCA results are in:

  • There's a significant increase in the number of alerts (compared to the earlier DCA experiment), which makes sense given the extractor related issues we fixed and as we added/enabled systemd and Samate. The alerts look ok to me, but a second pair of eyes would be welcome.
  • The second vim run OOM'ed. This might either be related to the OOM problems we were having earlier (although those should be fixed), due to a bad join order, or simply because we now need a more powerful runner. Have the join orders been looked at?

@MathiasVP
Copy link
Contributor

MathiasVP commented Aug 2, 2022

There's a significant increase in the number of alerts (compared to the earlier DCA experiment), which makes sense given the extractor related issues we fixed and as we added/enabled systemd and Samate. The alerts look ok to me, but a second pair of eyes would be welcome.

So far:

  • All the new results on cpp/command-line-injection are new TPs involving global variables 🎉
  • There are a lot of new cpp/path-injection on Samate. I've picked an arbitrary set of results, and they're all new TPs involving global variables. So I'm gonna guess that these are all changes that we're happy with :shipit:
  • The two new results for cpp/path-injection on git/git are TPs involving writes to a global variable global_argv.
  • The 6 new results for cpp/path-injection on kamailio are also TPs involving global variables.

If anyone wants to look at the remaining results feel free to do so. I'm pretty confident in the new results.


Edit: According to people smarter than me the following might not actually be an issue.

The OOM on vim/vim looks real: here's the relevant log output from the ExecTainted query:

[2022-08-02 11:31:29] Evaluated non-recursive predicate boundedFastTC(DataFlowImpl::pathSucc#9021cc4c#ff,DataFlowImpl::Configuration::hasFlowPath#dispred#f0820431#fff#higher_order_body)@c2650acp in 2219ms (size: 1126884102485).
[2022-08-02 11:31:30] Evaluated non-recursive predicate DataFlowImpl::Configuration::hasFlowPath#dispred#f0820431#fff@88c1067n in 1777ms (size: 181).
Evaluated relational algebra for predicate DataFlowImpl::Configuration::hasFlowPath#dispred#f0820431#fff@88c1067n with tuple counts:
        11   ~0%    {2} r1 = JOIN DataFlowImpl::PathNode::getConfiguration#dispred#f0820431#ff WITH DataFlowImpl::PathNode::isSource#dispred#f0820431#f ON FIRST 1 OUTPUT Lhs.0, Lhs.1
        11   ~0%    {3} r2 = JOIN r1 WITH DataFlowImpl::PathNodeImpl::getNodeEx#dispred#f0820431#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
                
        11   ~0%    {3} r3 = JOIN r2 WITH num#DataFlowImpl::TNodeNormal#9021cc4c#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.1, Lhs.2
                
        11   ~0%    {2} r4 = JOIN r2 WITH num#DataFlowImpl::TNodeNormal#9021cc4c#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.2
  15068152   ~0%    {3} r5 = JOIN r4 WITH boundedFastTC(DataFlowImpl::pathSucc#9021cc4c#ff,DataFlowImpl::Configuration::hasFlowPath#dispred#f0820431#fff#higher_order_body) ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1
                
  15068163   ~0%    {3} r6 = r3 UNION r5
        181   ~2%    {4} r7 = JOIN r6 WITH DataFlowImpl::PathNodeSink#class#9021cc4c#ffff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.0
        181   ~0%    {3} r8 = JOIN r7 WITH num#DataFlowImpl::TNodeNormal#9021cc4c#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3
                    return r8

Yes, that's a relation of size 1.126.884.102.485 😱

@rdmarsh2
Copy link
Contributor Author

rdmarsh2 commented Aug 2, 2022

Looks like the OOM is on ExecTainted.ql - all other queries finish before the error. I'll take a deeper look at it.

@jketema
Copy link
Contributor

jketema commented Aug 2, 2022

Looks like the OOM is on ExecTainted.ql - all other queries finish before the error. I'll take a deeper look at it.

You might want to have a brief sync with Mathias about it. If you can't track it down I would suggest repeating the vim part of the DCA experiment with a lower --ram option than the 14979 currently used. I can explain via our internal channels.

@jketema
Copy link
Contributor

jketema commented Aug 4, 2022

I if see correctly, the re-run of DCA on vim was without the --ram option, because my directions were slightly wrong. Sorry for this! Nevertheless, the re-run was successful, and the slowdown doesn't seem to have changed compared to the previous successful run. I propose we go ahead and merge this, and that if we see DCA OOMs again on vim, we move vim to a slightly bigger runner.

@jketema jketema merged commit ba2cee0 into github:main Aug 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants