C++: split `cpp/overrunning-write` into two #7386

redsun82 · 2021-12-14T10:05:43Z

This splits the cpp/overruning-write into two separate queries based
off on the reason for the estimation. If the overrun is detected based
on non-trivial range analysis, the results are now marked by the new
cpp/very-likely-overruning-write high precision query. If it is based
on less precise, usually type based bounds, then it will still be marked
by cpp/overruning-write which remains at medium precision.

Closes: github/codeql-c-team#751

The text was updated successfully, but these errors were encountered:

redsun82 · 2021-12-14T10:06:51Z

I'll now create the internal request and start a DCA experiment on this, and then maybe play around with some LGTM databases.

MathiasVP

The code LGTM, I think (see my one comment). A few things need to happen before we can merge this:

We need to run a DCA on the code-scanning suite to make sure the performance on that suite is good with this new query.
We need to add a .qhelp file, which is probably just a copy/paste of the one we have for cpp/overrunning-write.
We need to run the new query on the ~140 projects we usually test on (or even better: on all of LGTM) and make sure that the new query doesn't have too many FPs.
Once that's done, we need to add the ready-for-doc-review label on this PR so that the docs team can review the .qhelp file, the metadata, and stuff like that.

cpp/ql/src/Security/CWE/CWE-120/OverrunWrite.ql

MathiasVP · 2021-12-14T10:15:24Z

I'll now create the internal request and start a DCA experiment on this, and then maybe play around with some LGTM databases.

Great!

cpp/ql/src/Security/CWE/CWE-120/OverrunWrite.qhelp

MathiasVP

Oops. Forgot to add the rest of the comments. Sorry!

cpp/ql/src/Security/CWE/CWE-120/VeryLikelyOverrunWrite.ql

cpp/change-notes/2021-12-14-overruning-write-split.md

redsun82 · 2021-12-14T15:32:57Z

On vim/vim, I'm getting a couple of false positives on the new query that boils down to to the trailing array idiom:

v = &fc->fixvar[fixvar_idx++].var;
name = v->di_key;
STRCPY(name, "self");

where v is of type

struct dictitem_S
{
    typval_T	di_tv;		// type and value of the variable
    char_u	di_flags;	// DI_FLAGS_ flags (only used for variable)
    char_u	di_key[1];	// key (actually longer!)
};
typedef struct dictitem_S dictitem_T;

and the items are malloced with a bigger size than the size of the item.

Any thoughts? Is this ok, or should we try to detect such a case somehow?

geoffw0

A few comments, otherwise LGTM.

cpp/ql/src/Security/CWE/CWE-120/OverrunWrite.c

cpp/ql/src/Security/CWE/CWE-120/VeryLikelyOverrunWrite.qhelp

cpp/ql/src/Security/CWE/CWE-120/VeryLikelyOverrunWrite.ql

cpp/config/suites/security/cwe-120

MathiasVP · 2021-12-14T15:48:42Z

Any thoughts? Is this ok, or should we try to detect such a case somehow?

It would be great if we can filter those out. We should have code for this already. Take a look at https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/commons/Buffer.qll#L15. That predicate is actually already used to filter out some results in getBufferSize (which this new query is actually using). So I'm surprised we're seeing these FPs.

redsun82 · 2021-12-14T16:12:25Z

on kraflab/dsda-doom I found a bunch of quite interesting FPs. Here's an example: j is bound between characters '0' and '9', so that j - 48 is actually bounded between 0 and 9, so that the formatting "STTNUM%.1d",j-48 actually fits in 8 bytes, but the query estimates the maximum length to 10. There are two other similar FP around that, but interestingly a bunch of other sprintf's around there on the same buffer are correctly not flagged.

redsun82 · 2021-12-14T16:14:39Z

Any thoughts? Is this ok, or should we try to detect such a case somehow?

It would be great if we can filter those out. We should have code for this already. Take a look at https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/commons/Buffer.qll#L15. That predicate is actually already used to filter out some results in getBufferSize (which this new query is actually using). So I'm surprised we're seeing these FPs.

Hmm, actually the full code snippet is

// Set l:self to "selfdict".  Use "name" to avoid a warning from
// some compiler that checks the destination size.
v = &fc->fixvar[fixvar_idx++].var;
name = v->di_key;
STRCPY(name, "self");

Might it be that the aliasing they use to avoid compiler warnings is actually stumping our own code detecting this case?

Or maybe it is because that field is not malloced in the same scope?

redsun82 · 2021-12-14T16:45:36Z

on kraflab/dsda-doom I found a bunch of quite interesting FPs. Here's an example: j is bound between characters '0' and '9', so that j - 48 is actually bounded between 0 and 9, so that the formatting "STTNUM%.1d",j-48 actually fits in 8 bytes, but the query estimates the maximum length to 10. There are two other similar FP around that, but interestingly a bunch of other sprintf's around there on the same buffer are correctly not flagged.

This indeed seems like a bug with range analysis, I've managed to reduce the false positive case to #7396 . Can you have a look @geoffw0 ?

redsun82 · 2021-12-16T10:44:50Z

Any thoughts? Is this ok, or should we try to detect such a case somehow?

It would be great if we can filter those out. We should have code for this already. Take a look at https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/commons/Buffer.qll#L15. That predicate is actually already used to filter out some results in getBufferSize (which this new query is actually using). So I'm surprised we're seeing these FPs.

Hmm, actually the full code snippet is
// Set l:self to "selfdict".  Use "name" to avoid a warning from
// some compiler that checks the destination size.
v = &fc->fixvar[fixvar_idx++].var;
name = v->di_key;
STRCPY(name, "self");
Might it be that the aliasing they use to avoid compiler warnings is actually stumping our own code detecting this case?

Or maybe it is because that field is not malloced in the same scope?

yes, I think that's it. As the copy is not done directly on v->di_key but on the alias name, then memberMayBeVarSize(_, name) does not hold.

Do you think we should enhance memberMayBeVarSize to follow aliases, or should we just accept this kind of FP?

MathiasVP · 2021-12-16T11:07:36Z

Do you think we should enhance memberMayBeVarSize to follow aliases, or should we just accept this kind of FP?

I think it's okay to accept this kind of FP for now (since it didn't stop increasing the precision of another query in #6760). But let's collect these FPs in an issue.

MathiasVP · 2021-12-17T08:02:42Z

cpp/ql/lib/semmle/code/cpp/commons/Printf.qll

@@ -359,6 +376,17 @@ private int lengthInBase10(float f) {
  result = f.log10().floor() + 1
 }

+private BufferWriteEstimationReason getEstimationReasonForIntegralExpression(Expr expr) {


I'm not sure expr is sufficiently bound in this predicate, but let's test this with DCA before we make any changes. I'd be much less nervous about this if we added a bindingset[expr] to it.

If this is a problem we might see it on performance? Should I add the tuple counting option?

Yeah, you can try adding tuple counting. Be sure to update DCA to get the latest updates to that feature.

cpp/ql/lib/semmle/code/cpp/commons/Printf.qll

intrigus-lgtm · 2021-12-20T00:18:40Z

I assume that it is not possible to do some sharing of the query help using query help inclusion?

This splits the `cpp/overruning-write` into two separate queries based off on the reason for the estimation. If the overrun is detected based on non-trivial range analysis, the results are now marked by the new `cpp/very-likely-overruning-write` high precision query. If it is based on less precise, usually type based bounds, then it will still be marked by `cpp/overruning-write` which remains at medium precision.

Rather than testing for `TypeBoundsAnalysis`, we test that the reason is not `ValueFlowAnalysis` (which is reported by the new `cpp/very-likely-overruning-write` query), so that if a client has overridden `BufferWrite::getMaxData` the `NoSpecifiedEstimateReason` is taken into account.

Also update the help of `cpp/overruning-write`, as the case shown there will actually not be flagged by that query any more.

Co-authored-by: Mathias Vorreiter Pedersen <mathiasvp@github.com>

This also restrict what we consider "non-trivial" range analysis, as we now require both ends to be non-trivially bounded for signed integers. This avoids false positives stemming from a non trivial upper bound but no meaningful lower bound, for example.

redsun82 · 2021-12-21T08:56:16Z

I assume that it is not possible to do some sharing of the query help using query help inclusion?

I did not know about this mechanism, I'll have a look, thanks!

There were some false positives where something like int x; // ... sprintf(buff, "%ld", (long)x); was considered as if the parameter had a non-trivial range analysis only because the range of `int` is smaller than the range for `long`, without any non-trivial range analysis actually done on `x`. These will now be reported by `OverrunWrite` instead.

Also added the relevant CERT C _and_ C++ standard references where they were missing, and did some minor stylistic tweaks to `OverrunWriteFloat.qhelp`.

MathiasVP · 2021-12-21T15:59:50Z

The DCA experiment failed due to a memory issue on the join-order badness analysis (that's my fault, sorry!)

It's still not fixed, but you can still generate the data locally by doing:

./dca reports --name redsun82/pr-7386-cc8a646__Differences__code-scanning --redo-report readme

when I do that, I get the following table (I've removed a few of the columns for simplicity):

| source                  | predicate                                                                           | b\_result\_size | b\_max\_tuple\_count | b\_metric
| nlohmann\_\_json        | Printf::FormatLiteral::getMaxConvertedLengthAfterLimited\#ffff \(recursive delta\)  | 158             | 225                  | 1.424
| microsoft\_\_calculator | Printf::FormatLiteral::getMaxConvertedLengthAfterLimited\#ffff \(recursive delta\)  | 4               | 6                    | 1.5  
| vim\_\_vim              | m\#Declaration::Declaration::hasGlobalOrStdName\_dispred\#bf                        | 8901            | 14635                | 1.644
| vim\_\_vim              | Printf::FormatLiteral::getMaxConvertedLengthAfterLimited\#ffff \(recursive delta\)  | 1978            | 3711                 | 1.876
| microsoft\_\_ChakraCore | Printf::FormatLiteral::getMaxConvertedLengthAfterLimited\#ffff \(recursive delta\)  | 3323            | 6561                 | 1.974
| dsp-testing\_\_putty    | Printf::FormatLiteral::getMaxConvertedLengthAfterLimited\#ffff \(recursive delta\)  | 1958            | 4341                 | 2.217
| git\_\_git              | Printf::FormatLiteral::getMaxConvertedLengthAfterLimited\#ffff \(recursive delta\)  | 5984            | 14886                | 2.488
| microsoft\_\_calculator | Printf::FormatLiteral::getIntegralDisplayType\_dispred\#bff                         | 6               | 54                   | 3    
| git\_\_git              | Scanf::ScanfFormatLiteral::parseConvSpec\_dispred\#ffffff                           | 2               | 8                    | 4    
| vim\_\_vim              | Scanf::ScanfFormatLiteral::parseConvSpec\_dispred\#ffffff                           | 16              | 64                   | 4    
| dsp-testing\_\_putty    | Scanf::ScanfFormatLiteral::parseConvSpec\_dispred\#ffffff                           | 22              | 88                   | 4    
| vim\_\_vim              | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 16058           | 74460                | 4.637
| git\_\_git              | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 27983           | 134538               | 4.808
| facebook\_\_yoga        | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 241             | 1302                 | 5.402
| abseil\_\_abseil-cpp    | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 1855            | 10062                | 5.424
| dsp-testing\_\_putty    | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 9202            | 50688                | 5.508
| microsoft\_\_ChakraCore | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 23079           | 134778               | 5.84 
| microsoft\_\_calculator | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 1849            | 10944                | 5.919
| nlohmann\_\_json        | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 31419           | 186444               | 5.934
| an-tao\_\_drogon        | StringAnalysis::AnalysedString::getMaxLength\#ff \(recursive delta\)                | 8410            | 50064                | 5.953
| nlohmann\_\_json        | Printf::FormatLiteral::getConvSpec\_dispred\#fff\#shared\#1                         | 83              | 581                  | 7    
| dsp-testing\_\_putty    | Printf::FormatLiteral::getConvSpec\_dispred\#fff\#shared\#1                         | 1455            | 10185                | 7    
| microsoft\_\_ChakraCore | Printf::FormatLiteral::getConvSpec\_dispred\#fff\#shared\#1                         | 2335            | 16345                | 7    
| git\_\_git              | Printf::FormatLiteral::getConvSpec\_dispred\#fff\#shared\#1                         | 5035            | 35245                | 7

I think this shows that there might be a potential performance problem that we need to look into before we merge this.

skedwards88 · 2021-12-27T17:49:23Z

Docs first responder here 👋 I've added this to our review board for a writer to review.

skedwards88

Looks good from a writer perspective!

skedwards88 · 2021-12-28T18:56:48Z

cpp/ql/src/Security/CWE/CWE-120/OverrunWrite.qhelp

-  <li>Control the size of the buffer using a preprocessor define.</li>
-  <li>Replace the call to <code>strcpy</code> with <code>strncpy</code>, specifying the define as the maximum length to copy.  This will prevent the buffer overflow.</li>
-  <li>Consider increasing the buffer size, say to 20 characters, so that the message is displayed correctly.</li>
+  <li>Control the size of the buffer by declaring it with a compile time constant</li>


nit to be consistent with the other list items

Suggested change

<li>Control the size of the buffer by declaring it with a compile time constant</li>

<li>Control the size of the buffer by declaring it with a compile time constant.</li>

skedwards88 · 2021-12-28T18:56:48Z

cpp/ql/src/Security/CWE/CWE-120/VeryLikelyOverrunWrite.qhelp

+  "qhelp.dtd">
+<qhelp>
+<overview>
+<p>The program performs a buffer copy or write operation with no upper limit on the size of the copy, and by analysing the bounds of the expressions involved it appears that certain inputs will cause a buffer overflow to occur in this case.  In addition to causing program instability, techniques exist which may allow an attacker to use this vulnerability to execute arbitrary code.</p>


nit

Suggested change

<p>The program performs a buffer copy or write operation with no upper limit on the size of the copy, and by analysing the bounds of the expressions involved it appears that certain inputs will cause a buffer overflow to occur in this case. In addition to causing program instability, techniques exist which may allow an attacker to use this vulnerability to execute arbitrary code.</p>

<p>The program performs a buffer copy or write operation with no upper limit on the size of the copy. By analyzing the bounds of the expressions involved, it appears that certain inputs will cause a buffer overflow to occur in this case. In addition to causing program instability, techniques exist which may allow an attacker to use this vulnerability to execute arbitrary code.</p>

skedwards88 · 2021-12-28T18:56:48Z

cpp/ql/src/Security/CWE/CWE-120/VeryLikelyOverrunWrite.qhelp

+
+<p>To fix this issue these changes should be made:</p>
+<ul>
+  <li>Control the size of the buffer by declaring it with a compile time constant</li>


Suggested change

<li>Control the size of the buffer by declaring it with a compile time constant</li>

<li>Control the size of the buffer by declaring it with a compile time constant.</li>

redsun82 requested review from MathiasVP and geoffw0 Dec 14, 2021

redsun82 requested a review from as a code owner Dec 14, 2021

github-actions bot added the C++ label Dec 14, 2021

github-actions bot added the documentation label Dec 14, 2021

redsun82 force-pushed the redsun82/cpp-overrunning-write-precision-split branch from 3e8ac4d to 3c01605 Dec 14, 2021

MathiasVP reviewed Dec 14, 2021

View changes

cpp/ql/src/Security/CWE/CWE-120/OverrunWrite.ql Outdated Show resolved Hide resolved

redsun82 added the depends on internal PR label Dec 14, 2021

redsun82 self-assigned this Dec 14, 2021

redsun82 force-pushed the redsun82/cpp-overrunning-write-precision-split branch from 125385b to a7ab402 Dec 14, 2021

MathiasVP reviewed Dec 14, 2021

View changes

cpp/ql/src/Security/CWE/CWE-120/OverrunWrite.qhelp Outdated Show resolved Hide resolved

MathiasVP reviewed Dec 14, 2021

View changes

cpp/ql/src/Security/CWE/CWE-120/VeryLikelyOverrunWrite.ql Outdated Show resolved Hide resolved

cpp/change-notes/2021-12-14-overruning-write-split.md Outdated Show resolved Hide resolved

redsun82 force-pushed the redsun82/cpp-overrunning-write-precision-split branch from 7bc26a6 to fa63b76 Dec 14, 2021

geoffw0 reviewed Dec 14, 2021

View changes

redsun82 closed this Dec 16, 2021

redsun82 reopened this Dec 16, 2021

MathiasVP reviewed Dec 17, 2021

View changes

redsun82 force-pushed the redsun82/cpp-overrunning-write-precision-split branch from 9ac2177 to cc8a646 Dec 17, 2021

redsun82 added 2 commits Dec 21, 2021

C++: add change note for new overrun write query

37933c8

redsun82 and others added 12 commits Dec 21, 2021

C++: add cpp/very-likely-overruning-write help

6dc0951

Also update the help of `cpp/overruning-write`, as the case shown there will actually not be flagged by that query any more.

C++: add VeryLikelyOverrunWrite.qhelp

f99f4c7

C++: add VeryLikelyOverrunWrite.ql to cwe-120

adc790c

C++: NoSpecifiedEstimateReason→Unspecified...

bfedff8

C++: tweak overrunning write qhelp files

e70cec3

C++: shorten VeryLikelyOverrunWrite @name

f6d695b

C++: update 2021-12-14-overruning-write-split.md

f268eaa

Co-authored-by: Mathias Vorreiter Pedersen <mathiasvp@github.com>

C++: doc fixes to VeryLikelyOverrunWrite

42781ee

C++: remove unused variables and fix tests

5584227

C++: auto format Printf.qll

4beaf2e

redsun82 force-pushed the redsun82/cpp-overrunning-write-precision-split branch from 1535efc to 4beaf2e Dec 21, 2021

redsun82 added 2 commits Dec 21, 2021

C++: use includes in OverrunWrite qhelp files

8662f31

Also added the relevant CERT C _and_ C++ standard references where they were missing, and did some minor stylistic tweaks to `OverrunWriteFloat.qhelp`.

redsun82 requested a review from MathiasVP Dec 21, 2021

redsun82 added the ready-for-doc-review label Dec 21, 2021

skedwards88 approved these changes Dec 28, 2021

View changes

Nov	DEC	Jan
	29
2020	2021	2022

github / codeql Public

C++: split `cpp/overrunning-write` into two #7386

C++: split `cpp/overrunning-write` into two #7386

redsun82 commented Dec 14, 2021 •

edited

redsun82 commented Dec 14, 2021

MathiasVP left a comment •

edited

MathiasVP commented Dec 14, 2021

MathiasVP left a comment

redsun82 commented Dec 14, 2021

geoffw0 left a comment

MathiasVP commented Dec 14, 2021

redsun82 commented Dec 14, 2021

redsun82 commented Dec 14, 2021 •

edited

redsun82 commented Dec 14, 2021

redsun82 commented Dec 16, 2021

MathiasVP commented Dec 16, 2021 •

edited

MathiasVP Dec 17, 2021

redsun82 Dec 17, 2021

MathiasVP Dec 17, 2021

intrigus-lgtm commented Dec 20, 2021

redsun82 commented Dec 21, 2021

MathiasVP commented Dec 21, 2021 •

edited

skedwards88 commented Dec 27, 2021

skedwards88 left a comment

skedwards88 Dec 28, 2021

skedwards88 Dec 28, 2021

skedwards88 Dec 28, 2021

	<li>Control the size of the buffer by declaring it with a compile time constant</li>
	<li>Control the size of the buffer by declaring it with a compile time constant.</li>

	<p>The program performs a buffer copy or write operation with no upper limit on the size of the copy, and by analysing the bounds of the expressions involved it appears that certain inputs will cause a buffer overflow to occur in this case. In addition to causing program instability, techniques exist which may allow an attacker to use this vulnerability to execute arbitrary code.</p>
	<p>The program performs a buffer copy or write operation with no upper limit on the size of the copy. By analyzing the bounds of the expressions involved, it appears that certain inputs will cause a buffer overflow to occur in this case. In addition to causing program instability, techniques exist which may allow an attacker to use this vulnerability to execute arbitrary code.</p>

github / codeql Public

C++: split cpp/overrunning-write into two #7386

Are you sure you want to change the base?

C++: split cpp/overrunning-write into two #7386

Conversation

redsun82 commented Dec 14, 2021 • edited

redsun82 commented Dec 14, 2021

MathiasVP left a comment • edited

MathiasVP commented Dec 14, 2021

MathiasVP left a comment

redsun82 commented Dec 14, 2021

geoffw0 left a comment

MathiasVP commented Dec 14, 2021

redsun82 commented Dec 14, 2021

redsun82 commented Dec 14, 2021 • edited

redsun82 commented Dec 14, 2021

redsun82 commented Dec 16, 2021

MathiasVP commented Dec 16, 2021 • edited

MathiasVP Dec 17, 2021

Choose a reason for hiding this comment

redsun82 Dec 17, 2021

Choose a reason for hiding this comment

MathiasVP Dec 17, 2021

Choose a reason for hiding this comment

intrigus-lgtm commented Dec 20, 2021

redsun82 commented Dec 21, 2021

MathiasVP commented Dec 21, 2021 • edited

skedwards88 commented Dec 27, 2021

skedwards88 left a comment

skedwards88 Dec 28, 2021

Choose a reason for hiding this comment

skedwards88 Dec 28, 2021

Choose a reason for hiding this comment

skedwards88 Dec 28, 2021

Choose a reason for hiding this comment

C++: split `cpp/overrunning-write` into two #7386

C++: split `cpp/overrunning-write` into two #7386

redsun82 commented Dec 14, 2021 •

edited

MathiasVP left a comment •

edited

redsun82 commented Dec 14, 2021 •

edited

MathiasVP commented Dec 16, 2021 •

edited

MathiasVP commented Dec 21, 2021 •

edited