-
Notifications
You must be signed in to change notification settings - Fork 1.9k
JS/Python/Ruby: Document how API graphs should be interpreted #8606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
yoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent description!
yoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
calumgrant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to get more clarity on this! I'll let an engineer give a final approval.
| * | ||
| * Because the implementation of the external library is not visible, it is not known exactly what operations | ||
| * it will perform on values that flow there. Instead, the edges starting from a def-node are operations that would | ||
| * lead to an observable effect within the current codebase; without knowing for certain if the library will actually perform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand "observable effect within the current codebase". Could you mean "represent a potential data flow"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I don't understand what "represent a potential data flow" is supposed to mean, and I can't find a different way to express it.
Was the intuition not clear from the example below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, in terms of dataflow, observable effects are just values flowing into the codebase from the library. However, from a security point of view, effects could be other things: They called my function, so I know we just sent an email.
| * A callback is passed to the external function `foo`. We can't know if `foo` will actually invoke this callback. | ||
| * But _if_ the library should decide to invoke the callback, then a value will flow into the current codebase via the `x` parameter. | ||
| * For that reason, an edge is generated representing the argument-passing operation that might be performed by `foo`. | ||
| * This edge is going from the def-node associated with the callback to the use-node associated with the parameter `x`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe clarify with "of the lambda" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally don't use the word "lambda" when talking about functions in the JS documentation. But there's only one parameter in the example, so I'm not sure why it needs clarification.
| * on the client side is actually the return-value of the getter. | ||
| * | ||
| * Although one may think of API graphs as a tool to find certain program elements in the codebase, | ||
| * it can lead to some situations where intuition does not match what works best in practice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This last sentence might be worth clarifying. Is there a specific gotcha you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * it can lead to some situations where intuition does not match what works best in practice. | |
| * it can lead to situations, as in the above example, where intuition does not match what works best in practice. |
☝️ Does that work better?
|
I've followed up with renaming of member predicates as discussed internally:
One desirable aspect of this is that it's always clear which is used to define a source or a sink, override predicate isSink(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getASink()
}AlternativesSome alternatives that have been brought up, so I'll just try and go over what they look like
Perhaps this is just me, but I find it ambiguous which of these represent an input to the current codebase, or an input to the function/parameter being modelled. The same is sort of true for isSinkLooking at what an override predicate isSink(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getARhs() // old naming
}
override predicate isSink(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getASink()
}
override predicate isSink(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getAnInput()
}
override predicate isSink(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getADestination()
}isSourceLooking at some examples of override predicate isSource(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getParameter(0).getAnImmediateUse() // old naming
or
node = API::moduleImport("foo").getReturn().getAnImmediateUse()
}
override predicate isSource(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getParameter(0).getASource()
or
node = API::moduleImport("foo").getReturn().getASource()
}
override predicate isSource(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getParameter(0).getAnOutput()
or
node = API::moduleImport("foo").getReturn().getAnOutput()
}
override predicate isSource(DataFlow::Node node) {
node = API::moduleImport("foo").getParameter(0).getParameter(0).getAnOrigin()
or
node = API::moduleImport("foo").getReturn().getAnOrigin()
}Matched codeLastly, looking at some code snippets matched by this: const foo = require('foo')
foo(); // API::moduleImport("foo").getReturn().getAnImmediateUse()
foo(); // API::moduleImport("foo").getReturn().getASource()
foo(); // API::moduleImport("foo").getReturn().getAnOutput()
foo(); // API::moduleImport("foo").getReturn().getAnOrigin()
foo(x); // x = API::moduleImport("foo").getParameter(0).getARhs()
foo(x); // x = API::moduleImport("foo").getParameter(0).getASink()
foo(x); // x = API::moduleImport("foo").getParameter(0).getAnInput()
foo(x); // x = API::moduleImport("foo").getParameter(0).getADestination()Conclusion (or lack thereof)Looking at the above, I find it hard to point out a clear winner here. I think source/sink has a lower chance of getting mixed up, but feel free to discuss. Looking at the commit history may also give an indication of whether the source/sink naming "feels right" at the use site. |
...rimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/EndpointFeatures.qll
Outdated
Show resolved
Hide resolved
erik-krogh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
A very optional rewriting of getAValueReachableFromSource.
|
Rebased to resolve conflicts, and fixed a comment. |
| * 3. Map the resulting API graph nodes to data-flow nodes, using `getASource` or `getASink`. | ||
| * | ||
| * For example, a simplified way to get arguments to `underscore.extend` would be | ||
| * ```codeql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```ql
asgerf
mentioned this pull request
...rimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/EndpointFeatures.qll
Fixed
Show fixed
Hide fixed
...rimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/EndpointFeatures.qll
Fixed
Show fixed
Hide fixed
...rimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/EndpointFeatures.qll
Fixed
Show fixed
Hide fixed
...rimental/adaptivethreatmodeling/lib/experimental/adaptivethreatmodeling/EndpointFeatures.qll
Fixed
Show fixed
Hide fixed
|
You're still missing some renamings in EndpointFeatures.qll. |
Co-authored-by: yoff <lerchedahl@gmail.com>
Co-authored-by: Calum Grant <42069085+calumgrant@users.noreply.github.com>
Co-authored-by: Nick Rolfe <nickrolfe@github.com>
Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>
This was referenced


I've written what I find to be a useful interpretation of API graphs.
I'm hoping we can use this PR to discuss and align on what is the "correct" way to interpret API graphs. Much of the text is language-agnostic so I'm hoping to port it to Python and Ruby as well, after we've iterated on it a bit.
I should mentioned that the example with the getter at the end doesn't work in the current JS implementation. Support for getters is in flight here -- should probably be merged first, so at not to make incorrect clams in the documentation.
I haven't renamed anything in this PR. Let's settle on the text before going through with renaming.