Add CVE data from AOSP #206
Comments
We would love to have benchmark entries for your CVEs (this is highly related to #67 and #68, by the way), and your example looks reasonably close to being useful (see comments below). Out of interest, I suppose you have some raw data that you intend to generate the benchmark entries from. Is that data publicly accessible somewhere? (Secondarily: is the information restricted by a usage license?) I'll try to answer your summarized questions.
Yes, your submission would be the first, though, and there are currently no tool drivers that support those languages (which may be seen as a bootstrapping problem; does your raw data perhaps have information about analysis tools that flagged, or should have flagged, the weaknesses?). (I am personally looking into adding proper support for Java and Python CVEs (#125, #126) through some additional optional data in the benchmark entries, and plan to open a PR when I am satisfied with the design. Realistically, I will have it ready in the early fall.)
Correction: the explanation field is in fact required by the schema. The intention is that the benchmark entries should be self-contained: it should ideally be possible to manually confirm that an analysis tool flags a weakness for the right reason, without having to dig through various CVE discussions and CWE descriptions. And secondarily, it helps with confirming that the proposed weakness locations are correct. We could consider making the field optional. I am now wondering if a dedicated UI could help produce useful explanations (I'll type up an issue shortly): present the user with as much information as possible (weakness source code, patch, CVE description, CWE titles, commit messages, ...), and then have them type or copy/paste in an explanation. I suppose the raw data that you are exporting from contains the CVE descriptions, which you then correlate with the bulletins?
Two thoughts: git host and correctness. I note that the source code for the example CVE is hosted in git on android.googlesource.com. As per #2, some of the tooling still only supports the github.com host (I think download rate limits would be the primary feature we would be missing). Perhaps the easiest way to add android.googlesource.com support is to teach the implementation that github.com/aosp-mirror could be used as a mirror for that host (see the sketch at the end of this comment). Is the example line information correct? I am interpreting your example weaknesses to point to the following lines:
I do not think any of those lines match the CWE-189 title "Numeric Errors", although I see some multiplications close to those lines.
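Back on the host question: to make the mirror idea concrete, here is a minimal sketch of the mapping I have in mind. The function is hypothetical (nothing like it exists in the tooling yet); the path flattening follows the naming convention visible on github.com/aosp-mirror:

// Hypothetical: rewrite an android.googlesource.com repository URL to its
// github.com/aosp-mirror equivalent, so the existing github.com download
// path (with its rate-limit handling) can be reused.
function toGithubMirror(repoUrl: string): string | undefined {
  const prefix = 'https://android.googlesource.com/';
  if (!repoUrl.startsWith(prefix)) {
    return undefined; // not an AOSP repository; no known mirror
  }
  // aosp-mirror flattens the project path by replacing '/' with '_', e.g.
  // platform/external/wpa_supplicant_8 -> platform_external_wpa_supplicant_8
  const project = repoUrl.slice(prefix.length).replace(/\//g, '_');
  return `https://github.com/aosp-mirror/${project}`;
}

So toGithubMirror('https://android.googlesource.com/platform/external/wpa_supplicant_8') would yield 'https://github.com/aosp-mirror/platform_external_wpa_supplicant_8', and the downloader could fall back to the original host when the mirror is missing.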
The data is indeed coming from the Android Security Bulletins. The website is under a Creative Commons Attribution 4.0 license (here) and the code under an Apache 2 license (to be taken with a grain of salt as I'm definitely not a lawyer).
My bad, I misread the definitions... My contribution would require a modification to the schema itself. Even if the field is nice to have, it does not cope well with automated tooling, as it requires manual input.
I'm actually doing exactly the opposite. I'm primarily using the bulletins information, and then correlating with the CVE Circl API to retrieve missing fields (like CWE).
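For reference, the correlation step is essentially a lookup like the following (a minimal sketch; the cwe field name is how I read the cve.circl.lu JSON responses, so verify it against a live reply):

// Look up a CVE on the CIRCL CVE Search API and extract the CWE id that is
// missing from the bulletin data. Requires a fetch-capable runtime
// (e.g. Node 18+).
async function fetchCwe(cveId: string): Promise<string | undefined> {
  const res = await fetch(`https://cve.circl.lu/api/cve/${cveId}`);
  if (!res.ok) {
    throw new Error(`CIRCL lookup failed for ${cveId}: ${res.status}`);
  }
  const data = await res.json();
  return data?.cwe; // e.g. "CWE-189"
}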
It is even easier on my side to replace the original link from android.googlesource.com with the mirror hosted on GitHub. As we already mentioned modifying the schema, could it be possible to store additional URLs for the repository (e.g. both the official source and a mirror)?
In short, no... I had a slight offset in my data that I did not check before posting. The referenced line is the beginning of the patch hunk (the context around the patch), as seen here. The actual modified line is at +3 for these snippets:
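To make the offset concrete, here is a minimal sketch of the correction (the hunk header values are illustrative, not copied from the actual patch; the +3 matches git's default of three leading context lines):

// A unified-diff hunk header such as "@@ -185,6 +185,11 @@" points at the
// first *context* line, not at the change itself. With git's default of
// three leading context lines, the first modified line is start + 3.
function actualModifiedLine(hunkHeader: string, contextLines = 3): number {
  const match = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(hunkHeader);
  if (!match) {
    throw new Error(`not a unified-diff hunk header: ${hunkHeader}`);
  }
  return Number(match[1]) + contextLines;
}

// actualModifiedLine('@@ -185,6 +185,11 @@') === 188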
Oh wow, and you have the commit information for most/all of them? Do you have a ballpark estimate of how many benchmark entries you could end up adding?
I think that is an excellent way to view the problem: there's a high likelihood that all of the other data exists in structured form already, so it is a shame if that one subjective explanation field is what blocks the import. I think I am close to convincing myself that it would be fine to have the data without an explanation.
I think that would make sense; the type could be generalized to something like a list of repository URLs.
A bit more than 1800 as of today, but we keep increasing it with new bulletins.
I see your point here. Another option would be to have a small but complete set of CVEs for quick experiments, where all information is present, while a bigger dataset with less accurate data could be used for further analyses.
Looks good to me ;)
I think we are talking about the same thing here. There's already an expressive grep-like selection mechanism for picking the CVEs of interest. Concretely, I would prefer if the following two commands behaved identically:

# selects all CVEs from 2020, except for the ones that are incomplete (e.g. the CVE is disputed or the patch is insufficient)
./bin/cli list year:2020

# selects all CVEs from 2020 that have an "explanation" (NB: the has-explanation selector does not exist yet; sketched at the end of this comment)
./bin/cli list year:2020 | ./bin/cli filter has-explanation

I think we are left with the following action items before we can consume your CVEs:

Next:

Later:

Does that sound reasonable?
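To make the has-explanation idea a bit more concrete, a rough sketch of what that selector could do. The benchmark/<id>/<id>.json layout and the top-level explanation field are my shorthand here, not a description of the real implementation:

import * as fs from 'node:fs';
import * as readline from 'node:readline';

// Hypothetical has-explanation filter: read CVE ids from stdin and keep
// only those whose benchmark entry carries a non-empty explanation.
async function main(): Promise<void> {
  const rl = readline.createInterface({ input: process.stdin });
  for await (const line of rl) {
    const id = line.trim();
    if (!id) continue;
    const entry = JSON.parse(fs.readFileSync(`benchmark/${id}/${id}.json`, 'utf8'));
    if (typeof entry.explanation === 'string' && entry.explanation.length > 0) {
      console.log(id);
    }
  }
}

main();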
No. I found https://aosp-mirror.github.io/, which hints that it may be deliberate:
I would prefer if the benchmark entries only contained the official sources. All in all, it is probably best to keep it simple and just submit a single non-mirror repository per benchmark entry (https://android.googlesource.com).
I think I would prefer 1 PR with multiple CVEs; let's see how that works out.
Yes.
I've started the PR #212. However, there are some issues that will prevent an "automated" import. The benchmark's model assigns vulnerabilities to lines of code, but in my case I must extrapolate from the fix commit. This is not always easy, as it may be hard to identify the precise defect from a patch, and especially which lines should be flagged. For example, the patch for CVE-2015-5310 adds a check to verify whether Protected Management Frames are enabled (I'm not familiar with this vuln, I just chose it as an example). How do we pinpoint a specific line of code that should be flagged by a static analysis tool for this CVE?
Oh, this is a problem. I thought the AOSP dataset had that precise information already. But I guess that the listed hunks are just small patch diffs. The location of a weakness is the hardest information to obtain in this project, and it is impossible to automate: it needs the eyes of a security analyst. For every CVE. You need to ask yourself: "On which line would I have liked an automated tool to highlight a weakness that is relevant to this CVE?" or "How could an automated tool have identified that this code is vulnerable?"

For CVE-2015-5310, I doubt that this is possible with a general tool. But a specialized tool could perhaps have flagged https://android.googlesource.com/platform/external/wpa_supplicant_8/+/fdb708a37d8f7f1483e3cd4e8ded974f53fedace/wpa_supplicant/wnm_sta.c#188 with the message:

end = ptr + key_len_total;
wpa_hexdump_key(MSG_DEBUG, "WNM: Key Data", ptr, key_len_total);
while (ptr + 1 < end) {
/* ^^^ Use of key_len_total without associated check for `wpa_sm_pmf_enabled` */

Note that the hunk may not even be near the location of the weakness. For path-injection vulnerabilities, analysis tools typically highlight the source code line where a file is accessed, but the fix is typically to add a sanitizer many lines above the access itself. So indexing into hunks is not guaranteed to give you any useful weakness locations.

At the meta-level: if we do not have any weaknesses for a benchmark entry, then there is nothing to benchmark on. The behaviour of two analysis tools can no longer be compared automatically. And if we have wrong weakness locations (inferred from the hunks), the benchmark's value will degrade significantly.
As part of my work, I have data for CVEs coming from the Android Open Source Project that I'm willing to contribute to the OSSF CVE Benchmark.
Those CVEs target compiled languages (both C/C++ and Java), but I think they would still be relevant for static analysis (even before compilation), and there are issues already open for the support of compiled languages (#125, #126, ...).
However, due to the number of vulnerabilities, I don't have the data for all the fields that are present in the benchmark.
Below is a CVE already in the benchmark:
And here is an example of what I could contribute:
The only missing field is the "explanation" part, as it is not possible to automate it.
I checked the schema definition in the repository, and it was not a required field (here).
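To illustrate, this is roughly the shape of an entry as I understand it (the field names are my approximation of the schema, simplified, not copied from it):

// Simplified sketch of a benchmark entry; "explanation" is the one field
// that cannot be generated automatically from the bulletins.
interface Weakness {
  location: {
    file: string;
    line: number;
  };
  explanation?: string; // free text, needs a human analyst
}

interface BenchmarkEntry {
  cve: string;        // e.g. "CVE-2015-5310"
  cwes: string[];     // e.g. ["CWE-189"]
  repository: string; // e.g. "https://android.googlesource.com/..."
  weaknesses: Weakness[];
}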
So, to wrap up this issue: