
Extend benchmark to other languages #67

Open
elanzini opened this issue Feb 1, 2021 · 10 comments
Labels
CVE Data about CVEs question Further information is requested

Comments

@elanzini

elanzini commented Feb 1, 2021

First, I think this benchmark is much needed and can bring great value.
I am personally working on vulnerabilities for Python, Java and C.

Is there any plan to add support for other languages?

@esbena esbena added the question Further information is requested label Feb 1, 2021
@esbena
Contributor

esbena commented Feb 1, 2021

Is there any plan to add support for other languages?

Yes. I personally hope to have CVE data for an additional language this year, along with a few associated analysis drivers. But in the short term, I only have various JavaScript improvements planned.
PRs and concrete suggestions are of course welcome.


For reference: https://github.com/ossf-cve-benchmark/ossf-cve-benchmark#status-and-roadmap

In time, we'll work on the following high-level improvements:

  • additional CVE data, for both JavaScript and other languages
  • support for additional security analysis tools, for both JavaScript and other languages

@elanzini
Author

elanzini commented Feb 1, 2021

Thanks @esbena for the quick reply. For Java, the dataset by SAP published at MSR 2019 is a great starting point: for 1000+ Java-related CVEs, it provides the manually labeled fixing commit, along with statements carrying additional metadata.

I have a couple more questions:

Is the postPatch commit the commit chronologically after the patchCommit?
Also, is there a reason why you decided to use { filename + line_number(s) } instead of a callable URI, to identify the specific construct that was changed?

@esbena
Contributor

esbena commented Feb 1, 2021

I have added #68 for tracking the MSR2019 suggestion. Thanks! (there are a few similar sources around, but this one seems to be particularly accessible)

Is the postPatch commit the commit chronologically after the patchCommit?

No, postPatch is the commit that fixes the vulnerability. Now that you put it that way, I can see how the naming can be confusing.
The intention is that prePatch and postPatch represent the state of the project with respect to the vulnerability, so it makes sense to view the commit that contains the (merged) "patchCommit" as being postPatch.

See https://github.com/ossf-cve-benchmark/ossf-cve-benchmark/blob/main/docs/benchmark-CVEs.md#commits-related-to-a-cve

  • prePatch: the commit just before the vulnerability is fixed.
  • postPatch: the commit that fixes the vulnerability.
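The convention can be sketched minimally like this (the names and shape here are illustrative, not the benchmark's actual schema): for a simple, non-merge history, postPatch is the patch commit itself and prePatch is its parent.

```typescript
// Illustrative sketch of the prePatch/postPatch naming convention.
// The interface and function below are assumptions for illustration,
// not the benchmark's actual schema.
interface CveCommits {
  prePatch: string;  // project state just before the fix
  postPatch: string; // project state containing the (merged) fix
}

// For a simple, non-merge history: postPatch is the patch commit
// itself, and prePatch is its parent (patchCommit^ in git terms).
function commitsFor(patchCommit: string, parentCommit: string): CveCommits {
  return { prePatch: parentCommit, postPatch: patchCommit };
}
```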

Also, is there a reason why you decided to use { filename + line_number(s) } instead of a callable URI, to identify the specific construct that was changed?

Yes. While the dataset is github.com centric at the moment, it supports other source hosts as well (#2). Your "callable URI" can be constructed on a per-host basis (GitLab, Bitbucket, ...). Similarly, the report --kind server uses the filename + line_number to highlight the vulnerable lines.
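To illustrate the per-host construction, here is a minimal sketch for github.com (the function name and parameters are assumptions for illustration, not part of the benchmark's API):

```typescript
// Hypothetical sketch: turning the stored { filename + line_number(s) }
// into a github.com "callable" URL. Other hosts (GitLab, Bitbucket, ...)
// would get their own variant of this function.
function githubLineUrl(
  repoUrl: string, // e.g. "https://github.com/owner/repo"
  commit: string,  // the prePatch/postPatch commit SHA
  file: string,
  startLine: number,
  endLine?: number
): string {
  const fragment =
    endLine !== undefined && endLine !== startLine
      ? `L${startLine}-L${endLine}`
      : `L${startLine}`;
  return `${repoUrl}/blob/${commit}/${file}#${fragment}`;
}
```

For example, `githubLineUrl("https://github.com/owner/repo", "abc123", "src/index.js", 10, 12)` produces `https://github.com/owner/repo/blob/abc123/src/index.js#L10-L12`.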

@agigleux

Just to say it: at SonarSource, we are really interested in seeing the OpenSSF CVE Benchmark support Java, C#, PHP, Python and C.

I'm sure we can contribute some CVEs, but before doing that, it would be great to have the glue in place so that the OpenSSF CVE Benchmark can properly scan compiled languages (Java, C#, C). For PHP and Python I believe support will be as easy as for JS, but for compiled languages, SAST tools generally need to compile the code in order to scan it (this is the case for CodeQL and SonarCloud/SonarQube), and I'm sure you will hit this problem: dependencies are no longer reachable, so it's no longer possible to compile a Java project that had a vulnerability in 2015.

I would suggest creating one ticket per language, to start gathering interest, datasets, CVEs, ... whatever can help move things forward.

@esbena
Contributor

esbena commented Mar 25, 2021

Just to say it: at SonarSource, we are really interested in seeing the OpenSSF CVE Benchmark support Java, C#, PHP, Python and C.

Great to hear. ❤️

it would be great to have the glue in place so that the OpenSSF CVE Benchmark can properly scan compiled languages

I think the glue needed to run is already there, but that is because I expect individual drivers to do the heavy lifting, perhaps with some shared logic in the (optional) driver.ts utility.

To me, the general problem for compiled languages is the fundamental one of irreproducible builds (see #125, #126, #127). That isn't something easily solved with more implementation glue; it rather requires additional information, recorded either in the benchmark entries or externally (an archive.org for builds?).

I would suggest creating one ticket per language ...

Please go ahead.

As a starting point, the content could perhaps be a list of analysis tools that would be easy to implement drivers for. A link to a large source of CVEs for open source projects in the relevant language would also be relevant.

Side-tracking a bit: I have a large amount of useful Java CVE data available; I can transform that to the benchmark format and test it with the CodeQL driver. That should test the hypothesis that there already is enough glue. Would it be possible for you to easily try such Java benchmark entries on your internal SonarSource driver and report back about general issues?

@esbena esbena added the CVE Data about CVEs label Mar 25, 2021
@agigleux

Would it be possible for you to easily try such Java benchmark entries on your internal SonarSource driver and report back about general issues?

Yes, I will be able to work on that after Easter break, no problem.

When I talked about the glue, I was thinking about:

  • the ability to run only the scan of the CVEs corresponding to a given language
  • the ability to see in the final report how good a SAST tool is by language

@esbena
Contributor

esbena commented Mar 26, 2021

the ability to run only the scan of the CVEs corresponding to a given language

The "does CVE X belong to language Y?" question is hard to answer. We have a simple selector that checks the extensions of the files mentioned in a benchmark entry.

So to select all JavaScript CVEs, one could try to use the two selectors ext:js ext:jsx like this:

$ bin/cli run --tool eslint-default ext:js ext:jsx
Spawning child process: 'node <user>/ossf-cve-benchmarking/build/ts/contrib/tools/eslint/src/eslint.js /tmp/bcves-run-61C7W6/driver-inputs.json'
Preparing run of eslint-default on CVE-2016-1000229/prePatch (run 1/402).
...

the ability to see in the final report how good a SAST tool is by language

A report for JavaScript could be generated similarly:

$ bin/cli report ... ext:js ext:jsx

But unless someone implements a smarter report, there will be no grouping by language inside a report that contains CVEs for multiple languages.
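Until such a report exists, a rough post-processing sketch could approximate the grouping by bucketing entries on file extension (the Result shape below is an assumption for illustration, not the benchmark's schema):

```typescript
// Hypothetical post-processing: bucket per-CVE results by the extensions
// of the files they mention, approximating a per-language breakdown.
interface Result {
  cve: string;
  files: string[];
  detected: boolean;
}

function groupByExtension(results: Result[]): Map<string, Result[]> {
  const groups = new Map<string, Result[]>();
  for (const r of results) {
    // Deduplicate extensions so a CVE lands in each bucket at most once.
    const exts = new Set(r.files.map((f) => f.slice(f.lastIndexOf(".") + 1)));
    for (const ext of exts) {
      const bucket = groups.get(ext) ?? [];
      bucket.push(r);
      groups.set(ext, bucket);
    }
  }
  return groups;
}
```

Per-language detection rates could then be computed from each bucket's `detected` flags.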

@agigleux

agigleux commented Apr 6, 2021

I'm ready to allocate time to test the Java part whenever you are. Just ping me here and I'll work on it.

@esbena
Contributor

esbena commented Apr 13, 2021

@agigleux I can work on the Java bits next week onwards. How does that sound?

I initially expect to import a bunch of internally triaged Java CVEs; I will add some Java drivers after that.

@agigleux

@esbena All good on my side.
