
Extend benchmark to other languages #67

Open
elanzini opened this issue Feb 1, 2021 · 10 comments
Labels
CVE Data about CVEs question Further information is requested

Comments

@elanzini

elanzini commented Feb 1, 2021

First, I think this benchmark is much needed and can bring great value.
I am personally working on vulnerabilities for Python, Java and C.

Is there any plan to add support for other languages?

@esbena esbena added the question Further information is requested label Feb 1, 2021
@esbena
Contributor

esbena commented Feb 1, 2021

Is there any plan to add support for other languages?

Yes. I personally hope to have CVE data for an additional language this year, along with a few associated analysis drivers. But in the short term, I only have various JavaScript improvements planned.
PRs and concrete suggestions are of course welcome.


For reference: https://github.com/ossf-cve-benchmark/ossf-cve-benchmark#status-and-roadmap

In time, we'll work on the following high-level improvements:

  • additional CVE data, for both JavaScript and other languages
  • support for additional security analysis tools, for both JavaScript and other languages

@elanzini
Author

elanzini commented Feb 1, 2021

Thanks @esbena for the quick reply. For Java, the dataset by SAP published at MSR 2019 is a great starting point: for 1000+ Java-related CVEs, it provides the manually labeled fixing commit, along with statements carrying additional metadata.

I have a couple more questions:

Is the postPatch commit the commit chronologically after the patchCommit?
Also, is there a reason why you decided to use { filename + line_number(s) } instead of a callable URI, to identify the specific construct that was changed?

@esbena
Contributor

esbena commented Feb 1, 2021

I have added #68 for tracking the MSR2019 suggestion. Thanks! (there are a few similar sources around, but this one seems to be particularly accessible)

Is the postPatch commit the commit chronologically after the patchCommit?

No, postPatch is the commit that fixes the vulnerability. Now that you put it that way, I can see how the naming can be confusing.
The intention is that prePatch and postPatch represent the state of the project with respect to the vulnerability, so it makes sense to view the commit that contains the (merged) "patchCommit" as being postPatch.

See https://github.com/ossf-cve-benchmark/ossf-cve-benchmark/blob/main/docs/benchmark-CVEs.md#commits-related-to-a-cve

  • prePatch: the commit just before the vulnerability is fixed.
  • postPatch: the commit that fixes the vulnerability.
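The convention can be sketched minimally like this (the names and shape here are illustrative, not the benchmark's actual schema): for a simple, non-merge history, postPatch is the patch commit itself and prePatch is its parent.

```typescript
// Illustrative sketch of the prePatch/postPatch naming convention.
// The interface and function below are assumptions for illustration,
// not the benchmark's actual schema.
interface CveCommits {
  prePatch: string;  // project state just before the fix
  postPatch: string; // project state containing the (merged) fix
}

// For a simple, non-merge history: postPatch is the patch commit
// itself, and prePatch is its parent (patchCommit^ in git terms).
function commitsFor(patchCommit: string, parentCommit: string): CveCommits {
  return { prePatch: parentCommit, postPatch: patchCommit };
}
```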

Also, is there a reason why you decided to use { filename + line_number(s) } instead of a callable URI, to identify the specific construct that was changed?

Yes. While the dataset is github.com centric at the moment, it supports other source hosts as well (#2). Your "callable URI" can be constructed on a per-host basis (GitLab, Bitbucket, ...). Similarly, the report --kind server uses the filename + line_number to highlight the vulnerable lines.
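To illustrate the per-host construction, here is a minimal sketch for github.com (the function name and parameters are assumptions for illustration, not part of the benchmark's API):

```typescript
// Hypothetical sketch: turning the stored { filename + line_number(s) }
// into a github.com "callable" URL. Other hosts (GitLab, Bitbucket, ...)
// would get their own variant of this function.
function githubLineUrl(
  repoUrl: string, // e.g. "https://github.com/owner/repo"
  commit: string,  // the prePatch/postPatch commit SHA
  file: string,
  startLine: number,
  endLine?: number
): string {
  const fragment =
    endLine !== undefined && endLine !== startLine
      ? `L${startLine}-L${endLine}`
      : `L${startLine}`;
  return `${repoUrl}/blob/${commit}/${file}#${fragment}`;
}
```

For example, `githubLineUrl("https://github.com/owner/repo", "abc123", "src/index.js", 10, 12)` produces `https://github.com/owner/repo/blob/abc123/src/index.js#L10-L12`.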

@agigleux

Just to say it: at SonarSource, we are really interested in seeing the OpenSSF CVE Benchmark support Java, C#, PHP, Python and C.

I'm sure we can contribute some CVEs, but before doing that, it would be great to have the glue in place so that the OpenSSF CVE Benchmark can properly scan compiled languages (Java, C#, C). For PHP and Python I believe support will be as easy as for JS, but for compiled languages, SAST tools generally need to compile the code in order to scan it (this is the case for CodeQL and SonarCloud/SonarQube), and I'm sure you will hit this problem: dependencies are no longer reachable, so it's no longer possible to compile a Java project that had a vulnerability in 2015.

I would suggest creating one ticket per language, to start gathering interest, datasets, CVEs, ... whatever can help move things forward.

@esbena
Contributor

esbena commented Mar 25, 2021

Just to say it: at SonarSource, we are really interested in seeing the OpenSSF CVE Benchmark support Java, C#, PHP, Python and C.

Great to hear. ❤️

it would be great to have the glue in place so that the OpenSSF CVE Benchmark can properly scan compiled languages

I think the glue needed to run is already there, but that is because I expect individual drivers to do the heavy lifting, perhaps with some shared logic in the (optional) driver.ts utility.

To me, the general problem for compiled languages is the fundamental one of irreproducible builds (see #125, #126, #127). That isn't something easily solved with more implementation glue; it rather requires additional information, recorded either in the benchmark entries or externally (an archive.org for builds?).

I would suggest creating one ticket per language ...

Please go ahead.

As a starting point, the content could perhaps be a list of analysis tools that would be easy to implement drivers for. A link to a large source of CVEs for open source projects in the relevant language would also be relevant.

Side-tracking a bit: I have a large amount of useful Java CVE data available; I can transform that to the benchmark format and test it with the CodeQL driver. That should test the hypothesis that there already is enough glue. Would it be possible for you to easily try such Java benchmark entries on your internal SonarSource driver and report back about general issues?

@esbena esbena added the CVE Data about CVEs label Mar 25, 2021
@agigleux

Would it be possible for you to easily try such Java benchmark entries on your internal SonarSource driver and report back about general issues?

Yes, I will be able to work on that after Easter break, no problem.

When I talked about the glue, I was thinking about:

  • the ability to run only the scan of the CVEs corresponding to a given language
  • the ability to see in the final report how good a SAST tool is by language

@esbena
Contributor

esbena commented Mar 26, 2021

the ability to run only the scan of the CVEs corresponding to a given language

The "does CVE X belong to language Y?" question is hard to answer. We have a simple selector that checks the extensions of the files mentioned in a benchmark entry.

So to select all JavaScript CVEs, one could try to use the two selectors ext:js ext:jsx like this:

$ bin/cli run --tool eslint-default ext:js ext:jsx
Spawning child process: 'node <user>/ossf-cve-benchmarking/build/ts/contrib/tools/eslint/src/eslint.js /tmp/bcves-run-61C7W6/driver-inputs.json'
Preparing run of eslint-default on CVE-2016-1000229/prePatch (run 1/402).
...

the ability to see in the final report how good a SAST tool is by language

A report for JavaScript could be generated similarly:

$ bin/cli report ... ext:js ext:jsx

But unless someone implements a smarter report, there will be no grouping by language inside a report that contains CVEs for multiple languages.
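Until such a report exists, a rough post-processing sketch could approximate the grouping by bucketing entries on file extension (the Result shape below is an assumption for illustration, not the benchmark's schema):

```typescript
// Hypothetical post-processing: bucket per-CVE results by the extensions
// of the files they mention, approximating a per-language breakdown.
interface Result {
  cve: string;
  files: string[];
  detected: boolean;
}

function groupByExtension(results: Result[]): Map<string, Result[]> {
  const groups = new Map<string, Result[]>();
  for (const r of results) {
    // Deduplicate extensions so a CVE lands in each bucket at most once.
    const exts = new Set(r.files.map((f) => f.slice(f.lastIndexOf(".") + 1)));
    for (const ext of exts) {
      const bucket = groups.get(ext) ?? [];
      bucket.push(r);
      groups.set(ext, bucket);
    }
  }
  return groups;
}
```

Per-language detection rates could then be computed from each bucket's `detected` flags.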

@agigleux

agigleux commented Apr 6, 2021

I'm ready to allocate time to test the Java part whenever you are. Just ping me here and I'll work on it.

@esbena
Contributor

esbena commented Apr 13, 2021

@agigleux I can work on the Java bits next week onwards. How does that sound?

I initially expect to import a bunch of internally triaged Java CVEs; I will add some Java drivers after that.

@agigleux

@esbena All good on my side.
