Use SAP/project-kb MSR2019 as a source of Java CVEs #68

Open
esbena opened this issue Feb 1, 2021 · 3 comments
Labels
CVE Data about CVEs java

Comments

@esbena
Contributor

esbena commented Feb 1, 2021

As suggested in #67 (comment).

(Remember to check licensing for the data set)

@agigleux

I have a concern about the MSR2019 dataset. I'm not sure it is a good dataset to support first, as the initial Java support in ossf-cve-benchmark. While I think it is a good source of CVEs for the Java ecosystem, I don't think it represents well the vulnerabilities that developers introduce in their own code.

I studied the MSR2019 dataset in 2019 when it was released. It consisted of 1282 entries:

  • 779 are coming from Apache Projects
  • 104 from https://github.com/spring-projects
  • 79 from https://github.com/jenkinsci

So it shows vulnerabilities introduced in libraries that any Java developer may use, that's true, but almost nothing about the final applications built on them. There are no web app or cloud-native app repositories in this dataset. Detecting vulnerabilities in libraries and detecting them in the final code deployed to production are two different activities: a SAST tool that is good at the first is not necessarily able to detect vulnerabilities in the second case.
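
To put the counts above in proportion, here is a minimal sketch that turns them into shares of the 1282 entries. The numbers are the ones quoted in this comment; the variable and function names are made up for illustration and are not part of any tool.

```python
# Entry counts as quoted in the comment above; names are illustrative only.
TOTAL_ENTRIES = 1282
ENTRIES_BY_SOURCE = {
    "Apache projects": 779,
    "spring-projects": 104,
    "jenkinsci": 79,
}


def source_shares(counts, total):
    """Return each source's share of the data set as a fraction of total."""
    return {name: n / total for name, n in counts.items()}


shares = source_shares(ENTRIES_BY_SOURCE, TOTAL_ENTRIES)
top_three_share = sum(ENTRIES_BY_SOURCE.values()) / TOTAL_ENTRIES

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"top three combined: {top_three_share:.1%}")
```

Taken together, these three sources account for 962 of the 1282 entries, roughly three quarters of the data set.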

@esbena
Contributor Author

esbena commented Mar 25, 2021

I'm not sure this is a good dataset to support first,

I am not planning to import it. But perhaps someone else will open a big PR.


Meta:

I think all CVE sources will be biased in some way. Hopefully the CVE selectors can be used to select a reasonable subset of CVEs that are relevant for a given comparison. Relatedly, we already have some bias in the form of git and github.com, see #2.
This inherent bias is a big reason that this repository does not attempt to publish a big, canonical report of how each analysis tool is doing.

I was not aware of the MSR2019 contents, but that does seem overly biased towards certain kinds of code. I have not done a similar analysis of the benchmark entries in this repository, but I hope they are less biased, since they were simply sourced from a stream of GitHub security advisories, with reasonable limits on the effort spent finding commits and weakness locations.

Perhaps the CVE entries should have additional meta-information about the project in question: is-library, is-web-server, is-cli-tool, ... But I fear that could easily lead to feature creep and to inconsistencies among the many CVEs, especially if new "kinds" are added later.
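
A rough sketch of what such meta-information could look like, purely as an illustration. Everything here is hypothetical: `ProjectKind`, `CveEntry`, `select_by_kind`, the placeholder CVE ids and the repository names are invented for this example and do not reflect the benchmark's actual schema or its CVE selectors.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProjectKind(Enum):
    """Hypothetical project kinds, named after the tags suggested above."""
    LIBRARY = "is-library"
    WEB_SERVER = "is-web-server"
    CLI_TOOL = "is-cli-tool"


@dataclass(frozen=True)
class CveEntry:
    """Hypothetical benchmark entry carrying a set of project kinds."""
    cve_id: str
    repository: str
    kinds: frozenset = field(default_factory=frozenset)


def select_by_kind(entries, wanted):
    """Keep only the entries tagged with the wanted project kind."""
    return [entry for entry in entries if wanted in entry.kinds]


# Placeholder ids and repositories, not real CVEs.
entries = [
    CveEntry("CVE-0000-0001", "example/some-lib", frozenset({ProjectKind.LIBRARY})),
    CveEntry("CVE-0000-0002", "example/some-app", frozenset({ProjectKind.WEB_SERVER})),
    CveEntry("CVE-0000-0003", "example/some-tool",
             frozenset({ProjectKind.CLI_TOOL, ProjectKind.LIBRARY})),
]

library_entries = select_by_kind(entries, ProjectKind.LIBRARY)
print([entry.cve_id for entry in library_entries])
```

A closed enum like this also shows the consistency concern: adding a new kind later would leave every existing entry untagged for it until someone revisits them.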

@aunzaa

aunzaa commented Sep 18, 2022

As suggested in #67 (comment).

(Remember to check the licensing for the data set)
