New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MVP script to sync github advisory data in #392
MVP script to sync github advisory data in #392
Conversation
@rschultheis sorry for the delay! This looks fantastic. So thankful for all the hard work here. Very interested in this. |
"cve" => cve_id[4..20], | ||
"date" => published_day, | ||
"url" => external_reference, | ||
"title" => github_advisory_graphql_object["summary"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any data that would allow us to have a better title for the issues besides just "SEVERITY_LEVEL severity vulnerability that affects GEM"? Like is there any CWE or similar data that could be used to customize the generated titles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, another limitation of how the GitHub api currently works. The summary
field is a bit aspirational in the GitHub Advisory object right now, and it is literally that template you define. For CVEs there is no "summary" field in CVEs, just description, so only way to do this is to write a better one. Or take the first sentence of the CVE description perhaps.
end | ||
|
||
def external_reference | ||
github_advisory_graphql_object["references"].first["url"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this always NVD? Wish we could use something else... Would prefer the url
not always just be a link back to NVD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the GitHub API currently only provides a single reference unfortunately. Its a big limitation in this case for sure. For CVEs the GitHub primary reference is the nvd link. When GitHub ultimately does publish a maintainer security advisory for a rubygem, the link though will be to the GitHub advisory(NPM example).
We could hand curate in some better references for sure. Option B would be to grab the CVE data and pull the reference list from that. It seems worth updating the docs at least to mention updating the references to include best/most important references not just what was in the GitHub API.
# and therefore should not be committed as is | ||
# We can not directly translate from GitHub to rubysec advisory format | ||
# | ||
# The patched_versions field is not exactly available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide some examples of what you see here? I wonder if we could guess a bit, knowing that it still needs human review but might make the process easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it is definitely possible to make some logic to automate the most common cases of translation.
Some common cases:
{
"package": {
"name": "chloride"
},
"vulnerableVersionRange": "< 0.3.0",
"firstPatchedVersion": {
"identifier": "0.3.0"
}
},
{
"package": {
"name": "actionview"
},
"vulnerableVersionRange": ">= 4.0.0, < 4.2.11.1",
"firstPatchedVersion": {
"identifier": "4.2.11.1"
}
},
{
"package": {
"name": "actionview"
},
"vulnerableVersionRange": ">= 5.2.0, < 5.2.2.1",
"firstPatchedVersion": {
"identifier": "5.2.2.1"
}
},
{
"package": {
"name": "actionview"
},
"vulnerableVersionRange": ">= 4.0.0, < 4.2.11.1",
"firstPatchedVersion": {
"identifier": "4.2.11.1"
}
},
{
"package": {
"name": "actionview"
},
"vulnerableVersionRange": ">= 5.2.0, < 5.2.2.1",
"firstPatchedVersion": {
"identifier": "5.2.2.1"
}
},
I think my preference would be to follow this logic up with another PR. We might need to add some tests for the smart logic like that and this isn't really structured right now for that
# There are many old CVEs in the GitHub advisory dataset that are not in here | ||
# It is more important to sync the newer ones, so this allows the user to | ||
# control how old of CVEs the sync should pull over | ||
def self.sync(min_year: 2018) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many does the script output for 2018 onward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 2018 onward, it produces this output:
Sync completed
Wrote these files:
---
- gems/rubygems-update/CVE-2019-8325.yml
- gems/rubygems-update/CVE-2019-8320.yml
- gems/rubygems-update/CVE-2019-8324.yml
- gems/rubygems-update/CVE-2019-8321.yml
- gems/rubygems-update/CVE-2019-8322.yml
- gems/rubygems-update/CVE-2019-8323.yml
- gems/ruby-openid/CVE-2019-11027.yml
- gems/chloride/CVE-2018-6517.yml
- gems/doorkeeper-openid_connect/CVE-2019-9837.yml
- gems/fat_free_crm/CVE-2018-1000842.yml
- gems/smart_proxy_dynflow/CVE-2018-14643.yml
- gems/jekyll/CVE-2018-17567.yml
- gems/bootstrap/CVE-2018-14042.yml
The six for rubygems are actually already in this repo over at https://github.com/rubysec/ruby-advisory-db/tree/master/libraries/rubygems
so it is syncing over 13 CVEs, but only 7 are legit additions I think.
If we look at at all time, the stats work out this:
8 2011
11 2012
15 2013
11 2014
1 2015
1 2017
5 2018
8 2019
All that is simply CVEs, which is all that is currently published by GitHub. The real gain is the future...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @reedloden thanks for maintaining RubySec.
So full disclosure I work at GitHub on the Advisory dataset (with @mveytsman and @phillmv 👋 ).
So right now the only rubygem advisories in GitHub's api are CVE based. But the new maintainer security advisories should change that soon. With that feature a gem maintainer can report an advisory to GitHub, and if it meets GitHub's requirements it gets published, and then this script will sync it down. So in this way we should get advisories which are not CVE based, but rather straight from gem maintainers. Here is an example NPM advisory which has been published in this way.
Let me know if you think any other changes should happen and if you are cool with merging this!
# There are many old CVEs in the GitHub advisory dataset that are not in here | ||
# It is more important to sync the newer ones, so this allows the user to | ||
# control how old of CVEs the sync should pull over | ||
def self.sync(min_year: 2018) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 2018 onward, it produces this output:
Sync completed
Wrote these files:
---
- gems/rubygems-update/CVE-2019-8325.yml
- gems/rubygems-update/CVE-2019-8320.yml
- gems/rubygems-update/CVE-2019-8324.yml
- gems/rubygems-update/CVE-2019-8321.yml
- gems/rubygems-update/CVE-2019-8322.yml
- gems/rubygems-update/CVE-2019-8323.yml
- gems/ruby-openid/CVE-2019-11027.yml
- gems/chloride/CVE-2018-6517.yml
- gems/doorkeeper-openid_connect/CVE-2019-9837.yml
- gems/fat_free_crm/CVE-2018-1000842.yml
- gems/smart_proxy_dynflow/CVE-2018-14643.yml
- gems/jekyll/CVE-2018-17567.yml
- gems/bootstrap/CVE-2018-14042.yml
The six for rubygems are actually already in this repo over at https://github.com/rubysec/ruby-advisory-db/tree/master/libraries/rubygems
so it is syncing over 13 CVEs, but only 7 are legit additions I think.
If we look at at all time, the stats work out this:
8 2011
11 2012
15 2013
11 2014
1 2015
1 2017
5 2018
8 2019
All that is simply CVEs, which is all that is currently published by GitHub. The real gain is the future...
end | ||
|
||
def external_reference | ||
github_advisory_graphql_object["references"].first["url"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the GitHub API currently only provides a single reference unfortunately. Its a big limitation in this case for sure. For CVEs the GitHub primary reference is the nvd link. When GitHub ultimately does publish a maintainer security advisory for a rubygem, the link though will be to the GitHub advisory(NPM example).
We could hand curate in some better references for sure. Option B would be to grab the CVE data and pull the reference list from that. It seems worth updating the docs at least to mention updating the references to include best/most important references not just what was in the GitHub API.
"cve" => cve_id[4..20], | ||
"date" => published_day, | ||
"url" => external_reference, | ||
"title" => github_advisory_graphql_object["summary"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, another limitation of how the GitHub api currently works. The summary
field is a bit aspirational in the GitHub Advisory object right now, and it is literally that template you define. For CVEs there is no "summary" field in CVEs, just description, so only way to do this is to write a better one. Or take the first sentence of the CVE description perhaps.
# and therefore should not be committed as is | ||
# We can not directly translate from GitHub to rubysec advisory format | ||
# | ||
# The patched_versions field is not exactly available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it is definitely possible to make some logic to automate the most common cases of translation.
Some common cases:
{
"package": {
"name": "chloride"
},
"vulnerableVersionRange": "< 0.3.0",
"firstPatchedVersion": {
"identifier": "0.3.0"
}
},
{
"package": {
"name": "actionview"
},
"vulnerableVersionRange": ">= 4.0.0, < 4.2.11.1",
"firstPatchedVersion": {
"identifier": "4.2.11.1"
}
},
{
"package": {
"name": "actionview"
},
"vulnerableVersionRange": ">= 5.2.0, < 5.2.2.1",
"firstPatchedVersion": {
"identifier": "5.2.2.1"
}
},
{
"package": {
"name": "actionview"
},
"vulnerableVersionRange": ">= 4.0.0, < 4.2.11.1",
"firstPatchedVersion": {
"identifier": "4.2.11.1"
}
},
{
"package": {
"name": "actionview"
},
"vulnerableVersionRange": ">= 5.2.0, < 5.2.2.1",
"firstPatchedVersion": {
"identifier": "5.2.2.1"
}
},
I think my preference would be to follow this logic up with another PR. We might need to add some tests for the smart logic like that and this isn't really structured right now for that
Sounds good. Just landed this. Let's iterate and figure out next steps. |
This is a script which imports data from the GitHub Security Advisory API into this advisory database.
Basic overview
bundle exec rake sync_github_advisories
patched_versions
andunaffected_versions
versions manually, using comments in the file. GitHub data is structured opposite of RubySec unfortunately: GitHub identifies version range which are vulnerable, RubySec identifies version ranges which are not vulnerable. The translation from the GitHub way to the RubySec way is manual (but assisted by comments made by this script in the new yaml files it makes).Running instructions
The GitHub Advisory API requires a token to access it.
Examples
So far I've made two PRs using this script:
There are a handful more recent CVEs that we could sync over now. There are many for older years that are simply not as interesting, and potentially not worth the energy of ingesting. I am open to feedback on this.