Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MVP script to sync github advisory data in #392

Merged
merged 4 commits into from Jun 28, 2019
Merged

MVP script to sync github advisory data in #392

merged 4 commits into from Jun 28, 2019

Conversation

rschultheis
Copy link
Contributor

@rschultheis rschultheis commented Jun 14, 2019

This is a script which imports data from the GitHub Security Advisory API into this advisory database.

Basic overview

  • Person runs the script using: bundle exec rake sync_github_advisories
  • Script writes yaml advisory files for all rubygem advisories that GitHub has, which RubySec does not already have
  • Person reviews each file carefully, makes sure data looks right
  • Person has to set the patched_versions and unaffected_versions versions manually, using comments in the file. GitHub data is structured opposite of RubySec unfortunately: GitHub identifies version range which are vulnerable, RubySec identifies version ranges which are not vulnerable. The translation from the GitHub way to the RubySec way is manual (but assisted by comments made by this script in the new yaml files it makes).
  • Person decides which of those files to make PRs for and makes those PRs manually.

Running instructions

The GitHub Advisory API requires a token to access it.

export GH_API_TOKEN=<your GitHub API Token>
bundle exec rake sync_github_advisories

Examples

So far I've made two PRs using this script:

There are a handful more recent CVEs that we could sync over now. There are many for older years that are simply not as interesting, and potentially not worth the energy of ingesting. I am open to feedback on this.

@rschultheis rschultheis marked this pull request as ready for review June 18, 2019 18:19
@rschultheis
Copy link
Contributor Author

I am opening this up for review! There are now two example PRs made by this script so I think that is good enough to get the idea across. Is anyone on the rubysec squad interested in this work?

@reedloden
Copy link
Member

@rschultheis sorry for the delay!

This looks fantastic. So thankful for all the hard work here. Very interested in this.

"cve" => cve_id[4..20],
"date" => published_day,
"url" => external_reference,
"title" => github_advisory_graphql_object["summary"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any data that would allow us to have a better title for the issues besides just "SEVERITY_LEVEL severity vulnerability that affects GEM"? Like is there any CWE or similar data that could be used to customize the generated titles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, another limitation of how the GitHub api currently works. The summary field is a bit aspirational in the GitHub Advisory object right now, and it is literally that template you define. For CVEs there is no "summary" field in CVEs, just description, so only way to do this is to write a better one. Or take the first sentence of the CVE description perhaps.

end

def external_reference
github_advisory_graphql_object["references"].first["url"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always NVD? Wish we could use something else... Would prefer the url not always just be a link back to NVD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the GitHub API currently only provides a single reference unfortunately. Its a big limitation in this case for sure. For CVEs the GitHub primary reference is the nvd link. When GitHub ultimately does publish a maintainer security advisory for a rubygem, the link though will be to the GitHub advisory(NPM example).

We could hand curate in some better references for sure. Option B would be to grab the CVE data and pull the reference list from that. It seems worth updating the docs at least to mention updating the references to include best/most important references not just what was in the GitHub API.

# and therefore should not be committed as is
# We can not directly translate from GitHub to rubysec advisory format
#
# The patched_versions field is not exactly available.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide some examples of what you see here? I wonder if we could guess a bit, knowing that it still needs human review but might make the process easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it is definitely possible to make some logic to automate the most common cases of translation.

Some common cases:

      {
          "package": {
            "name": "chloride"
          },
          "vulnerableVersionRange": "< 0.3.0",
          "firstPatchedVersion": {
            "identifier": "0.3.0"
          }
        },
        {
          "package": {
            "name": "actionview"
          },
          "vulnerableVersionRange": ">= 4.0.0, < 4.2.11.1",
          "firstPatchedVersion": {
            "identifier": "4.2.11.1"
          }
        },
        {
          "package": {
            "name": "actionview"
          },
          "vulnerableVersionRange": ">= 5.2.0, < 5.2.2.1",
          "firstPatchedVersion": {
            "identifier": "5.2.2.1"
          }
        },
        {
          "package": {
            "name": "actionview"
          },
          "vulnerableVersionRange": ">= 4.0.0, < 4.2.11.1",
          "firstPatchedVersion": {
            "identifier": "4.2.11.1"
          }
        },
        {
          "package": {
            "name": "actionview"
          },
          "vulnerableVersionRange": ">= 5.2.0, < 5.2.2.1",
          "firstPatchedVersion": {
            "identifier": "5.2.2.1"
          }
        },

I think my preference would be to follow this logic up with another PR. We might need to add some tests for the smart logic like that and this isn't really structured right now for that

# There are many old CVEs in the GitHub advisory dataset that are not in here
# It is more important to sync the newer ones, so this allows the user to
# control how old of CVEs the sync should pull over
def self.sync(min_year: 2018)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many does the script output for 2018 onward?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2018 onward, it produces this output:

Sync completed
Wrote these files:
---
- gems/rubygems-update/CVE-2019-8325.yml
- gems/rubygems-update/CVE-2019-8320.yml
- gems/rubygems-update/CVE-2019-8324.yml
- gems/rubygems-update/CVE-2019-8321.yml
- gems/rubygems-update/CVE-2019-8322.yml
- gems/rubygems-update/CVE-2019-8323.yml
- gems/ruby-openid/CVE-2019-11027.yml
- gems/chloride/CVE-2018-6517.yml
- gems/doorkeeper-openid_connect/CVE-2019-9837.yml
- gems/fat_free_crm/CVE-2018-1000842.yml
- gems/smart_proxy_dynflow/CVE-2018-14643.yml
- gems/jekyll/CVE-2018-17567.yml
- gems/bootstrap/CVE-2018-14042.yml

The six for rubygems are actually already in this repo over at https://github.com/rubysec/ruby-advisory-db/tree/master/libraries/rubygems

so it is syncing over 13 CVEs, but only 7 are legit additions I think.

If we look at at all time, the stats work out this:

  8 2011
  11 2012
  15 2013
  11 2014
   1 2015
   1 2017
   5 2018
   8 2019

All that is simply CVEs, which is all that is currently published by GitHub. The real gain is the future...

Copy link
Contributor Author

@rschultheis rschultheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @reedloden thanks for maintaining RubySec.

So full disclosure I work at GitHub on the Advisory dataset (with @mveytsman and @phillmv 👋 ).

So right now the only rubygem advisories in GitHub's api are CVE based. But the new maintainer security advisories should change that soon. With that feature a gem maintainer can report an advisory to GitHub, and if it meets GitHub's requirements it gets published, and then this script will sync it down. So in this way we should get advisories which are not CVE based, but rather straight from gem maintainers. Here is an example NPM advisory which has been published in this way.

Let me know if you think any other changes should happen and if you are cool with merging this!

# There are many old CVEs in the GitHub advisory dataset that are not in here
# It is more important to sync the newer ones, so this allows the user to
# control how old of CVEs the sync should pull over
def self.sync(min_year: 2018)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2018 onward, it produces this output:

Sync completed
Wrote these files:
---
- gems/rubygems-update/CVE-2019-8325.yml
- gems/rubygems-update/CVE-2019-8320.yml
- gems/rubygems-update/CVE-2019-8324.yml
- gems/rubygems-update/CVE-2019-8321.yml
- gems/rubygems-update/CVE-2019-8322.yml
- gems/rubygems-update/CVE-2019-8323.yml
- gems/ruby-openid/CVE-2019-11027.yml
- gems/chloride/CVE-2018-6517.yml
- gems/doorkeeper-openid_connect/CVE-2019-9837.yml
- gems/fat_free_crm/CVE-2018-1000842.yml
- gems/smart_proxy_dynflow/CVE-2018-14643.yml
- gems/jekyll/CVE-2018-17567.yml
- gems/bootstrap/CVE-2018-14042.yml

The six for rubygems are actually already in this repo over at https://github.com/rubysec/ruby-advisory-db/tree/master/libraries/rubygems

so it is syncing over 13 CVEs, but only 7 are legit additions I think.

If we look at at all time, the stats work out this:

  8 2011
  11 2012
  15 2013
  11 2014
   1 2015
   1 2017
   5 2018
   8 2019

All that is simply CVEs, which is all that is currently published by GitHub. The real gain is the future...

end

def external_reference
github_advisory_graphql_object["references"].first["url"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the GitHub API currently only provides a single reference unfortunately. Its a big limitation in this case for sure. For CVEs the GitHub primary reference is the nvd link. When GitHub ultimately does publish a maintainer security advisory for a rubygem, the link though will be to the GitHub advisory(NPM example).

We could hand curate in some better references for sure. Option B would be to grab the CVE data and pull the reference list from that. It seems worth updating the docs at least to mention updating the references to include best/most important references not just what was in the GitHub API.

"cve" => cve_id[4..20],
"date" => published_day,
"url" => external_reference,
"title" => github_advisory_graphql_object["summary"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, another limitation of how the GitHub api currently works. The summary field is a bit aspirational in the GitHub Advisory object right now, and it is literally that template you define. For CVEs there is no "summary" field in CVEs, just description, so only way to do this is to write a better one. Or take the first sentence of the CVE description perhaps.

# and therefore should not be committed as is
# We can not directly translate from GitHub to rubysec advisory format
#
# The patched_versions field is not exactly available.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it is definitely possible to make some logic to automate the most common cases of translation.

Some common cases:

      {
          "package": {
            "name": "chloride"
          },
          "vulnerableVersionRange": "< 0.3.0",
          "firstPatchedVersion": {
            "identifier": "0.3.0"
          }
        },
        {
          "package": {
            "name": "actionview"
          },
          "vulnerableVersionRange": ">= 4.0.0, < 4.2.11.1",
          "firstPatchedVersion": {
            "identifier": "4.2.11.1"
          }
        },
        {
          "package": {
            "name": "actionview"
          },
          "vulnerableVersionRange": ">= 5.2.0, < 5.2.2.1",
          "firstPatchedVersion": {
            "identifier": "5.2.2.1"
          }
        },
        {
          "package": {
            "name": "actionview"
          },
          "vulnerableVersionRange": ">= 4.0.0, < 4.2.11.1",
          "firstPatchedVersion": {
            "identifier": "4.2.11.1"
          }
        },
        {
          "package": {
            "name": "actionview"
          },
          "vulnerableVersionRange": ">= 5.2.0, < 5.2.2.1",
          "firstPatchedVersion": {
            "identifier": "5.2.2.1"
          }
        },

I think my preference would be to follow this logic up with another PR. We might need to add some tests for the smart logic like that and this isn't really structured right now for that

@reedloden reedloden merged commit 6744361 into rubysec:master Jun 28, 2019
@reedloden
Copy link
Member

Sounds good. Just landed this. Let's iterate and figure out next steps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants