feat: Add binary version parsing for nodejs, php and python #6524

Draft
wants to merge 10 commits into
base: main

Conversation

kovacs-levent

@kovacs-levent kovacs-levent commented Apr 19, 2024

Description

Implement binary version parsing for at least the key binaries. The feature should be extended later, but in this first round I'm focusing on what matters for my use case (PHP, NodeJS, Python). I'm keeping Java binary parsing since it has already been implemented by laurentdelosieresmano in the related PR 👍 Since go-dep-parser was moved to this repo, I'm reopening the PR and adding Python dep parsing.

Related issues

Related PRs

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Levente Kovacs seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@kovacs-levent
Author

I signed the CLA... Not sure why it's not showing as signed. On my forked branch every test passed, so if you trigger the tests it should be good.

@kovacs-levent
Author

kovacs-levent commented Apr 19, 2024

Hmm, while all tests are passing, I do not see the new binary results when I build trivy from source and run the same experiment as in #6457. I'm not sure why that is the case.

EDIT: I realized that I need to implement not only the parsing but also the analyzer logic as part of fanal. I'll try that.

@kovacs-levent
Copy link
Author

kovacs-levent commented Apr 20, 2024

I have spent some time on this. The Python version is now detected correctly by trivy and added to the SBOM in a way that xeol detects properly for my use case.

I also looked into implementing vulnerability scanning for standalone binaries, but I think that, as a first step, having them in SBOMs is already progress.

There are a few problems regarding implementing this with standalone binaries:

  1. It's not an OS package, hence this whole PR. That also means I can't implement the vuln scanning as part of the ospkg detector library. When I tried that, trivy correctly detected the vulns in the binaries, but it discarded everything else because of conflicting OS versions (implemented as an ospkg detector driver, I lost all dpkg vulns, for example).
  2. It's also not really a langpkg, but this could be solved with some extra code in the driver logic, without touching the core any further. However, none of the ecosystems in trivy-db is a good fit for querying vulnerabilities in these standalone binaries. I thought the best bet would be NVD.
  3. But trivy-db has no interface for querying NVD. This means I can't query NVD's information about either the standalone binary or OS package sources.
  4. A Get function could be implemented for NVD, but I suspect the best way to query it is through CPE 2.3 strings. For standalone binaries, I think these are easy to generate correctly, but afaik trivy does not support CPE for complexity reasons (it's also not in the SBOMs).

The only way I see to implement this nicely is to add CPE support and also put together a Get function for the NVD vulnsrc mentioned in point 3. The other way is to just hack together an NVD Get interface, since it would only be used by the scanner as a fallback when a generic binary is detected, not for regular vulnerability scanning.

Now that I know how to implement these things, I think I can put together the remaining parts (Java, PHP and NodeJS binaries as part of package detection and SBOMs) in the upcoming days, and then the PR will be ready.

@kovacs-levent kovacs-levent marked this pull request as draft April 20, 2024 08:37
@kovacs-levent kovacs-levent changed the title from "feat: Add binary version parsing for java, nodejs, php and python" to "feat: Add binary version parsing for nodejs, php and python" Apr 20, 2024
@kovacs-levent
Author

I have added support for PHP and NodeJS standalone binaries, and it works like a charm with the SBOMs. I'm dropping support for Java because I found that it's harder to correctly determine the package type (JRE vs. JDK), and I won't go down that rabbit hole since it's not part of my use case.

The only issue, which I discovered yesterday, is that this will break lots of integration tests, which I'll have to adjust accordingly. After I adjust the integration tests, I think this PR can be merged.

Contributor

@DmitriyLewen DmitriyLewen left a comment

Hello @kovacs-levent
Thanks for this PR.

I left some comments. Some of these comments apply equally to nodejs, php and python.

Also, the parsers are very similar.
I think we can merge them into one package. Something like this:
parser:

├── executable
│   ├── exe.go
│   ├── nodejs
│   │   ├── parse.go
│   │   ├── parse_test.go
│   │   └── testdata
│   ├── python
│   │   ├── parse.go
│   │   ├── parse_test.go
│   │   └── testdata
│   └── php
│       ├── parse.go
│       ├── parse_test.go
│       └── testdata

About analyzers:
it looks like we can add the new logic here: https://github.com/aquasecurity/trivy/blob/main/pkg/fanal/analyzer/executable/executable.go

If a php/python/npm binary is found, use its parser and add the library.

wdyt?
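
One way to picture the dispatch suggested above is the following sketch. Everything in it is hypothetical: `parseFunc`, `parsers`, `selectParser` and the stub `notImplemented` are illustration names, not the API of the existing executable analyzer, and matching on a file-name prefix is only an assumption about how a binary could be recognized.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// parseFunc is a hypothetical parser signature: it reads the raw binary
// and returns the detected version.
type parseFunc func(data []byte) (version string, err error)

// notImplemented stands in for the real nodejs/php/python parsers.
func notImplemented([]byte) (string, error) {
	return "", errors.New("not implemented in this sketch")
}

// parsers maps a binary-name prefix to its parser (hypothetical names).
var parsers = map[string]parseFunc{
	"node":   notImplemented,
	"php":    notImplemented,
	"python": notImplemented,
}

// selectParser picks a parser from the file's base name.
// It returns an empty kind and a nil parser when nothing applies.
func selectParser(filePath string) (kind string, p parseFunc) {
	base := path.Base(filePath)
	for prefix, fn := range parsers {
		if strings.HasPrefix(base, prefix) {
			return prefix, fn
		}
	}
	return "", nil
}

func main() {
	for _, f := range []string{"usr/bin/python3.11", "usr/bin/php", "usr/bin/node", "bin/ls"} {
		kind, _ := selectParser(f)
		fmt.Printf("%s -> %q\n", f, kind)
	}
}
```

A prefix match is deliberately loose (it would also accept names such as `phpize`), so a real analyzer would likely need a stricter name pattern or a content check before calling the parser.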

@@ -124,6 +124,7 @@ require (
github.com/alecthomas/chroma v0.10.0
github.com/antchfx/htmlquery v1.3.0
github.com/apparentlymart/go-cidr v1.1.0
github.com/aquasecurity/go-dep-parser v0.0.0-20240213093706-423cd04548a5
Contributor

return nil, err
}

if bytes.HasPrefix(data, []byte("\x7FELF")) {
Contributor

What about windows and macos formats?
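
For reference, a minimal sketch of how the ELF prefix check could be widened to the other two formats by looking at the leading magic bytes. The function name `detectFormat` and its string return values are hypothetical. The byte values are the standard ELF, PE ("MZ" DOS header) and Mach-O magics. Note that the fat Mach-O magic `CA FE BA BE` is shared with Java class files, so a real implementation would need a further check.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectFormat (hypothetical helper) classifies an executable by its
// leading magic bytes. It only inspects the prefix; it does not validate
// the rest of the file.
func detectFormat(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte("\x7FELF")):
		return "elf"
	case bytes.HasPrefix(data, []byte("MZ")):
		return "pe"
	case bytes.HasPrefix(data, []byte("\xFE\xED\xFA\xCE")), // Mach-O 32-bit, big-endian
		bytes.HasPrefix(data, []byte("\xFE\xED\xFA\xCF")), // Mach-O 64-bit, big-endian
		bytes.HasPrefix(data, []byte("\xCE\xFA\xED\xFE")), // Mach-O 32-bit, little-endian
		bytes.HasPrefix(data, []byte("\xCF\xFA\xED\xFE")), // Mach-O 64-bit, little-endian
		bytes.HasPrefix(data, []byte("\xCA\xFE\xBA\xBE")): // fat binary (also Java class files)
		return "macho"
	}
	return "unknown"
}

func main() {
	fmt.Println(detectFormat([]byte("\x7FELF\x02\x01\x01")))
	fmt.Println(detectFormat([]byte("MZ\x90\x00")))
	fmt.Println(detectFormat([]byte{0xCF, 0xFA, 0xED, 0xFE}))
	fmt.Println(detectFormat([]byte("#!/bin/sh")))
}
```

Go's standard library also ships `debug/elf`, `debug/pe` and `debug/macho`, which could replace the manual prefix checks when the section contents are needed as well.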

}

// Python's version pattern is [NUL]3.11.2[NUL]
re := regexp.MustCompile(`^\d{1,4}\.\d{1,4}\.\d{1,4}[-._a-zA-Z0-9]*$`)
Contributor

Can you add a comment with a link where you get this regex?
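
To make the discussion concrete, here is a sketch of how a NUL-delimited version string could be pulled out of binary data with the regex quoted above. The helper `findVersion` is hypothetical and is not the code from this PR; it simply splits the data on NUL bytes and returns the first chunk that matches.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// versionRe is the pattern quoted in the diff above.
var versionRe = regexp.MustCompile(`^\d{1,4}\.\d{1,4}\.\d{1,4}[-._a-zA-Z0-9]*$`)

// findVersion (hypothetical helper) returns the first NUL-delimited chunk
// that looks like a version, or "" when none is found.
func findVersion(data []byte) string {
	for _, chunk := range bytes.Split(data, []byte{0}) {
		if versionRe.Match(chunk) {
			return string(chunk)
		}
	}
	return ""
}

func main() {
	data := []byte("junk\x003.11.2\x00more junk")
	fmt.Println(findVersion(data)) // prints 3.11.2
}
```

Taking the first match is a simplification: a binary may embed version strings of bundled libraries too, so the order of matches alone may not identify the interpreter's own version.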


var libs []types.Library
libs = append(libs, types.Library{
ID: packageID(name, vers),
Contributor

You use dependency.ID only once, so you can just call it here directly.

Suggested change
- ID: packageID(name, vers),
+ ID: dependency.ID(ftypes.PythonGeneric, name, version),

}
}

return "python", vers
Contributor

Is there a way to determine the package name?
It looks like it might be confusing if all detected binaries are named "python".

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
a := pythonBinaryAnalyzer{}
fileInfo, err := os.Lstat(tt.filePath)
Contributor

Looks like we can use 1 file and change names, permissions, etc. in test.
Then we can reduce the number of copies of test files.

@@ -65,7 +65,6 @@ func (s *scanner) Scan(target types.ScanTarget, _ types.ScanOptions) (types.Resu
}

logger := log.WithPrefix(string(app.Type))

Contributor

Looks like we don't need this change.

@@ -0,0 +1,82 @@
// Ported from https://github.com/golang/go/blob/e9c96835971044aa4ace37c7787de231bbde05d9/src/cmd/go/internal/version/exe.go
Contributor

It looks like Go 1.22 doesn't have this file. Can you please update the link and double check - we may need to add some changes.

Comment on lines +41 to +42
pythonLibNameRegex := regexp.MustCompile("^libpython[0-9]+(?:[.0-9]+)+[a-z]?[.]so.*$")
pythonBinaryNameRegex := regexp.MustCompile("(?:.*/|^)python(?P<version>[0-9]+(?:[.0-9]+)+)?$")
Contributor

It would be great if you added some comments about these regexes (e.g. a link to docs).
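
As a starting point for such comments, this sketch shows what the two quoted patterns accept. The patterns are copied from the diff. The helper `binaryVersion` is hypothetical. One behavior worth documenting: the version group needs at least two characters, so a bare `python3` path does not match the binary-name pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Shared library names such as libpython3.11.so.1.0.
	pythonLibNameRegex = regexp.MustCompile("^libpython[0-9]+(?:[.0-9]+)+[a-z]?[.]so.*$")
	// Interpreter paths such as /usr/bin/python3.11, with an optional
	// "version" capture group.
	pythonBinaryNameRegex = regexp.MustCompile("(?:.*/|^)python(?P<version>[0-9]+(?:[.0-9]+)+)?$")
)

// binaryVersion (hypothetical helper) returns the version captured from a
// path and whether the path matched at all.
func binaryVersion(path string) (string, bool) {
	m := pythonBinaryNameRegex.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[pythonBinaryNameRegex.SubexpIndex("version")], true
}

func main() {
	for _, p := range []string{"/usr/bin/python3.11", "python", "/usr/bin/python3"} {
		v, ok := binaryVersion(p)
		fmt.Printf("%-22s matched=%-5t version=%q\n", p, ok, v)
	}
	fmt.Println(pythonLibNameRegex.MatchString("libpython3.11.so.1.0"))
}
```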

TypePip Type = "pip"
TypePipenv Type = "pipenv"
TypePoetry Type = "poetry"
TypePythonGeneric Type = "python"
Contributor

looks like we can use executable for new binaries

@kovacs-levent
Author

Thanks for the feedback. I will make the changes when I have time; this week has been busy, but I'll try to make progress.
