ENH: Extend/enhance NEP-29 to cover security fix back-ports. #21713

Open
stuartarchibald opened this issue Jun 10, 2022 · 9 comments

Comments

@stuartarchibald
Contributor

Proposed new feature or change:

This is about Numba's interaction with NumPy as part of its software dependency chain.

  1. In approximately the last six months a number of CVEs have been issued against NumPy (this issue is independent of the contents of the CVEs!).
  2. Anecdotally, based on issues reported by Numba users, ensuring software dependency chains are free of some level of CVE is becoming more common practice. Should software not have a compliant dependency chain, it is banned from use.
  3. Fixes for CVEs are made against the "latest" NumPy release (not sure how accurate this is, there's discussion later).

The effect of the above is that if a CVE is issued against some version of NumPy, downstream packages, like Numba:

  • that are specified to depend on the NumPy version(s) with a CVE present

and

  • for which there is no upgrade path available to a NumPy (probably the "latest") with the security fix

are also then considered "bad" and their use is prohibited (or similar) by dependency chain validation applications.

Admittedly the specific situation with Numba is perhaps a bit unusual as a lot of packages depending on NumPy have no upper restriction on the NumPy version used. As a result, for these packages, there's typically an upgrade path available to the latest NumPy version which most likely contains the security fix.

However, Numba specifies compatibility against the versions of NumPy it has dedicated support for and specifically tests, so it effectively has a bounded range of supported NumPy versions. This is because Numba takes replicating NumPy seriously and involved testing is a large part of that. It is also very rare for Numba to need only a trivial "bump to the next NumPy version" style patch to accommodate a new NumPy release; updating to a new version almost always involves changing algorithms or updating APIs. Whilst not directly the same, it's possible to envisage other projects/entities with software stacks that face similar issues, whereby e.g. every declared supported version of NumPy needs testing before acceptance to production, for example at an HPC centre.
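
To make the difference between bounded and unbounded packages concrete, here is a minimal sketch of the check a dependency-chain validator might perform. Everything in it is an assumption for illustration: the function names, the supported-version lists and the "first fixed" release are made up and are not Numba's or NumPy's actual metadata.

```python
# Hypothetical sketch: all version lists below are illustrative, not real metadata.

def parse_minor(version):
    """Turn a version string such as '1.21.5' into the tuple (1, 21)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def has_upgrade_path(supported, first_fixed):
    """Return True if any supported NumPy release series contains the fix.

    supported   -- version strings the downstream package declares support for
    first_fixed -- first NumPy release series that carries the security fix
    """
    fixed = parse_minor(first_fixed)
    return any(parse_minor(v) >= fixed for v in supported)


# A package with a bounded support range (illustrative values only).
bounded = ["1.18", "1.19", "1.20"]
# A package without an upper bound can move to the newest release.
unbounded = ["1.18", "1.19", "1.20", "1.21", "1.22"]

print(has_upgrade_path(bounded, "1.21"))    # no supported series has the fix
print(has_upgrade_path(unbounded, "1.21"))  # an upgrade path exists
```

With a bounded range, the only ways to clear the flag are a back-port of the fix into a supported series or a new downstream release that supports the fixed series.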

Practically, there have been two cases in the last six months where Numba has had to do out-of-typical-schedule changes/releases to support the newer versions of NumPy to accommodate the situation described above.

Case 1:

Numba 0.55.0 was shipped prior to RC testing being complete and with some minor known issues in the code base so as to offer an upgrade path to NumPy 1.21 (this was against CVE-2021-33430).
https://github.com/numba/numba/blob/65cbe1cab01cc186ce873a65d94b5096acbf07f3/CHANGE_LOG#L39-L55

Case 2:

A Numba 0.55.2 release had to be created out of cycle to offer an upgrade path to NumPy 1.22 due to the CVEs noted in numba/numba#8025. It took a couple of weeks to create the support patch and then back-port and test it.

The question...

NumPy back-ports fixes for regressions and has the very useful NEP-29 in relation to support schedules. Does NumPy have or plan to have a similar back-port policy/schedule for security fixes that address CVEs?

The benefit of a policy that guaranteed security fixes were extended back a couple of releases, or to a dedicated long term support release, is that it would be a lot easier for downstream projects to ensure their deliberately constrained reliance on specific NumPy versions isn't inadvertently triggering software supply/dependency chain security problems.

Many thanks in advance for considering this!

CC @rgommers @seibert

@rgommers
Member

Thanks for the detailed explanation of your use case @stuartarchibald.

> NumPy back-ports fixes for regressions and has the very useful NEP-29 in relation to support schedules. Does NumPy have or plan to have a similar back-port policy/schedule for security fixes that address CVEs?

We have no published policy, yet. I agree we should have one.

One thing I'd like to point out is that we (me) do maintain a policy somewhere, namely within the Tidelift platform. Tidelift sponsors NumPy with a monthly amount in the range of $1,000 - $3,000, and one of the key things they ask in return is that we:

  1. Have a security vulnerability reporting mechanism - which is a good idea anyway of course, and we have this in our README: **Report a security vulnerability:** https://tidelift.com/docs/security. Reports then come in to us privately, and at least @seberg and I see them.
  2. Be proactive in responding to or, if needed, implementing fixes and disclosing issues responsibly.
  3. Assess CVEs filed against NumPy for severity in their platform.
  4. Mark the "supported releases" in the Tidelift platform to which we backport fixes. The way I've set it up there is the last two releases.

It has been a while since we've had a valid report, and many of the CVEs are bogus - which often means that there is no "fix" to backport. There often is discussion in an issue though. This in turn means that every vendor who wants to curate CVEs and release info on where those have been fixed has some manual validation & book-keeping to do. We could be better in advising on issues specifically about that.

@rgommers
Member

Also worth mentioning that at least once someone went through the trouble of disputing a CVE (see #18993 (comment)), and that that was at least successful in downgrading the severity and marking it as disputed.

> 4. Mark the "supported releases" in the Tidelift platform to which we backport fixes. The way I've set it up there is the last two releases.

It's possible that this hasn't happened in all cases, e.g. #18993 (comment) says that the fix was to remove deprecated code there - and backporting that too far would break our backwards compatibility policy (which was not warranted, since we anyway didn't agree with this being a real vulnerability).

@charris
Member

charris commented Jun 10, 2022

I try to backport CVE fixes, but that only goes for the latest released NumPy version. Occasionally I will go back one release further if it helps. One problem in the last year has been the rapid change in our build environment and NumPy itself, which makes backports and releasing older versions difficult. Everything bitrots quite rapidly these days. We will see if things settle down a bit over the next two years.

@seberg
Member

seberg commented Jun 10, 2022

We can try to make sure fixed versions get updated better, I am not sure whether they are or not. When it comes to disputing, I would really like someone with more security background to give a hand/explain it...
Disputing a CVE is easy, but getting it removed so no-one downstream has to be bothered by it seems rather tricky...

One thing we could maybe do is write a "policy" (not full blown). Not that I expect it to help much in practice, but many "CVE"s were filed against NumPy functions. If we note down somewhere that:

  • Access to calling arbitrary NumPy functions is clearly considered privileged at the process level. Errors here are currently not considered critical, they are usually bugs.
  • Working with unknown data should be safe, including loading data (with certain clear exceptions, such as pickled data). We consider such errors to be critical.
    • E.g. very large files can lead to crashes of course (during loading or later), so it is the responsibility of the party loading to sanity check e.g. file-sizes if they want to avoid DoS
    • E.g. for np.load(), checking the result dtype is also highly recommended (see below)

Then maybe we have an easier path of doing an actually successful dispute in the future...

The question is what "data" is. Datatype strings, for example, i.e. np.dtype(string), should not lead to problems. But if you accept arbitrary strings, you probably must sanity check the resulting dtype. In general, I would recommend allowing only simple dtypes or pre-checked, limited structured dtypes.

The reason is two-fold: First, less use means a higher probability of NumPy issues in structured dtypes. Second, structured dtypes open up a vast amount of complexity that can easily lead to DoS (e.g. by memory exhaustion) without any NumPy issue involved at all.
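
As a rough illustration of that kind of sanity check, the sketch below builds a dtype from an untrusted string and rejects anything that is not a plain numeric type. The helper name `checked_dtype`, the allowed kinds and the size limit are assumptions chosen for the example, not a NumPy recommendation or API.

```python
import numpy as np

# Assumptions for this sketch: only plain numeric dtypes of at most 16 bytes.
ALLOWED_KINDS = "biuf"   # bool, signed int, unsigned int, float
MAX_ITEMSIZE = 16


def checked_dtype(spec):
    """Build a dtype from an untrusted string and reject anything complex."""
    try:
        dt = np.dtype(spec)
    except (TypeError, ValueError, SyntaxError) as exc:
        raise ValueError(f"not a valid dtype: {spec!r}") from exc
    if dt.names is not None:
        raise ValueError("structured dtypes are not accepted")
    if dt.subdtype is not None:
        raise ValueError("sub-array dtypes are not accepted")
    if dt.hasobject or dt.kind not in ALLOWED_KINDS:
        raise ValueError(f"dtype kind {dt.kind!r} is not accepted")
    if dt.itemsize > MAX_ITEMSIZE:
        raise ValueError("dtype item size is too large")
    return dt
```

An application that genuinely needs structured dtypes could instead compare the result against a small, fixed set of known-good dtypes.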

Other DoS issues can occur of course (e.g. arrays filled with NaNs or weirdly shaped might just be surprisingly slow).
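
The loading advice in the bullets above (check file sizes, and check the result dtype of np.load()) could look roughly like this. It is a sketch under stated assumptions: the helper name `load_untrusted` and the size limit are hypothetical, and it only handles single-array `.npy` files, not `.npz` archives.

```python
import os

import numpy as np

MAX_BYTES = 100 * 1024 * 1024  # assumed limit; pick one that fits the application


def load_untrusted(path, max_bytes=MAX_BYTES):
    """Load a .npy file from an untrusted source with basic sanity checks."""
    if os.path.getsize(path) > max_bytes:
        raise ValueError("file is larger than the configured limit")
    # allow_pickle=False makes np.load refuse pickled object arrays.
    arr = np.load(path, allow_pickle=False)
    if not isinstance(arr, np.ndarray):
        raise ValueError("expected a single array (.npy), got something else")
    if arr.dtype.names is not None or arr.dtype.kind not in "biuf":
        raise ValueError(f"unexpected dtype: {arr.dtype}")
    return arr
```

The size check only bounds the input file; later operations on the loaded array can still be slow or memory hungry, as noted above.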

@rgommers
Member

> One problem in the last year has been the rapid change in our build environment and NumPy itself, which makes backports and releasing older versions difficult. Everything bitrots quite rapidly these days.

True. Perhaps cibuildwheel will make things better here. If not, we should probably say "always one release" or "only two releases in case of vulnerabilities that the NumPy team considers critical". For the latter, I don't think we have had many of those. Maybe privilege escalation in f2py (?) once (also in scipy.weave) a long time ago.
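
Written down as a rule, the two options in the previous paragraph could be sketched like this. It is only an illustration of the idea being floated, not an agreed policy; the function name and the release list are hypothetical.

```python
def backport_targets(releases, critical=False):
    """Pick the release series that would receive a security backport.

    releases -- release series ordered oldest to newest, e.g. ["1.21", "1.22"]
    critical -- whether the NumPy team considers the vulnerability critical
    """
    # One release by default, two when the team judges the issue critical.
    count = 2 if critical else 1
    return releases[-count:]


series = ["1.20", "1.21", "1.22"]          # illustrative list only
print(backport_targets(series))            # latest series only
print(backport_targets(series, True))      # latest two series
```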

> When it comes to disputing, I would really like someone with more security background to give a hand/explain it...
> Disputing a CVE is easy, but getting it removed so no-one downstream has to be bothered by it seems rather tricky...

Yes, that makes sense. Maybe we can find a volunteer for this. Or otherwise if at least we write up our own summary in the clearest/easiest to digest fashion, folks who are responsible for dealing with security issues in the stacks they deploy will be able to do something with that more easily.

> One thing that maybe we could do, is to write a "policy" (not full blown).

I like that idea. It should help to some extent I think. It's also good to point out that exactly zero of the existing CVEs were disclosed to us in a responsible manner, and many suffer from a common mistake that the first bullet point of your policy already addresses.

We can make things even clearer with an example like:

```python
# If you can call NumPy APIs, you are highly likely to be allowed to execute arbitrary Python code as a user
# NumPy does NOT have a denial-of-service vulnerability if instead of invoking it you can simply run:
while True:
    do_something_expensive()
```

The policy could also maintain a list of known CVEs perhaps.

@stuartarchibald
Contributor Author

@rgommers

> Thanks for the detailed explanation of your use case @stuartarchibald.
>
> > NumPy back-ports fixes for regressions and has the very useful NEP-29 in relation to support schedules. Does NumPy have or plan to have a similar back-port policy/schedule for security fixes that address CVEs?
>
> We have no published policy, yet. I agree we should have one.

Many thanks for confirming.

> One thing I'd like to point out is that we (me) do maintain a policy somewhere, namely within the Tidelift platform. Tidelift sponsors NumPy with a monthly amount in the range of $1,000 - $3,000, and one of the key things they ask in return is that we:
>
> 1. Have a security vulnerability reporting mechanism - which is a good idea anyway of course, and we have this in our README: `**Report a security vulnerability:** https://tidelift.com/docs/security`. Reports then come in to us privately, and at least @seberg and I see them.
>
> 2. Be proactive in responding to or, if needed, implementing fixes and disclosing issues responsibly.
>
> 3. Assess CVEs filed against NumPy for severity in their platform.
>
> 4. Mark the "supported releases" in the Tidelift platform to which we backport fixes. The way I've set it up there is the _last two releases_.

Thanks for explaining this, I was aware that security reports pointed to Tidelift but didn't realise that there was a requirement/policy in place in relation to this.

> It has been a while since we've had a valid report, and many of the CVEs are bogus - which often means that there is no "fix" to backport. There often is discussion in an issue though. This in turn means that every vendor who wants to curate CVEs and release info on where those have been fixed has some manual validation & book-keeping to do. We could be better in advising on issues specifically about that.

I think a common place for holding information like this would certainly be helpful for e.g. Numba as it would be quick and easy to check what was patched and when. Perhaps a time frame and place for publishing what was fixed/backported could be defined in the potential NEP-29 extension?
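
For the sake of discussion, the kind of record I have in mind might look like the sketch below. The field names, the status values and both entries are placeholders invented for illustration; they are not real CVE records or an existing NumPy data format.

```python
from dataclasses import dataclass, field


@dataclass
class Advisory:
    cve_id: str
    status: str                                   # e.g. "fixed" or "disputed"
    fixed_in: list = field(default_factory=list)  # releases carrying a fix


# Placeholder entries only; these are NOT real CVE records.
ADVISORIES = [
    Advisory("CVE-0000-00001", "fixed", ["1.21.6", "1.22.0"]),
    Advisory("CVE-0000-00002", "disputed"),
]


def unresolved_for(version, advisories=ADVISORIES):
    """Return IDs of advisories neither disputed nor fixed in `version`."""
    return [a.cve_id for a in advisories
            if a.status != "disputed" and version not in a.fixed_in]


print(unresolved_for("1.21.6"))
print(unresolved_for("1.20.0"))
```

Treating disputed entries as resolved is a choice made for this sketch; a downstream project may well need to track them separately, since scanners can still flag them.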

@stuartarchibald
Contributor Author

> I try to backport CVE fixes, but that only goes for the latest released NumPy version. Occasionally I will go back one release further if it helps. One problem in the last year has been the rapid change in our build environment and NumPy itself, which makes backports and releasing older versions difficult. Everything bitrots quite rapidly these days. We will see if things settle down a bit over the next two years.

@charris Many thanks for doing the CVE backports (and backports in general)! Numba support for NumPy on average seems to be about one release behind NumPy main, depending on the phase of the projects' respective release cycles. Most of the time, a backport across a single release would be sufficient; the issue is when the cycles are most out of phase, and then I think two releases might be needed. There's a discussion in numba/numba#8008, part of which is about Numba better tracking NumPy upstream, so I'm hopeful that doing so will help the situation.

RE: build environments, Numba has similar issues in that there are large combinations of platforms/OSes etc., and backports to previous releases often require very detailed capturing of the state of a system when the packages were built. It's a challenging problem.

@stuartarchibald
Contributor Author

@seberg @rgommers

> The policy could also maintain a list of known CVEs perhaps.

A policy that contained this sort of information against NumPy versions would be really helpful for quick reference. Thanks!

@seberg
Member

seberg commented Jul 5, 2022

I opened gh-21927 to add some docs on "security stuff". Not what is requested here probably, but thought the crowd here could have a look.


4 participants