Problems with numpy.load and allow_pickle #85

swkeemink · 2020-03-06T17:41:00Z

Expected behavior
Whenever an experiment is defined, it should check if a previous analysis was run and if there is data to be loaded. It should then show the message 'Reloading previously prepared data...'.

Actual behaviour
Even if previous analysis data exists, defining a new experiment now always redoes all of the analysis.

Reason
Currently, numpy.load throws an error when loading pickled object arrays unless allow_pickle is set to True. In our current code this means that whenever an experiment is defined, all of the analysis is redone instead of being loaded from the previously generated files. The easy fix is to pass 'allow_pickle=True' to numpy.load.

This seems to be a new feature, and it's there for security reasons:

Allow loading pickled object arrays stored in npy files. Reasons for disallowing pickles include security, as loading pickled data can execute arbitrary code. If pickles are disallowed, loading object arrays will fail. Default: False
Changed in version 1.16.3: Made default False in response to CVE-2019-6446.

So perhaps we should think about fixing it another way, without breaking backwards compatibility with previously stored files!
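For reference, a minimal sketch that reproduces the failure described above, assuming NumPy 1.16.3 or later. The file name `cache.npy` and the toy object array are made up for illustration and are not taken from the project code.

```python
import os
import tempfile

import numpy as np

# A toy object array, standing in for cached data whose entries differ in length.
data = np.empty(2, dtype=object)
data[0] = np.arange(3)
data[1] = np.arange(5)

# Hypothetical cache location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "cache.npy")
np.save(path, data)  # np.save defaults to allow_pickle=True

# With the default allow_pickle=False, loading an object array raises ValueError.
try:
    np.load(path)
    default_load_failed = False
except ValueError:
    default_load_failed = True

# Passing allow_pickle=True restores the old behaviour.
loaded = np.load(path, allow_pickle=True)
```

Only load pickled files this way when they come from a trusted source, such as a cache written by the same user.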


swkeemink · 2020-03-07T10:22:11Z

To avoid this type of bug we should probably avoid using the try/except structure around loading, and also write tests for these code paths. This also applies to #82 in general.
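One possible shape for this, as a rough sketch only: check for the file explicitly and let any load error propagate, so that a failure is visible rather than silently triggering a full re-analysis. The helper name `load_cache` and its `allow_pickle` parameter are hypothetical and do not exist in the codebase.

```python
import os
import tempfile

import numpy as np


def load_cache(path, allow_pickle=True):
    """Return the cached array at `path`, or None if no cache file exists.

    Hypothetical helper: errors raised by np.load are deliberately not
    caught, so a broken or unreadable cache is reported to the caller.
    """
    if not os.path.isfile(path):
        return None
    return np.load(path, allow_pickle=allow_pickle)


# Example usage with a throwaway directory.
tmpdir = tempfile.mkdtemp()
missing = load_cache(os.path.join(tmpdir, "absent.npy"))

cached = np.empty(1, dtype=object)
cached[0] = np.arange(4)
present_path = os.path.join(tmpdir, "present.npy")
np.save(present_path, cached)
present = load_cache(present_path)
```

A test can then assert both branches directly: `None` when the file is absent, the stored data when it is present, and a `ValueError` when an object array is loaded with `allow_pickle=False`.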

scottclowe · 2020-03-08T00:04:30Z

I find it very peculiar that they changed the default for np.load to be allow_pickle=False, but left the default for np.save as allow_pickle=True. This will obviously cause a lot of problems for people using the default options!
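To make the asymmetry concrete, here is a small sketch, assuming a recent NumPy: np.save only refuses to write an object array when allow_pickle=False is passed explicitly, whereas np.load refuses to read one by default. The file names are invented for the example.

```python
import os
import tempfile

import numpy as np

obj = np.empty(1, dtype=object)
obj[0] = {"roi": 0}

tmpdir = tempfile.mkdtemp()

# Default save (allow_pickle=True) writes the object array without complaint.
default_path = os.path.join(tmpdir, "default.npy")
np.save(default_path, obj)
default_save_ok = os.path.isfile(default_path)

# Opting out of pickling at save time raises ValueError for object arrays.
try:
    np.save(os.path.join(tmpdir, "strict.npy"), obj, allow_pickle=False)
    strict_save_failed = False
except ValueError:
    strict_save_failed = True

# Plain numeric arrays are unaffected by the setting in either direction.
numeric_path = os.path.join(tmpdir, "numeric.npy")
np.save(numeric_path, np.arange(3), allow_pickle=False)
numeric = np.load(numeric_path)
```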

swkeemink · 2020-03-08T10:21:44Z

Indeed. And it is very problematic for backwards compatibility... (It does sound like it's a good idea not to allow_pickle, if it would be able to run arbitrary code!)

scottclowe · 2020-03-08T14:12:39Z

I think the advantage of saving the files with pickle enabled is it compresses them, which is still desirable.

scottclowe · 2020-03-08T15:03:36Z

So I'm not sure what we should do long term. I think it would be good to add an option so the user can pick, and for the moment the default behaviour should be backwards compatible. Maybe we should change the default behaviour in the future though.

swkeemink · 2020-03-16T11:28:57Z

Long term we should use HDF5 by default, with the option for numpy to preserve backward compatibility (this depends on finishing #82).

scottclowe · 2020-05-04T16:20:11Z

The bug was fixed by #111, restoring intended behaviour, but we may still need to implement a long term solution.

swkeemink · 2021-07-12T16:42:21Z

@scottclowe, we can close this now with your changes to the caching backend, right?

scottclowe · 2021-07-12T17:04:49Z

I have not worked on anything which fixes the potential pickle vulnerability.

If we are happy to allow this vulnerability to persist on the basis that users are only unpickling their own cache so they are not exposing themselves to any external threats, you can close this on the basis that it was resolved by #111.

swkeemink added the bug label Mar 6, 2020

swkeemink mentioned this issue Mar 6, 2020

Add option to not save generated data #86

Closed

scottclowe mentioned this issue Apr 7, 2020

BUG: Allow loading pickled numpy cache files #111

Merged
