Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow caused by clumsy array coercion #17785

Closed
eric-wieser opened this issue Nov 16, 2020 · 12 comments
Closed

Stack overflow caused by clumsy array coercion #17785

eric-wieser opened this issue Nov 16, 2020 · 12 comments
Labels

Comments

@eric-wieser
Copy link
Member

eric-wieser commented Nov 16, 2020

Reproducing code example:

Over at pygae/clifford#376, we've found that the adjustments to array coercion in 1.19.x seem to produce stack overflows where previously they were safe. A minimal repro is this clumsy coercion code, which mirrors what clifford.MVArray did in their 1.3.0 release.

import numpy as np

class MyArray(np.ndarray):
    def __new__(cls, input_array):
        obj = np.empty(len(input_array), dtype=object)
        obj[:] = input_array
        obj = obj.view(cls)
        return obj

class MyNastyClass:
    def __len__(self): return 32
    def __getitem__(self, item): return 1
    def __array__(self): return MyArray([self])

np.asarray(MyNastyClass())

Error message:

Windows fatal exception: stack overflow

Current thread 0x00006160 (most recent call first):
  File "<ipython-input-4-ac7cb36f8e78>", line 3 in __new__
  File "<ipython-input-4-ac7cb36f8e78>", line 12 in __array__
  File "<ipython-input-4-ac7cb36f8e78>", line 4 in __new__
  File "<ipython-input-4-ac7cb36f8e78>", line 12 in __array__
  File "<ipython-input-4-ac7cb36f8e78>", line 4 in __new__
  File "<ipython-input-4-ac7cb36f8e78>", line 12 in __array__

NumPy/Python version information:

1.19.4 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)]

Thankfully the code above was removed since it was bad for other reasons, but its unfortunate that users on an old release get a segfault.

@eric-wieser
Copy link
Member Author

If I run sys.setrecursionlimit(100) before running the above code, numpy just hangs instead of segfaulting.

@charris charris added this to the 1.19.5 release milestone Nov 16, 2020
@seberg
Copy link
Member

seberg commented Nov 16, 2020

For what its worth, on numpy master it gives a RecursionError (you have to accept dtype=object in like, that seems to be a requirement that was not strict for object before. Maybe open another issue if that seems wrong.

I think on 1.19.x the same recursion error should be happening, but as per comment add in gh-14745 (which did not change the behaviour). We ignore errors during dimension discovery, which ends up using 32 here, which then apparently ends up as a bomb later? If I set __len__ to return 1, I get the recursion error as expected.

This all seems strange, and I still didn't quite follow why it means you get a stack overflow (I was only able to reproduce memory eating/hanging and a proper RecursionError for now).

@eric-wieser
Copy link
Member Author

I still didn't quite follow why it means you get a stack overflow (I was only able to reproduce memory eating/hanging and a proper RecursionError for now).

To be clear - on 1.19.4 with my code unmodified, do you get a recursion error or a stack overflow? My result is on windows, and the issue I link to was OSX

@seberg
Copy link
Member

seberg commented Nov 16, 2020

1.19.4 with your code unmodified I get hanging behaviour. Not sure it is something that I did, but my recursion limit is 1000 by default. There may be a sweet-spot for the recursion limit where it gets the stack overflow? Or the stack-overflow issue for some reason doesn't kick in on my system.

I have a transition from RecursionError to hanging at sys.setrecursionlimit(85) :). Maybe the actual difference is that I am running python 3.8.6?

@seberg
Copy link
Member

seberg commented Nov 16, 2020

@eric-wieser OK, I think I will try making the dimensions discovery use the logic:

try:
    arr = obj.__arra__()
except RuntimeError:
    raise
except:
    pass

instead and hooe that fixes it. Probably CI will be able to reproduce the issue, so we should just add this as a test to master as well and see if it indeed raises an error correctly on all systems now.

@eric-wieser
Copy link
Member Author

Any reason for using the more general RuntimeError instead of RecursionError? I have no strong feeling either way.

@seberg
Copy link
Member

seberg commented Nov 16, 2020

No, just thought that RuntimeError's are generally bad enough that we should abort. But this is for the 1.19.x branch, so should probably just do the minimal thing. Also, if we go there catching e.g. MemoryError would also make sense (which is not a subclass of RuntimeError).

See gh-17786 for a hopeful fix. We should forward-port the test before closing this.

@seberg
Copy link
Member

seberg commented Nov 21, 2020

So, my guess is that this is similar to:

class A():
    def __new__(cls, arg):
        try:
            return A([arg])
        except:
            return A(arg)

A(None)

which on my system also kills python. Albeit with fatal recursion error in Python itself and not a stack overflow on the OS level as happens in the NumPy issue.

You can expand the above code to: NVM was the same thing

If someone can explain my why this goes well beyond the original recursion limit (by 50), I might investigate more...

Until then... I will change the PR against master to allow success (which is probably fine anyway), and can modify the other PR to not include the test and maybe see what is really necessary to catch most of it, but I doubt its easy to get it watertight without some deeper understanding of why it explodes as badly as it does.

seberg added a commit to seberg/numpy that referenced this issue Nov 21, 2020
This tests that the bug reported in numpygh-17785 is already fixed on
master.
seberg added a commit to seberg/numpy that referenced this issue Nov 24, 2020
This changes it so that we only ignore attribute errors on
looking up `__array__` and propagate errors when checking
for sequences `len(obj)` if those errors are either
RecursionError or MemoryError (we consider them unrecoverable).

Also adds test for bad recursive array-like with sequence
as reported in numpygh-17785. The test might be flaky/more complicated
in which case it should probably just be deleted.
seberg added a commit to seberg/numpy that referenced this issue Dec 17, 2020
This mitigates numpygh-17785 on the 1.19.x branch. Note that the actual
problem seems to have more layers, so that the bad code will still
cause crashes on certain systems or call contexts (I am not sure).

Pre 1.19.x did not check for `__array__` in the dimension discovery
code, so was unaffected by the issue in numpygh-17785.
charris pushed a commit to charris/numpy that referenced this issue Dec 18, 2020
This changes it so that we only ignore attribute errors on
looking up `__array__` and propagate errors when checking
for sequences `len(obj)` if those errors are either
RecursionError or MemoryError (we consider them unrecoverable).

Also adds test for bad recursive array-like with sequence
as reported in numpygh-17785. The test might be flaky/more complicated
in which case it should probably just be deleted.
@h-vetinari
Copy link
Contributor

Since this issues is one of two still tagged for 1.19.5, should #17817 be backported to the 1.19.x branch as well?

@charris
Copy link
Member

charris commented Dec 21, 2020

@h-vetinari There is #17786 for 1.19.5.

@charris charris removed this from the 1.19.5 release milestone Dec 21, 2020
@charris
Copy link
Member

charris commented Dec 21, 2020

Closing, has been fixed in 1.19, 1.20, and master.

@charris charris closed this as completed Dec 21, 2020
@seberg
Copy link
Member

seberg commented Dec 21, 2020

There may still be some things slipping through. I hope those remaining are "only" fatal Python errors. But I guess we have as much as is likely to go into 1.19.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants