Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Raise recursion error during dimension discovery #17786

Merged
merged 1 commit into from Dec 18, 2020

Conversation

seberg
Copy link
Member

@seberg seberg commented Nov 16, 2020

This closes gh-17785 on the 1.19.x branch, the test should be
(slightly modified) also be included in the 1.20 branch, which
already should be raising the error correctly.

Pre 1.19.x did not check for __array__ in the dimension discovery
code, so was unaffected.


The PR is against 1.19.x branch directly (the ctors.c code does not exist anymore).

@seberg
Copy link
Member Author

seberg commented Nov 16, 2020

Well, this is confusing. The test doesn't fail in the Python 3.6 (windows) tests. May have to dig out an older Python version :/.

@charris
Copy link
Member

charris commented Nov 18, 2020

This isn't working.

@seberg
Copy link
Member Author

seberg commented Nov 18, 2020

Yeah, I wasn't yet able to figure out why python 3.6 (or whatever it is) is behaving differently, it is pretty confusing. Should have marked as draft.

@seberg seberg marked this pull request as draft November 18, 2020 16:46
@seberg
Copy link
Member Author

seberg commented Nov 19, 2020

@eric-wieser just in case you have a thought on this. By now I think this whole thing may be complicated by a Python bug somewhere. On a linux 3.6, I get no failure at all even with changes that I think should ensure all but attribute errors being raised.
However, if I add prints to __new__ or __array__, I do seem to run into a RecursionError reliably (so it definitely should be recursive, but for some reason, I don't see anything about it?).

I tried adding a Py_EnterRecursiveCall() before calling obj.__array__ to no avail. Printing out a lot of things, it does recursive calls at least during dimension discovery (always finding the list of length one and entering it, then finding the array like and doing a dimension discovery again). Until suddenly all of those do return an object array including self successfully.

Possibly the recursion triggers a MemoryError as well? In principle if dimension discovery succeeds finishes successfully for whatever reason, the __new__ can conceivably succeed.

After poking this for almost a few hours now, I must have been flipping the wrong stones (unless there is a Python bug involved), though. And whether to do with this or not, the machine (which is at BIDS) that I used to try things out on Python 3.6 is not accessible anymore.

@seberg
Copy link
Member Author

seberg commented Nov 19, 2020

Well, now at least the test suit reproduces the stack overflow that Eric is seeing :(.

@seberg
Copy link
Member Author

seberg commented Nov 19, 2020

OK, so the issue is pretty definitely that we are ignoring recursion errors, no idea if the new code has the same problems (it does a similiar thing for the sequence check (but the order is different, so you likely need to try harder if it is problematic).

But... there are more PyErr_clear(), even if that part of the code probably had particularly many, so assuming tests succeed now, I am not sure its watertight.

@seberg
Copy link
Member Author

seberg commented Nov 21, 2020

Ok, as said on the issue. I can clean this up and remove the test if you want. I would need convincing to dig deeper into trying to actually fix it. Maybe its just another place where exceptions get caught that needs to also check for recursion error. But, it seems fine on master (the code on master is much less likely to misbehave, since it will never try to call __array__ on an object after deciding that it wasn't an array-like before – which should always happen if the RecursionError is cleared).

# (for reasons unclear to me @seberg) on all Python versions/
# However, it should definitely not crash.
np.array(MyNastyClass())
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
except:
except Exception:

so as not to catch KeyboardInterrupt

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether or not this fixes my stackoverflow, this seems like an obviously sensible improvement.

if (PyErr_ExceptionMatches(PyExc_RecursionError)) {
return -1;
}
PyErr_Clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using PyErr_WriteUnraisable(NULL) in place here (and everywhere) might help smoke out the stack overflow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did that in most places, but not all. Somehow the places I considered irrelevant (because execution should hit another later) are relevant (or I missed one). I guess creating a nested array with a depth related to the recursion limit, is somehow just too much... Failing hard would likely solve it (i.e. in my python "crash" example, you still need the try/except RecursionError:)

I will first give it one more shot on master later (which is simpler), making the equivalent paths there more strict.

@mattip
Copy link
Member

mattip commented Nov 21, 2020

This is still marked as "draft", @eric-wieser are you sure you meant to approve it?

@eric-wieser
Copy link
Member

I think the change is a good idea even if we end up ripping out the test that still segfaults. When I approved I hadn't noticed the segfault though.

@seberg
Copy link
Member Author

seberg commented Nov 24, 2020

OK, I updated gh-17817 (the master version) for now to remove the tricky test. I would do the same update here then (with the exception of not raising all errors, but always checking only for MemoryError and RecursionError. Just to be a bit pedantic about trying to be minimal in a bug-fix release. (That is if we don't want to stick to master to begin with.)

@charris
Copy link
Member

charris commented Dec 17, 2020

I put a 1.19.5 milestone on this, but it still needs finishing.

This mitigates numpygh-17785 on the 1.19.x branch. Note that the actual
problem seems to have more layers, so that the bad code will still
cause crashes on certain systems or call contexts (I am not sure).

Pre 1.19.x did not check for `__array__` in the dimension discovery
code, so was unaffected by the issue in numpygh-17785.
@seberg seberg marked this pull request as ready for review December 17, 2020 22:27
@seberg
Copy link
Member Author

seberg commented Dec 17, 2020

OK, updated and ported the tests that are in master. Unlike master, I have chosen to still clear all errors except MemoryError and RecursionError here (some deep recursions can also cause MemoryError if I saw the cpython source correctly; so it seemed right to me to consider both fatal errors.

Master is identical now, but the only place where it still clears these errors is for the equivalent code in discover_dimensions here.

@charris charris merged commit 8ab99be into numpy:maintenance/1.19.x Dec 18, 2020
@charris
Copy link
Member

charris commented Dec 18, 2020

Thanks Sebastian.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants