hypothesis.extra.numpy only generates strings of length at most one #2229
Comments
It's also worth looking out for trouble with Python 2 versus Python 3 here.
True! I've only tested this on Python 3. Though given that we're in the dying days of Python 2 support, if it presents much trouble we may just want to wait on fixing this until January...
Related to #2085... I'd probably just deprecate all usage of unsized string dtypes, have from_dtype treat unsized as size one, and be done with it. Not sure how that's interacting with DO_NOT_ESCALATE, though.
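The "treat unsized as size one" idea could look something like this sketch; `normalize_string_dtype` is a hypothetical helper name, not anything in Hypothesis:

```python
import numpy as np

def normalize_string_dtype(dtype):
    # Hypothetical helper: map an unsized string dtype ('U' or 'S',
    # which numpy reports with itemsize 0) to its size-one concrete
    # form, as proposed in the comment above. All other dtypes pass
    # through unchanged.
    dtype = np.dtype(dtype)
    if dtype.kind in ("U", "S") and dtype.itemsize == 0:
        return np.dtype(dtype.kind + "1")
    return dtype
```

With that in place, `from_dtype` would only ever see concrete, sized string dtypes, and the deprecation could live in the normalization step.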
Alternatively we could add special handling for string arrays, to fill them differently, but I'd rather not.
It wouldn't be super hard to do. We could generate string arrays as object arrays, then convert to the right dtype at the end of generation.
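The object-array approach might be sketched like this (the element values here are illustrative, not Hypothesis internals):

```python
import numpy as np

# Build the array with dtype=object so elements keep their full length
# instead of being truncated to the array's fixed string width...
strings = ["", "hello", "hypothesis"]
buf = np.empty(len(strings), dtype=object)
for i, s in enumerate(strings):
    buf[i] = s

# ...then convert to a unicode dtype at the end. astype('U') sizes the
# result to fit the longest element, so nothing is lost.
result = buf.astype('U')
```

The conversion at the end is what fixes the width: numpy computes the itemsize from the actual data rather than from a premature `np.zeros` allocation.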
For reasons I have not fully determined, if you run the following:
You get the following error:
The confusion is not that this code fails with HYPOTHESIS_DO_NOT_ESCALATE set, but that it doesn't fail without it set, because our code for this is all wrong. The reason is that 'U' is something of a lie of a dtype. Consider the following code:
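The original snippet wasn't preserved in this copy of the issue; a small illustration of the point, assuming only that numpy is importable, might be:

```python
import numpy as np

# 'U' is a family of dtypes, each with a fixed element width:
u1 = np.dtype('U1')    # one unicode character per element (itemsize 4)
u10 = np.dtype('U10')  # up to ten characters per element (itemsize 40)

# When numpy infers a dtype for string data, it picks a width that
# fits the longest element — '<U5' here, to fit "hello".
inferred = np.array(["hello", "hi"]).dtype
```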
The 'U' dtype is actually a family of dtypes, each of bounded width. When you create an array of unicode objects there's an implicit fixed-size limit on every element. As we create our arrays using np.zeros, this results in all the unicode we generate being implicitly truncated to elements of size one. The same issue presumably exists with byte strings.
You can see this more directly by the fact that the following test passes but emits a pile of deprecation warnings:
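The test in question isn't reproduced in this copy of the issue, but a numpy-only sketch of the underlying behaviour (with Hypothesis itself omitted) could be:

```python
import numpy as np

# np.zeros with the unsized 'U' dtype allocates width-one elements.
arr = np.zeros(3, dtype='U')

# Assignment then truncates longer strings to fit that width.
for i, s in enumerate(["alpha", "beta", "x"]):
    arr[i] = s

# The truncated values look perfectly self-consistent afterwards, so a
# property like "every element has length <= 1" holds — which is why a
# test over such arrays can pass even though data has been lost.
assert all(len(x) <= 1 for x in arr)
```

Whether that truncation warns at assignment time depends on the numpy version; the data loss itself happens either way.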