
numpy arrays() strategy does not respect (unsigned) dtype (anymore) #3900

Closed
hojo0590 opened this issue Feb 28, 2024 · 1 comment · Fixed by #3910
Labels
bug something is clearly wrong here performance go faster! use less memory!

Comments

@hojo0590

hojo0590 commented Feb 28, 2024

Since version 6.98.12 (and #3795/#3895), the numpy arrays() strategy seems not to respect its dtype parameter anymore.

The documentation states, for the elements parameter:

elements is a strategy for generating values to put in the array. If it is None a suitable value will be inferred based on the dtype, which may give any legal value (including eg NaN for floats).

This seems to have stopped working in recent releases, at least for dtype np.uint32: test data generation produces many invalid arguments and often triggers the health check. Once you pass an explicit strategy for valid values (e.g. st.integers(min_value=0, max_value=2**32 - 1)), test generation works again.
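A minimal sketch of that workaround, generalized to any integer dtype via numpy's iinfo. The helper name elements_for is my own for illustration, not a Hypothesis API:

```python
import numpy as np
from hypothesis import strategies as st


def elements_for(dtype):
    # Hypothetical helper: derive valid integer bounds from the dtype
    # itself instead of relying on Hypothesis' element inference.
    info = np.iinfo(dtype)
    return st.integers(min_value=info.min, max_value=info.max)


# e.g. arrays(dtype=np.uint32, shape=5, elements=elements_for(np.uint32))
```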

To reproduce, save the following as a Python file and run it with pytest -s (you need numpy, pytest, and a recent hypothesis installed). It will either fail the first test (the health check triggers because of invalid samples) or be really slow.

import numpy as np
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays

failing_count = 0
failing_runs = 0
working_count = 0
working_runs = 0


@st.composite
def failing_vector_of_even_length(draw, min_len=0, max_len=10000):
    # Relies on dtype-based element inference (the broken path).
    global failing_runs, failing_count
    failing_runs += 1

    even_1d_shape = draw(
        st.integers(min_value=min_len, max_value=max_len).filter(lambda x: x % 2 == 0)
    )
    array = draw(
        arrays(
            dtype=np.uint32,
            shape=even_1d_shape,
        )
    )
    if array is not None:
        failing_count += 1
    return array


@st.composite
def vector_of_even_length(draw, min_len=0, max_len=10000):
    # Passes an explicit elements strategy (the workaround).
    global working_runs, working_count
    working_runs += 1

    even_1d_shape = draw(
        st.integers(min_value=min_len, max_value=max_len).filter(lambda x: x % 2 == 0)
    )
    array = draw(
        arrays(
            dtype=np.uint32,
            shape=even_1d_shape,
            elements=st.integers(min_value=0, max_value=2**32 - 1),
        )
    )
    if array is not None:
        working_count += 1
    return array


@given(data=failing_vector_of_even_length())
def test_showcase(data):
    pass


@given(data=vector_of_even_length())
def test_showcase2(data):
    pass


def test_result():
    print(
        f"no elements: {failing_count} out of {failing_runs} worked; with elements: {working_count} out of {working_runs} worked."
    )
@Zac-HD Zac-HD added the bug something is clearly wrong here label Feb 28, 2024
@Zac-HD
Member

Zac-HD commented Feb 28, 2024

Thanks for the report - this is clearly a pretty serious bug, and shipping it implies that we also need to improve our tests for dtype inference. I'll likely get to this over the weekend.

@Zac-HD Zac-HD added the performance go faster! use less memory! label Mar 9, 2024