
numpy arrays() strategy does not respect (unsigned) dtype (anymore) #3900

Closed
hojo0590 opened this issue Feb 28, 2024 · 1 comment · Fixed by #3910
Labels
bug something is clearly wrong here performance go faster! use less memory!

Comments

@hojo0590

hojo0590 commented Feb 28, 2024

Since version 6.98.12 (and #3795/#3895), the numpy arrays() strategy seems not to respect its dtype parameter anymore.

The documentation states, for the elements parameter:

elements is a strategy for generating values to put in the array. If it is None a suitable value will be inferred based on the dtype, which may give any legal value (including eg NaN for floats).

This seems to have stopped working in recent releases, at least for dtype np.uint32: test data generation produces many invalid arguments and often triggers the health check. Once you pass an explicit strategy for valid values (e.g. st.integers(min_value=0, max_value=2**32 - 1)), test generation works again.
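A minimal sketch of that workaround, generalized to any integer dtype via numpy's iinfo. The helper name elements_for is my own for illustration, not a Hypothesis API:

```python
import numpy as np
from hypothesis import strategies as st


def elements_for(dtype):
    # Hypothetical helper: derive valid integer bounds from the dtype
    # itself instead of relying on Hypothesis' element inference.
    info = np.iinfo(dtype)
    return st.integers(min_value=info.min, max_value=info.max)


# e.g. arrays(dtype=np.uint32, shape=5, elements=elements_for(np.uint32))
```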

To reproduce, save the following as a Python file and run it with pytest -s (you need numpy, pytest, and a recent hypothesis installed). It will either fail the first test (the health check triggers because of invalid samples) or be really slow.

import numpy as np
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays

failing_count = 0
failing_runs = 0
working_count = 0
working_runs = 0


@st.composite
def failing_vector_of_even_length(draw, min_len=0, max_len=10000):
    # Relies on dtype-based element inference (the broken path).
    global failing_runs, failing_count
    failing_runs += 1

    even_1d_shape = draw(
        st.integers(min_value=min_len, max_value=max_len).filter(lambda x: x % 2 == 0)
    )
    array = draw(
        arrays(
            dtype=np.uint32,
            shape=even_1d_shape,
        )
    )
    if array is not None:
        failing_count += 1
    return array


@st.composite
def vector_of_even_length(draw, min_len=0, max_len=10000):
    # Passes an explicit elements strategy (the workaround).
    global working_runs, working_count
    working_runs += 1

    even_1d_shape = draw(
        st.integers(min_value=min_len, max_value=max_len).filter(lambda x: x % 2 == 0)
    )
    array = draw(
        arrays(
            dtype=np.uint32,
            shape=even_1d_shape,
            elements=st.integers(min_value=0, max_value=2**32 - 1),
        )
    )
    if array is not None:
        working_count += 1
    return array


@given(data=failing_vector_of_even_length())
def test_showcase(data):
    pass


@given(data=vector_of_even_length())
def test_showcase2(data):
    pass


def test_result():
    print(
        f"no elements: {failing_count} out of {failing_runs} worked; with elements: {working_count} out of {working_runs} worked."
    )
@Zac-HD Zac-HD added the bug something is clearly wrong here label Feb 28, 2024
@Zac-HD
Member

Zac-HD commented Feb 28, 2024

Thanks for the report - this is clearly a pretty serious bug, and shipping it implies that we also need to improve our tests for dtype inference. I'll likely get to this over the weekend.

@Zac-HD Zac-HD added the performance go faster! use less memory! label Mar 9, 2024