Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More improvements to test_linalg #101

Merged
merged 61 commits into from Feb 26, 2024
Merged

Conversation

asmeurer
Copy link
Member

So far just vector_norm, but more are incoming.

Note that NumPy will fail this test without numpy/numpy#21084.

I also realized that the linalg tests should be making better use of the helper functions. I've done so for this test, but still need to update the other tests.

Copy link
Member

@honno honno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Note that NumPy will fail this test without numpy/numpy#21084.

Ah, we should skip this in the workflow anywho because we're pinning against NumPy releases now.

@asmeurer
Copy link
Member Author

@honno see what you think of the allclose helpers I've added here. They can likely use some improvement, but they should work for now.

Comment on lines 152 to 172
def allclose(x, y, rel_tol=0.25, abs_tol=1, return_indices=False):
"""
Return True all elements of x and y are within tolerance

If return_indices=True, returns (False, (i, j)) when the arrays are not
close, where i and j are the indices into x and y of corresponding
non-close elements.
"""
for i, j in iter_indices(x.shape, y.shape):
i, j = i.raw, j.raw
a = x[i]
b = y[j]
if not (math.isfinite(a) and math.isfinite(b)):
# TODO: If a and b are both infinite, require the same type of infinity
continue
close = math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
if not close:
if return_indices:
return (False, (i, j))
return False
return True
Copy link
Member

@honno honno Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep looks good, this is really how we have to do it if we can't use masking.

Maybe move this and assert_allclose to test_linalg.py for now, as ideally what we'd be doing is standardising these utils across function groups then, and scoping these utils for each function group makes it easier to distinguish where they're used and their subtle differences. (standardising seems quite doable, I just need to get round to it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem here is represents a pretty significant performance hit. Before this commit (with the exact equality test), the linalg tests take 15 seconds on my computer. After, they take 44 seconds. Performance isn't our top priority, but maybe we should try array operations and only fallback to this when there are nonfinite values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this and assert_allclose to test_linalg.py for now, as ideally what we'd be doing is standardising these utils across function groups then, and scoping these utils for each function group makes it easier to distinguish where they're used and their subtle differences. (standardising seems quite doable, I just need to get round to it)

Think I'd still like this for now, esp as I'm generally rethinking the pytest helpers for #200/general look at vectorisation. Not blocking tho.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm just going to delete them. I'm not using them anymore, because testing float arrays from linalg functions like this turned out to be too difficult.

@asmeurer
Copy link
Member Author

It looks like it isn't as straightforward as this. CuPy has several failures:

_________________________________________________________________________________________________________ test_eigh __________________________________________________________________________________________________________

    @pytest.mark.xp_extension('linalg')
>   @given(x=symmetric_matrices(finite=True))

array_api_tests/test_linalg.py:239: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
array_api_tests/test_linalg.py:256: in test_eigh
    _test_stacks(lambda x: linalg.eigh(x).eigenvectors, x,
array_api_tests/test_linalg.py:90: in _test_stacks
    assert_equal(res_stack, decomp_res_stack)
array_api_tests/test_linalg.py:49: in assert_equal
    assert_allclose(x, y)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = Array([[-0.40824828,  0.70710677,  0.57735026],
       [-0.40824828, -0.70710677,  0.57735026],
       [ 0.81649655,  0.        ,  0.57735026]], dtype=float32)
y = Array([[ 0.8164966 ,  0.        , -0.57735026],
       [-0.4082483 , -0.70710677, -0.5773504 ],
       [-0.40824828,  0.7071068 , -0.5773503 ]], dtype=float32), rel_tol = 0.25, abs_tol = 1

    def assert_allclose(x, y, rel_tol=0.25, abs_tol=1):
        """
        Test that x and y are approximately equal to each other.
    
        Also asserts that x and y have the same shape and dtype.
        """
        assert x.shape == y.shape, f"The input arrays do not have the same shapes ({x.shape} != {y.shape})"
    
        assert x.dtype == y.dtype, f"The input arrays do not have the same dtype ({x.dtype} != {y.dtype})"
    
        c = allclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol, return_indices=True)
        if c is not True:
            _, (i, j) = c
>           raise AssertionError(f"The input arrays are not close with {rel_tol = } and {abs_tol = } at indices {i = } and {j = }")
E           AssertionError: The input arrays are not close with rel_tol = 0.25 and abs_tol = 1 at indices i = (0, 0) and j = (0, 0)

array_api_tests/array_helpers.py:187: AssertionError
--------------------------------------------------------------------------------------------------------- Hypothesis ---------------------------------------------------------------------------------------------------------
Falsifying example: test_eigh(
    x=Array([[[1., 1., 1.],
            [1., 1., 1.],
            [1., 1., 1.]]], dtype=float32),
)
___

Note that the sort order of eigenvalues is unspecified in the spec.

Also svd and svdvals fail with errors like

________________________________________________________________________________________________________ test_svdvals ________________________________________________________________________________________________________

    @pytest.mark.xp_extension('linalg')
>   @given(
        x=finite_matrices(),
    )

array_api_tests/test_linalg.py:562: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
array_api_tests/test_linalg.py:577: in test_svdvals
    _test_stacks(linalg.svdvals, x, dims=1, res=res)
array_api_tests/test_linalg.py:90: in _test_stacks
    assert_equal(res_stack, decomp_res_stack)
array_api_tests/test_linalg.py:49: in assert_equal
    assert_allclose(x, y)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = Array([34741248.,        0.], dtype=float32), y = Array([3.4741252e+07, 1.3871019e+00], dtype=float32), rel_tol = 0.25, abs_tol = 1

    def assert_allclose(x, y, rel_tol=0.25, abs_tol=1):
        """
        Test that x and y are approximately equal to each other.
    
        Also asserts that x and y have the same shape and dtype.
        """
        assert x.shape == y.shape, f"The input arrays do not have the same shapes ({x.shape} != {y.shape})"
    
        assert x.dtype == y.dtype, f"The input arrays do not have the same dtype ({x.dtype} != {y.dtype})"
    
        c = allclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol, return_indices=True)
        if c is not True:
            _, (i, j) = c
>           raise AssertionError(f"The input arrays are not close with {rel_tol = } and {abs_tol = } at indices {i = } and {j = }")
E           AssertionError: The input arrays are not close with rel_tol = 0.25 and abs_tol = 1 at indices i = (1,) and j = (1,)

array_api_tests/array_helpers.py:187: AssertionError

which shows that using an absolute tolerance is not going to work.

There are also some test failures which I think are due to bugs in cupy. This one I'm not sure about

>>> import cupy.array_api
>>> x1 = cupy.array_api.asarray([51307], dtype=cupy.array_api.uint16)
>>> x2 = cupy.array_api.asarray([[[327]]], dtype=cupy.array_api.uint16)
>>> cupy.array_api.linalg.matmul(x1, x2)
Array([[172]], dtype=uint16)
>>> cupy.array_api.linalg.matmul(x1, x2[0, ...])
Array([173], dtype=uint16)
>>> x1
Array([51307], dtype=uint16)
>>> import numpy as np
>>> np.asarray([51307*327], dtype=np.uint16)
array([173], dtype=uint16)

CuPy is clearly doing something weird here, but should this overflow behavior be something that we should be avoiding in the test suite? If so then I guess we need to be more careful about how we generate integer array inputs to functions like matmul.

@honno
Copy link
Member

honno commented Apr 28, 2022

Unfortunately I'm not familiar with linalg, so not super sure on expectations here. (I'll be out until 9th May, where I'll go through linalg and these tests more so I might be more useful in the future heh.)

should this overflow behavior be something that we should be avoiding in the test suite?

Ah interesting catch. Personally I think it's fine to just relatively assert integer operations here (internal dtype-representation-shenanigans like numpy/numpy#20226 maybe?), as such edge cases probably won't affect array-consuming code.

@asmeurer asmeurer mentioned this pull request Apr 28, 2022
There are different equivalent ways of representing eigenspaces in general,
and the same algorithm may choose different ways for stacked vs. non-stacked
matrices (e.g., they differ for cupy).
This still is going to require some tweaking.
Floating-point dtypes now only test shape and dtype in the stacking test.
Trying to make the allclose test work is too difficult for linear algebra
functions.
Comment on lines 49 to 57
# It's too difficult to do an approximately equal test here because
# different routines can give completely different answers, and even
# when it does work, the elementwise comparisons are too slow. So for
# floating-point dtypes only test the shape and dtypes.

# assert_allclose(x, y)

assert x.shape == y.shape, f"The input arrays do not have the same shapes ({x.shape} != {y.shape})"
assert x.dtype == y.dtype, f"The input arrays do not have the same dtype ({x.dtype} != {y.dtype})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this seems reasonable.

@asmeurer
Copy link
Member Author

asmeurer commented Feb 3, 2024

@honno I've been cleaning this up. It now depends on an upstream ndindex fix which I need to release Quansight-Labs/ndindex#155. The tests are now passing on array-api-strict (with data-apis/array-api-strict#10). I still need to test numpy, cupy, and torch.

It would help if you could look this over and see if anything looks off. I'm worried some of the code conventions here might have fallen out of date with the rest of the codebase. I've done my best to update the things I noticed, but I probably missed something.

Also note that I did update the tests here for data-apis/array-api#740, even though it isn't merged or decided on yet.

Copy link
Member

@honno honno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome LGTM, sans ndindex fixes and trivial linting issue. As far as I can tell you've utilised all the relevant helpers introduced since the PR was opened, which is grand to see!

Comment on lines 152 to 172
def allclose(x, y, rel_tol=0.25, abs_tol=1, return_indices=False):
"""
Return True all elements of x and y are within tolerance

If return_indices=True, returns (False, (i, j)) when the arrays are not
close, where i and j are the indices into x and y of corresponding
non-close elements.
"""
for i, j in iter_indices(x.shape, y.shape):
i, j = i.raw, j.raw
a = x[i]
b = y[j]
if not (math.isfinite(a) and math.isfinite(b)):
# TODO: If a and b are both infinite, require the same type of infinity
continue
close = math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
if not close:
if return_indices:
return (False, (i, j))
return False
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this and assert_allclose to test_linalg.py for now, as ideally what we'd be doing is standardising these utils across function groups then, and scoping these utils for each function group makes it easier to distinguish where they're used and their subtle differences. (standardising seems quite doable, I just need to get round to it)

Think I'd still like this for now, esp as I'm generally rethinking the pytest helpers for #200/general look at vectorisation. Not blocking tho.

@asmeurer
Copy link
Member Author

I've released ndindex 1.8 with the changes needed here, and updated the pin in requirements.txt.

@honno
Copy link
Member

honno commented Feb 16, 2024

Can ignore flaky statistical tests (skipped in master now since #233). Looks good, just need resolution on the failing test_cross.

@asmeurer
Copy link
Member Author

asmeurer commented Feb 16, 2024

The test failure is because numpy.array_api is wrong for cross (I never added broadcasting support from the 2022 spec). array-api-strict has already been fixed data-apis/array-api-strict#10. We will need to figure out what we want to do going forward for CI tests, given that numpy.array_api is going to be removed and array-api-strict is in its own repo that has its own array-api-tests CI run.

At any rate, I just remembered that I still wanted to check this with numpy, torch, and cupy. Although if you want to merge anyway, you can, and I'll just make any follow up fixes in another PR.

@asmeurer
Copy link
Member Author

asmeurer commented Feb 16, 2024

Looks like the following causes NumPy main to hang

>>> import numpy as np
>>> a = np.empty((3, 3, 0, 0), dtype=np.float32)
>>> np.linalg.cholesky(a, upper=True)

The issue is only there with upper=True, which is new in 2.0.

@asmeurer
Copy link
Member Author

asmeurer commented Feb 16, 2024

Submitted upstream numpy/numpy#25840. What's the easiest way to skip this test for now? (EDIT: looks like -k 'not cholesky' works)

@asmeurer
Copy link
Member Author

There are some other test failures with NumPy:

E           ValueError: solve: Input operand 1 does not have enough dimensions (has 1, gufunc core with signature (m,m),(m,n)->(m,n) requires 2)
E           Falsifying example: test_solve(
E               x1=array([], shape=(0, 0, 0), dtype=float32),
E               x2=array([], dtype=float32),
E           )

(need to investigate)

  • test_matrix_rank fails with
    | Traceback (most recent call last):
    |   File "/Users/aaronmeurer/Documents/array-api-tests/array_api_tests/test_linalg.py", line 429, in test_matrix_rank
    |     linalg.matrix_rank(x, **kw)
    |   File "/Users/aaronmeurer/Documents/numpy/numpy/linalg/_linalg.py", line 2083, in matrix_rank
    |     return count_nonzero(S > tol, axis=-1)
    |                          ^^^^^^^
    | ValueError: operands could not be broadcast together with shapes (3,2) (3,3)
    | Falsifying example: test_matrix_rank(
    |     x=array([[[0., 0.],
    |             [0., 0.]],
    |
    |            [[0., 0.],
    |             [0., 0.]],
    |
    |            [[0., 0.],
    |             [0., 0.]]], dtype=float32),
    |     kw={'rtol': array([0., 0., 0.], dtype=float32)},
    | )
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "/Users/aaronmeurer/Documents/array-api-tests/array_api_tests/test_linalg.py", line 429, in test_matrix_rank
    |     linalg.matrix_rank(x, **kw)
    |   File "/Users/aaronmeurer/Documents/numpy/numpy/linalg/_linalg.py", line 2080, in matrix_rank
    |     tol = S.max(axis=-1, keepdims=True) * rtol
    |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
    | ValueError: operands could not be broadcast together with shapes (0,2,1) (0,2)
    | Falsifying example: test_matrix_rank(
    |     x=array([], shape=(0, 2, 1, 1), dtype=float32),
    |     kw={'rtol': array([], shape=(0, 2), dtype=float32)},
    | )

(also need to investigate)

  • I get a health check error for filtering too much with invertable_matrices from test_inv sometimes.

@honno
Copy link
Member

honno commented Feb 20, 2024

Submitted upstream numpy/numpy#25840. What's the easiest way to skip this test for now? (EDIT: looks like -k 'not cholesky' works)

Looks like it was fixed already 🎉

We will need to figure out what we want to do going forward for CI tests, given that numpy.array_api is going to be removed and array-api-strict is in its own repo that has its own array-api-tests CI run.

I wrote #235 to track this and other musings on our CI. Probably for this PR we're sticking with numpy.array_api.

trace fails because of Reconsider sum/prod/trace upcasting for floating-point dtypes array-api#731. I propose we fix that after this is merged (I have factored out the sum/prod/trace logic into a separate function here)

Yep agreed can fix after, wrote #234 for this.


Regarding test_solve and test_matrix_rank failures (+ test_inv flakiness), yeah if this week you can identify the issues and either fix or slap a pytest.mark.skip if non-trivial, otherwise I think it'd be nice to just merge this PR and look at later.

@asmeurer
Copy link
Member Author

Looks like the numpy solve failures are due to numpy/numpy#15349, which I guess was never addressed for 2.0. This is worked around in array-api-strict, but I never added a workaround to array-api-compat.

@asmeurer
Copy link
Member Author

The np solve issue in array-api-compat is fixed here data-apis/array-api-compat#93. Unfortunately, I don't think we can fix cupy as it doesn't seem to actually support stacking in its solve().

@asmeurer
Copy link
Member Author

Here's the invertable_matrices strategy that is giving the health error for too much filtering:

@composite
def invertible_matrices(draw, dtypes=xps.floating_dtypes(), stack_shapes=shapes()):
# For now, just generate stacks of diagonal matrices.
n = draw(integers(0, SQRT_MAX_ARRAY_SIZE),)
stack_shape = draw(stack_shapes)
d = draw(arrays(dtypes, shape=(*stack_shape, 1, n),
elements=dict(allow_nan=False, allow_infinity=False)))
# Functions that require invertible matrices may do anything when it is
# singular, including raising an exception, so we make sure the diagonals
# are sufficiently nonzero to avoid any numerical issues.
assume(xp.all(xp.abs(d) > 0.5))
diag_mask = xp.arange(n) == xp.reshape(xp.arange(n), (n, 1))
return xp.where(diag_mask, d, xp.zeros_like(d))

Is there a more direct way to generate arrays d s.t. all(abs(d) > 0.5)?

(of course, I should probably come up with some way to generate more interesting invertible matrices than just diagonal ones).

@asmeurer
Copy link
Member Author

Looks like the matrix_rank thing is a legitimate bug in the numpy implementation https://github.com/numpy/numpy/pull/25437/files#r1500043325.

@asmeurer
Copy link
Member Author

@honno let me know what you think of the latest commit as a way to test functions like matmul which should be in both the main and linalg namespace separately. Previously only the linalg one was tested, but I want to make sure that the compat library properly wraps things in both namespaces.

Copy link
Member

@honno honno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know what you think of the latest commit as a way to test functions like matmul which should be in both the main and linalg namespace separately. Previously only the linalg one was tested, but I want to make sure that the compat library properly wraps things in both namespaces.

Yeah looks good, nice catch 👍

Is there a more direct way to generate arrays d s.t. all(abs(d) > 0.5)?

Pushed a change which does this by passing a customer elements strategy.

Lets merge as-is and resolve everything else in separate PRs 🎉

@honno honno merged commit a1d7701 into data-apis:master Feb 26, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants