Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #8477: Allow copies with different strides for 0-length data #8482

Merged
merged 3 commits into from Oct 11, 2022

Conversation

gmarkall
Copy link
Member

@gmarkall gmarkall commented Oct 3, 2022

Since NumPy 1.23, it is possible for zero-length ndarrays to have different strides (e.g. (0,), and (8,)). If we attempt to copy from a zero-length device array to a zero-length host array where the strides differ, our compatibility check fails because it compares strides.

This commit fixes the issue by only considering strides when checking compatibility of nonzero-length arrays. I believe this to be valid because the following works normally with NumPy 1.23:

import numpy as np

ary1 = np.arange(0)
ary2 = np.ndarray((0,), buffer=ary1.data)
ary3 = np.empty_like(ary2)

ary3[:] = ary2
ary3[...] = ary2
np.copyto(ary2, ary3)

i.e. copying zero-length arrays with different strides generally works as expected.

The included test is written in such a way that it should test this change in behaviour regardless of the installed NumPy version - we explicitly construct zero-length device and host arrays with differing strides. The additional sanity check ensures that the host array has the strides we expect, just in case there is some version of NumPy in which setting the strides explicitly didn't result in the expected strides - I have observed that requesting nonzero strides for a zero length array can result still in zero strides (a separate but related behaviour), so this sanity check is provided to account for any other unexpected behaviour of this nature. I have tested locally with NumPy 1.22 and 1.23 (pre- and post-changes to strides).

See also dask/distributed#7089 where a workaround for an observation of this issue was needed. This would not be needed with the fix in this commit.

Fixes #8477.

Since NumPy 1.23, it is possible for zero-length ndarrays to have
different strides (e.g. `(0,)`, and `(8,)`). If we attempt to copy from
a zero-length device array to a zero-length host array where the strides
differ, our compatibility check fails because it compares strides.

This commit fixes the issue by only considering strides when checking
compatibility of nonzero-length arrays. I believe this to be valid
because the following works normally with NumPy 1.23:

```python
import numpy as np

ary1 = np.arange(0)
ary2 = np.ndarray((0,), buffer=ary1.data)
ary3 = np.empty_like(ary2)

ary3[:] = ary2
ary3[...] = ary2
np.copyto(ary2, ary3)
```

i.e. copying zero-length arrays with different strides generally works
as expected.

The included test is written in such a way that it should test this
change in behaviour regardless of the installed NumPy version - we
explicitly construct zero-length device and host arrays with differing
strides. The additional sanity check ensures that the host array has the
strides we expect, just in case there is some version of NumPy in which
setting the strides explicitly didn't result in the expected strides - I
have observed that requesting nonzero strides for a zero length array
can result still in zero strides (a separate but related behaviour), so
this sanity check is provided to account for any other unexpected
behaviour of this nature. I have tested locally with NumPy 1.22 and 1.23
(pre- and post-changes to strides).

See also dask/distributed#7089 where a
workaround for an observation of this issue was needed. This would not
be needed with the fix in this commit.
@gmarkall
Copy link
Member Author

gmarkall commented Oct 3, 2022

gpuci run tests

@gmarkall gmarkall added 2 - In Progress CUDA CUDA related issue/PR Effort - medium Medium size effort needed labels Oct 3, 2022
@gmarkall gmarkall added this to the Numba 0.56.3RC milestone Oct 3, 2022
@gmarkall
Copy link
Member Author

gmarkall commented Oct 3, 2022

gpuci run tests

@gmarkall
Copy link
Member Author

gmarkall commented Oct 3, 2022

Since making this PR I also decided it would be better to test copies in all directions, so the additional commit adds other cases (host-to-device, and device-to-device).

@gmarkall
Copy link
Member Author

gmarkall commented Oct 3, 2022

gpuci run tests

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @gmarkall, this looks like an appropriate fix for the issue described. One minor query on the testing to resolve but otherwise looks good. Thanks again!

if ary1sq.strides != ary2sq.strides:
# We check strides only if the size is nonzero, because strides are
# irrelevant (and can differ) for zero-length copies.
if ary1.size and ary1sq.strides != ary2sq.strides:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: cases like:

ary1 = np.ndarray(shape=(1, 1, 1), strides=(0, 2, 4), dtype=np.float64)
ary2 = np.ndarray(shape=(), strides=(), dtype=np.float64)

are "safe" due to the squeeze above. It shouldn't be possible to reach here and need to check ary2.size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also checked manually with:

from numba import cuda
import numpy as np

x  = np.ndarray(shape = (1, 1, 1), strides = (0, 2, 4), dtype=np.float64)
y  = np.ndarray(shape = (), strides = (), dtype=np.float64)

print(x.shape, x.strides, x.size)
print(y.shape, y.strides, y.size)

dx = cuda.to_device(x)
dy = cuda.to_device(y)

dx.copy_to_host(y)
dy.copy_to_host(x)

without issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for checking this too!

Comment on lines +444 to +456
# Ensure that the copy succeeds in both directions
dev_array.copy_to_host(host_array)
dev_array.copy_to_device(host_array)

# Ensure that a device-to-device copy also succeeds when the strides
# differ - one way of doing this is to copy the host array across and
# use that for copies in both directions.
dev_array_from_host = cuda.to_device(host_array)
self.assertEqual(dev_array_from_host.shape, (0,))
self.assertEqual(dev_array_from_host.strides, (0,))

dev_array.copy_to_device(dev_array_from_host)
dev_array_from_host.copy_to_device(dev_array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's no merit in checking the values of the copies as this is about testing that the copies have succeeded without error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's no merit - I think there's already tests that copies work, this test is focused on an edge case in checking compatibility. Additionally, since these are empty arrays, there's not a lot to look at to check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming, I'm inclined to agree, but thought it best to check!

@stuartarchibald stuartarchibald added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review labels Oct 4, 2022
@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Oct 4, 2022
@stuartarchibald stuartarchibald added 4 - Waiting on CI Review etc done, waiting for CI to finish and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Oct 4, 2022
Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!

@stuartarchibald stuartarchibald added the Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm label Oct 4, 2022
@stuartarchibald stuartarchibald self-assigned this Oct 4, 2022
@sklam
Copy link
Member

sklam commented Oct 6, 2022

BFID numba_smoketest_cuda_yaml_161

@gmarkall
Copy link
Member Author

How did the buildfarm run go?

@esc
Copy link
Member

esc commented Oct 11, 2022

How did the buildfarm run go?

It was fine.

@esc esc added BuildFarm Passed For PRs that have been through the buildfarm and passed 5 - Ready to merge Review and testing done, is ready to merge and removed Pending BuildFarm For PRs that have been reviewed but pending a push through our buildfarm 4 - Waiting on CI Review etc done, waiting for CI to finish labels Oct 11, 2022
@sklam sklam merged commit 03bc80f into numba:main Oct 11, 2022
stuartarchibald pushed a commit to stuartarchibald/numba that referenced this pull request Oct 12, 2022
Fix numba#8477: Allow copies with different strides for 0-length data
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to merge Review and testing done, is ready to merge BuildFarm Passed For PRs that have been through the buildfarm and passed CUDA CUDA related issue/PR Effort - medium Medium size effort needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CUDA: Copying zero-size device array to zero-size host array can fail if strides differ
4 participants