cp.unique without sorting #8307
Comments
Hi @essoca! CuPy's primary aim is to provide NumPy/SciPy-compatible APIs, and unique without sorting is currently considered functionality that end users would implement on top of CuPy. We will keep watching the needs of the community to see whether it is worth supporting as an exception. As for the proposed algorithms, my impression is that they are difficult to port naively so that they run efficiently on CUDA.
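As a rough illustration of what "implementing it on top" of the NumPy-compatible API could look like, here is a minimal sketch that returns unique rows in order of first appearance. It is written against NumPy so it runs without a GPU; the helper name `unique_rows_first_seen` and its `xp` parameter are made up for this example, and passing `cupy` as `xp` is an assumption that relies on `cupy.unique` returning first-occurrence indices the way `numpy.unique` does. Note that it still sorts internally, so it only changes the output order and does not provide the speedup requested in this issue.

```python
import numpy as np


def unique_rows_first_seen(arr, xp=np):
    """Unique rows of a 2D array, ordered by first appearance.

    `xp` is the array module (hypothetical parameter for this sketch).
    """
    # unique() sorts the rows; return_index gives the index of the first
    # occurrence of each unique row in the original array.
    _, first_idx = xp.unique(arr, axis=0, return_index=True)
    # Sorting the index array restores the original order of appearance.
    return arr[xp.sort(first_idx)]


arr = np.array([[1, 0], [0, 1], [1, 0], [0, 0]])
print(unique_rows_first_seen(arr))
```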
Just FYI, the array API standard does not regulate the sort order and it could be unsorted:
Thanks for the info @leofang. What I don't get is that, if array-API compliant routines are implemented as shortcuts to
As to your impression @kmaehashi, I believe that the

import numba.cuda

@numba.cuda.jit('int64(int64[:])', device=True)
def _bin2dec(row):
    # Interpret the row as binary digits, least significant bit first.
    dec = 0
    for j in range(row.size):
        dec += row[j] * (2**j)
    return dec

@numba.guvectorize('int64[:,:], int64[:]', '(n,m)->(n)', target='cuda')
def _hash2d(arr, out):
    # Hash each row to an integer; reversing the row makes the first column the most significant bit.
    for i in range(arr.shape[0]):
        out[i] = _bin2dec(arr[i][::-1])

@numba.cuda.jit('int64[::1], int64[::1], int64[::1], int32[::1]')
def _unique1d_cuda_kernel(rows_hashed, ind, inv, count):
    i = numba.cuda.grid(1)
    if i < rows_hashed.size:
        # Slot for this row: its hash modulo the number of rows.
        row_idx = rows_hashed[i] % rows_hashed.size
        # Threads holding equal rows race on this write, so ind keeps some occurrence, not necessarily the first.
        ind[row_idx] = i
        inv[i] = row_idx
        numba.cuda.atomic.add(count, row_idx, 1)

A test for this:

import cupy as cp
N, M = 10, 3
arr = cp.random.randint(0, 2, (N, M))
rows_hashed = cp.empty(arr.shape[0], dtype=cp.int64)
_hash2d(arr, rows_hashed)
# N marks slots that no row was written to.
ind = cp.ones(arr.shape[0], dtype=cp.int64) * N
inv = cp.empty(arr.shape[0], dtype=cp.int64)
count = cp.zeros(arr.shape[0], dtype=cp.int32)
_unique1d_cuda_kernel[1, N](rows_hashed, ind, inv, count)
# Drop the unused slots.
ind = ind[ind != N]
count = count[count != 0]
uniques2, ind2, inv2, count2 = cp.unique(
    arr,
    return_index=True,
    return_inverse=True,
    return_counts=True,
    axis=0
)
uniques = arr[ind]
>>> cp.all(uniques==uniques2).item()
True

The complications of generalizing this are in dealing with hash collisions when
In the most general case, the problem boils down to having
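To make the collision issue concrete, here is a small CPU-only sketch that mimics the slotting rule of the kernel above (slot = row hash modulo the number of rows) in plain Python. The function name `slot_rows` is hypothetical, and the loop is a sequential model of the kernel, not the kernel itself. It assumes rows of 0/1 integers, as in the test above.

```python
def slot_rows(rows):
    """Group distinct rows by the slot the kernel would assign them to."""
    n = len(rows)
    slots = {}
    for row in rows:
        # Same value as _bin2dec on the reversed row:
        # the first column is the most significant bit.
        h = int("".join(str(bit) for bit in row), 2)
        slots.setdefault(h % n, set()).add(tuple(row))
    return slots


# 4 rows with 3 binary columns: hashes range over 0..7 but there are only
# 4 slots, so (0, 0, 1) -> 1 and (1, 0, 1) -> 5 both land in slot 1.
rows = [(0, 0, 1), (1, 0, 1), (0, 1, 0), (0, 0, 1)]
collisions = {s: r for s, r in slot_rows(rows).items() if len(r) > 1}
print(collisions)
```

In the test above no collision can occur, because with M = 3 every hash is below 8 and there are N = 10 slots. As soon as the hash range exceeds the number of rows, two distinct rows can share a slot and the kernel would merge their counts.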
Sorry I dropped the ball. The rationale is that we (array API standardization committee) did not want to dictate anything that could impede alternative implementations. NumPy has been having the result sorted for years (and CuPy followed) and we (NumPy/CuPy) would not want to break backward compatibility, but on the other hand there are needs (as you showed) for other libraries to do it differently.
Description
There are use cases where one wants to find unique rows in a 2D array without having to sort the outputs. This is already recognized by the implementation of unique by pandas. They claim it to be significantly faster than numpy's for long enough sequences.
I wanted to measure the difference by implementing such functionality while keeping numpy's API. So I tried:
and then wrap it by
Write a test case:
Asserting equality:
Compare performance:
This implementation is then ~8x faster than numpy.
Since finding unique rows using cupy is also of interest (e.g. see the latest post requesting such a feature, which is btw already possible by sorting), I am very interested in the possibility of a faster implementation without sorting. Could you please consider it?
Many thanks in advance.
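For readers unfamiliar with the idea, the hash-table approach to unique without sorting can be sketched in plain Python as follows. This is only an illustration of the general technique, not the pandas implementation and not the code originally posted in this issue; the function name `unique_rows_unsorted` is made up for the example. It returns the same four outputs as numpy's unique with all the `return_*` flags set, but with the unique rows in order of first appearance.

```python
def unique_rows_unsorted(rows):
    """Unique rows in order of first appearance, found with a hash table."""
    position = {}  # row (as a tuple) -> its position among the unique rows
    index, inverse, counts = [], [], []
    for i, row in enumerate(rows):
        key = tuple(row)
        pos = position.get(key)
        if pos is None:
            # First time this row is seen: register it.
            pos = len(index)
            position[key] = pos
            index.append(i)
            counts.append(0)
        inverse.append(pos)
        counts[pos] += 1
    # dicts preserve insertion order, so keys come out in first-seen order.
    uniques = [list(key) for key in position]
    return uniques, index, inverse, counts


rows = [[1, 0], [0, 1], [1, 0], [0, 0]]
uniques, index, inverse, counts = unique_rows_unsorted(rows)
print(uniques)  # [[1, 0], [0, 1], [0, 0]]
```

Each row is hashed once and looked up in expected constant time, so the whole pass is linear in the number of rows rather than the O(n log n) of a sort-based unique.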
Additional Information
No response