Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] support of prefiltered brute force #2294

Merged
merged 29 commits into from
May 24, 2024

Conversation

rhdong
Copy link
Member

@rhdong rhdong commented May 7, 2024

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time       33.1 ms         69.9 ms           21 1000000#128#1000#255#0#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time       38.0 ms         74.8 ms           18 1000000#128#1000#255#0#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time       41.7 ms         78.5 ms           17 1000000#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time       57.5 ms         94.3 ms           12 1000000#128#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time       19.7 ms         56.4 ms           35 1000000#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time       26.1 ms         62.8 ms           27 1000000#128#1000#255#0.9#L2Expanded#NO_COPY#SEARCH```

- This PR is one part of the feature of rapidsai#1969
- Add the API of 'search_with_filtering' for brute force.
Authors:
  - James Rong (https://github.com/rhdong)
@rhdong rhdong requested review from a team as code owners May 7, 2024 06:07
@rhdong rhdong requested review from benfred and cjnolet May 7, 2024 06:07
@rhdong rhdong added 3 - Ready for Review feature request New feature or request 4 - Waiting on Reviewer Waiting for reviewer to review or respond non-breaking Non-breaking change labels May 7, 2024
@rhdong
Copy link
Member Author

rhdong commented May 8, 2024

Hey @cjnolet @benfred, I implemented an initial version of faster_dot (without deep optimization), which could help improve performance in many classic scenarios. I will continue to communicate with cuSparse Team. Here is the benchmark:

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------------
Benchmark                                                    cuSparseSDDMM     faster_dot   Iterations Configuration(No filtered means normal `search`)
------------------------------------------------------------------------------------------------------n_sample#dim#n_query#top_k#remove rate# metric
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time        11.4 ms        11.4 ms           61 1000000#4096#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time        11.5 ms        11.4 ms           61 1000000#4096#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time         137 ms        7.02 ms            4 1000000#4096#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time         137 ms        7.03 ms            5 1000000#4096#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time        70.3 ms        4.72 ms           10 1000000#4096#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time        70.2 ms        4.72 ms           10 1000000#4096#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/6/0/0/manual_time        8.94 ms        2.05 ms           77 1000000#4096#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/7/0/0/manual_time        8.97 ms        2.06 ms           77 1000000#4096#1#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/8/0/0/manual_time        17.1 ms        17.1 ms           41 1000000#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/9/0/0/manual_time        17.2 ms        17.2 ms           41 1000000#4096#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/10/0/0/manual_time        149 ms        36.0 ms            5 1000000#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/11/0/0/manual_time        149 ms        36.1 ms            5 1000000#4096#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/12/0/0/manual_time       76.0 ms        19.2 ms            9 1000000#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/13/0/0/manual_time       76.2 ms        19.2 ms            9 1000000#4096#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/14/0/0/manual_time       9.43 ms        3.39 ms           72 1000000#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/15/0/0/manual_time       9.45 ms        3.40 ms           72 1000000#4096#10#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/16/0/0/manual_time        488 ms         489 ms            2 1000000#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/17/0/0/manual_time        495 ms         494 ms            2 1000000#4096#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/18/0/0/manual_time       1916 ms        3277 ms            1 1000000#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/19/0/0/manual_time       1915 ms        3280 ms            1 1000000#4096#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/20/0/0/manual_time        968 ms        1641 ms            1 1000000#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/21/0/0/manual_time        969 ms        1643 ms            1 1000000#4096#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/22/0/0/manual_time        108 ms         167 ms            6 1000000#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/23/0/0/manual_time        108 ms         167 ms            6 1000000#4096#1000#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/24/0/0/manual_time       2.64 ms        2.64 ms          265 1000000#512#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/25/0/0/manual_time       2.65 ms        2.65 ms          264 1000000#512#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/26/0/0/manual_time       21.4 ms        4.84 ms           32 1000000#512#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/27/0/0/manual_time       21.4 ms        4.84 ms           33 1000000#512#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/28/0/0/manual_time       12.0 ms        3.60 ms           57 1000000#512#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/29/0/0/manual_time       12.0 ms        3.62 ms           57 1000000#512#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/30/0/0/manual_time       2.83 ms        1.92 ms          246 1000000#512#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/31/0/0/manual_time       2.84 ms        1.94 ms          245 1000000#512#1#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/32/0/0/manual_time       3.57 ms        3.57 ms          196 1000000#512#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/33/0/0/manual_time       3.63 ms        3.63 ms          193 1000000#512#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/34/0/0/manual_time       22.3 ms        14.4 ms           31 1000000#512#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/35/0/0/manual_time       22.3 ms        14.4 ms           31 1000000#512#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/36/0/0/manual_time       12.6 ms        8.33 ms           55 1000000#512#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/37/0/0/manual_time       12.6 ms        8.35 ms           55 1000000#512#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/38/0/0/manual_time       2.80 ms        2.28 ms          249 1000000#512#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/39/0/0/manual_time       2.81 ms        2.29 ms          247 1000000#512#10#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/40/0/0/manual_time       75.0 ms        74.5 ms            9 1000000#512#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/41/0/0/manual_time       79.9 ms        79.7 ms            9 1000000#512#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/42/0/0/manual_time        210 ms        1085 ms            3 1000000#512#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/43/0/0/manual_time        215 ms        1088 ms            3 1000000#512#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/44/0/0/manual_time        105 ms         543 ms            7 1000000#512#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/45/0/0/manual_time        106 ms         544 ms            7 1000000#512#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/46/0/0/manual_time       12.6 ms        56.6 ms           55 1000000#512#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/47/0/0/manual_time       12.7 ms        56.8 ms           54 1000000#512#1000#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/48/0/0/manual_time       1.72 ms        1.71 ms          407 1000000#128#1#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/49/0/0/manual_time       1.73 ms        1.72 ms          406 1000000#128#1#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/50/0/0/manual_time       8.86 ms        3.99 ms           79 1000000#128#1#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/51/0/0/manual_time       8.87 ms        4.00 ms           79 1000000#128#1#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/52/0/0/manual_time       5.64 ms        3.19 ms          123 1000000#128#1#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/53/0/0/manual_time       5.62 ms        3.19 ms          120 1000000#128#1#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/54/0/0/manual_time       2.15 ms        1.87 ms          326 1000000#128#1#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/55/0/0/manual_time       2.16 ms        1.89 ms          323 1000000#128#1#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/56/0/0/manual_time       2.13 ms        2.12 ms          328 1000000#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/57/0/0/manual_time       2.19 ms        2.18 ms          320 1000000#128#10#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/58/0/0/manual_time       9.41 ms        6.07 ms           75 1000000#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/59/0/0/manual_time       9.44 ms        6.09 ms           74 1000000#128#10#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/60/0/0/manual_time       5.94 ms        4.16 ms          114 1000000#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/61/0/0/manual_time       5.97 ms        4.18 ms          113 1000000#128#10#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/62/0/0/manual_time       2.10 ms        1.86 ms          332 1000000#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/63/0/0/manual_time       2.11 ms        1.87 ms          331 1000000#128#10#255#0.99#L2Expanded#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/64/0/0/manual_time       33.1 ms        33.1 ms           21 1000000#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/65/0/0/manual_time       38.0 ms        38.0 ms           18 1000000#128#1000#255#[No filtered]#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/66/0/0/manual_time       54.2 ms         244 ms           13 1000000#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/67/0/0/manual_time       57.4 ms         247 ms           12 1000000#128#1000#255#0.8#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/68/0/0/manual_time       24.6 ms         121 ms           29 1000000#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/69/0/0/manual_time       26.1 ms         123 ms           27 1000000#128#1000#255#0.9#L2Expanded#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/70/0/0/manual_time       4.69 ms        14.4 ms          149 1000000#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/71/0/0/manual_time       4.83 ms        14.5 ms          138 1000000#128#1000#255#0.99#L2Expanded#NO_COPY#SEARCH

@cjnolet
Copy link
Member

cjnolet commented May 8, 2024

@rhdong these benchmarks look promising so far. Can you run these for the following data shapes and also provide the standard (non-filtered) brute-force knn numbers for reference, please?

Num vectors: [100k, 1M, 10M]
Dimensions: [128, 256, 768, 1024, 2048, 4096]

Can you also see how far we can take the sparsity? It would be helpful to know what the results look like for 60%, 70%, and 80% sparsity.

From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?

@rhdong
Copy link
Member Author

rhdong commented May 8, 2024

From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?

Hi @cjnolet , this is the benchmark for end-2-end, not only for the new kernel. So, maybe the chart is not so clear, the conclusion should be the performance decreasing with n_query increasing. This new kernel is not deeply optimized. I'm not sure if the new kernel can cover all of the scenarios; we'd better determine the typical scenarios and use appropriate methods.

@rhdong
Copy link
Member Author

rhdong commented May 13, 2024

@rhdong these benchmarks look promising so far. Can you run these for the following data shapes and also provide the standard (non-filtered) brute-force knn numbers for reference, please?

Num vectors: [100k, 1M, 10M] Dimensions: [128, 256, 768, 1024, 2048, 4096]

Can you also see how far we can take the sparsity? It would be helpful to know what the results look like for 60%, 70%, and 80% sparsity.

From what I can see in your benchmarks above, it appears as though the performance degrades significantly for your new kernel as k grows, but shouldn't the k-selection be a separate operation altogether? Do you have any ideas or insights into this behavior?

Hello @cjnolet , here is the latest benchmark, I improved the new kernel a little bit, it has more stable performance(without that much strange performance gap) and no need to consume extra buffer(that would might cause the OOM on some lower sparsity like cuSparseSDDMM). But its performance in some cases(roughly 30-40%) is not better than cuSparseSDDMM. I intend to merge them by a branch. Before that, I'd like to hear your advice, thank you!
By the way, I think it would be a tight timeline to introduce the masked 1nn with a because the API inputs have a little difference that needs to be adapted. So, may I introduce it in the next release?

-------------------------------------------------------------------------------------------------------
Benchmark                                                    cuSparseSDDMM     faster_dot         Iterations
-------------------------------------------------------------------------------------------------------
KNN/float/int64_t/brute_force_filter_knn/0/0/0/manual_time        0.195 ms      0.193 ms         3584 10240#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/1/0/0/manual_time        0.227 ms      0.274 ms         3083 10240#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/2/0/0/manual_time        0.208 ms      0.239 ms         3364 10240#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/3/0/0/manual_time        0.184 ms      0.203 ms         3810 10240#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/4/0/0/manual_time        0.727 ms      0.724 ms          963 10240#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/5/0/0/manual_time         1.68 ms      0.635 ms          416 10240#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/6/0/0/manual_time        0.866 ms      0.396 ms          808 10240#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/7/0/0/manual_time        0.208 ms      0.152 ms         3365 10240#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/8/0/0/manual_time        0.265 ms      0.260 ms         2640 10240#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/9/0/0/manual_time        0.265 ms      0.352 ms         2637 10240#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/10/0/0/manual_time       0.242 ms      0.292 ms         2896 10240#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/11/0/0/manual_time       0.210 ms      0.244 ms         3325 10240#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/12/0/0/manual_time       0.915 ms      0.913 ms          764 10240#256#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/13/0/0/manual_time        2.96 ms      0.918 ms          237 10240#256#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/14/0/0/manual_time        1.49 ms      0.520 ms          469 10240#256#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/15/0/0/manual_time       0.289 ms      0.188 ms         2426 10240#256#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/16/0/0/manual_time       0.315 ms      0.313 ms         2218 10240#768#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/17/0/0/manual_time       0.323 ms      0.573 ms         2180 10240#768#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/18/0/0/manual_time       0.282 ms      0.422 ms         2475 10240#768#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/19/0/0/manual_time       0.227 ms      0.309 ms         3083 10240#768#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/20/0/0/manual_time        1.53 ms       1.53 ms          456 10240#768#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/21/0/0/manual_time        6.91 ms       1.53 ms          101 10240#768#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/22/0/0/manual_time        3.58 ms      0.944 ms          195 10240#768#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/23/0/0/manual_time       0.500 ms      0.262 ms         1398 10240#768#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/24/0/0/manual_time       0.329 ms      0.327 ms         2127 10240#1024#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/25/0/0/manual_time       0.331 ms      0.664 ms         2114 10240#1024#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/26/0/0/manual_time       0.286 ms      0.474 ms         2433 10240#1024#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/27/0/0/manual_time       0.230 ms      0.332 ms         3037 10240#1024#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/28/0/0/manual_time        1.82 ms       1.82 ms          385 10240#1024#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/29/0/0/manual_time        7.98 ms       1.85 ms           88 10240#1024#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/30/0/0/manual_time        4.15 ms       1.13 ms          169 10240#1024#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/31/0/0/manual_time       0.563 ms      0.293 ms         1245 10240#1024#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/32/0/0/manual_time       0.381 ms      0.382 ms         1840 10240#2048#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/33/0/0/manual_time       0.375 ms       1.03 ms         1865 10240#2048#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/34/0/0/manual_time       0.315 ms      0.683 ms         2224 10240#2048#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/35/0/0/manual_time       0.235 ms      0.426 ms         2980 10240#2048#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/36/0/0/manual_time        2.96 ms       2.97 ms          236 10240#2048#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/37/0/0/manual_time        12.0 ms       3.07 ms           58 10240#2048#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/38/0/0/manual_time        6.44 ms       1.95 ms          109 10240#2048#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/39/0/0/manual_time       0.902 ms      0.505 ms          777 10240#2048#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/40/0/0/manual_time       0.480 ms      0.482 ms         1455 10240#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/41/0/0/manual_time       0.471 ms       1.77 ms         1487 10240#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/42/0/0/manual_time       0.375 ms       1.08 ms         1868 10240#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/43/0/0/manual_time       0.247 ms      0.614 ms         2841 10240#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/44/0/0/manual_time        5.25 ms       5.27 ms          133 10240#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/45/0/0/manual_time        22.6 ms       5.51 ms           31 10240#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/46/0/0/manual_time        11.5 ms       3.93 ms           61 10240#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/47/0/0/manual_time        1.53 ms      0.925 ms          458 10240#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/48/0/0/manual_time        2.16 ms       2.16 ms          324 1048576#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/49/0/0/manual_time        5.68 ms       9.81 ms          123 1048576#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/50/0/0/manual_time        4.01 ms       6.22 ms          175 1048576#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/51/0/0/manual_time        1.92 ms       2.19 ms          365 1048576#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/52/0/0/manual_time        34.5 ms       34.5 ms           20 1048576#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/53/0/0/manual_time         241 ms       58.0 ms            3 1048576#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/54/0/0/manual_time         119 ms       26.3 ms            6 1048576#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/55/0/0/manual_time        14.2 ms       4.90 ms           49 1048576#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/56/0/0/manual_time        2.69 ms       2.70 ms          260 1048576#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/57/0/0/manual_time        7.30 ms       14.3 ms           96 1048576#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/58/0/0/manual_time        4.84 ms       8.52 ms          145 1048576#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/59/0/0/manual_time        2.00 ms       2.43 ms          349 1048576#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/60/0/0/manual_time        48.8 ms       48.9 ms           14 1048576#256#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/61/0/0/manual_time         421 ms        106 ms            2 1048576#256#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/62/0/0/manual_time         210 ms       51.4 ms            3 1048576#256#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/63/0/0/manual_time        23.4 ms       7.56 ms           30 1048576#256#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/64/0/0/manual_time        4.64 ms       4.64 ms          151 1048576#768#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/65/0/0/manual_time        11.8 ms       32.4 ms           59 1048576#768#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/66/0/0/manual_time        7.13 ms       17.9 ms           98 1048576#768#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/67/0/0/manual_time        2.24 ms       3.42 ms          312 1048576#768#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/68/0/0/manual_time         106 ms        106 ms            7 1048576#768#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/69/0/0/manual_time         738 ms        353 ms            1 1048576#768#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/70/0/0/manual_time         426 ms        178 ms            2 1048576#768#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/71/0/0/manual_time        46.5 ms       20.6 ms           15 1048576#768#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/72/0/0/manual_time        5.68 ms       5.67 ms          123 1048576#1024#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/73/0/0/manual_time        13.4 ms       41.7 ms           52 1048576#1024#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/74/0/0/manual_time        7.89 ms       22.6 ms           89 1048576#1024#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/75/0/0/manual_time        2.31 ms       3.92 ms          304 1048576#1024#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/76/0/0/manual_time         136 ms        135 ms            5 1048576#1024#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/77/0/0/manual_time         996 ms        483 ms            1 1048576#1024#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/78/0/0/manual_time         514 ms        239 ms            1 1048576#1024#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/79/0/0/manual_time        54.0 ms       27.5 ms           13 1048576#1024#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/80/0/0/manual_time        9.63 ms       9.62 ms           73 1048576#2048#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/81/0/0/manual_time        18.4 ms       79.2 ms           38 1048576#2048#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/82/0/0/manual_time        10.5 ms       41.5 ms           67 1048576#2048#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/83/0/0/manual_time        2.57 ms       5.88 ms          272 1048576#2048#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/84/0/0/manual_time         262 ms        262 ms            3 1048576#2048#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/85/0/0/manual_time        1591 ms        986 ms            1 1048576#2048#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/86/0/0/manual_time         793 ms        487 ms            1 1048576#2048#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/87/0/0/manual_time        81.3 ms       55.6 ms            9 1048576#2048#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/88/0/0/manual_time        17.7 ms       17.7 ms           40 1048576#4096#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/89/0/0/manual_time        28.8 ms        156 ms           24 1048576#4096#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/90/0/0/manual_time        15.7 ms       79.9 ms           44 1048576#4096#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/91/0/0/manual_time        3.11 ms       9.80 ms          225 1048576#4096#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/92/0/0/manual_time         509 ms        509 ms            1 1048576#4096#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/93/0/0/manual_time        2687 ms       2018 ms            1 1048576#4096#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/94/0/0/manual_time        1343 ms       1022 ms            1 1048576#4096#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/95/0/0/manual_time         136 ms        114 ms            5 1048576#4096#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/96/0/0/manual_time        19.7 ms       19.7 ms           36 10485760#128#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/97/0/0/manual_time        55.8 ms       97.2 ms           13 10485760#128#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/98/0/0/manual_time        38.9 ms       60.3 ms           18 10485760#128#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/99/0/0/manual_time        16.7 ms       19.0 ms           42 10485760#128#10#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/100/0/0/manual_time        342 ms        342 ms            2 10485760#128#1000#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/101/0/0/manual_time       2473 ms        914 ms            1 10485760#128#1000#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/102/0/0/manual_time       1237 ms        456 ms            1 10485760#128#1000#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/103/0/0/manual_time        141 ms       61.8 ms            5 10485760#128#1000#255#0.99#InnerProduct#NO_COPY#SEARCH

KNN/float/int64_t/brute_force_filter_knn/104/0/0/manual_time       24.9 ms       24.9 ms           28 10485760#256#10#255#[No filtered]#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/105/0/0/manual_time       73.1 ms        143 ms           10 10485760#256#10#255#0.8#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/106/0/0/manual_time       47.5 ms       83.5 ms           15 10485760#256#10#255#0.9#InnerProduct#NO_COPY#SEARCH
KNN/float/int64_t/brute_force_filter_knn/107/0/0/manual_time       17.5 ms       25.1 ms           40 10485760#256#10#255#0.99#InnerProduct#NO_COPY#SEARCH

The following cases are still running.

value_t l_dot_ = 0.0;
#pragma unroll
for (value_idx k = vec_id; k < dim; k += blockDim.x) {
asm("prefetch.global.L2 [%0];" ::"l"(B_col + k + blockDim.x));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @achirkin, I guess this symbol checking it concerns here, what should I do to keep the same function (this is very useful for better performance.)

raft::detail::popc(res, filter_view, n_queries * n_dataset, nnz_view);
raft::copy(&nnz_h, nnz.data(), 1, stream);

resource::sync_stream(res, stream);
Copy link
Member

@cjnolet cjnolet May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small note, and this applies for the whole PR- it'll be a lot easier to migrate this code to cuVS if you always use absolute namespaces (e..g raft::resource::sync_stream). It's usually a simple find/replace but sometimes it can get tricky when, for example RAFT has some overloads of existing intrinsics and it's not immediately obvious the intrinsic needs the raft namespace added. In that case, the code will work in RAFT but you'll get a compile error in cuVS that can be hard to debug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!

@@ -25,12 +28,18 @@
#include <raft/distance/distance.cuh>
#include <raft/distance/distance_types.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up, this becomes distance/distance.hpp in cuvs.

@cjnolet cjnolet added the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 21, 2024
@@ -334,6 +334,14 @@ if(RAFT_COMPILE_LIBRARY)
src/matrix/detail/select_k_float_int32.cu
src/matrix/detail/select_k_half_int64_t.cu
src/matrix/detail/select_k_half_uint32_t.cu
src/sparse/matrix/detail/select_k_half_uint32_t.cu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're removing libraft.so in a future version, so I think we probably want to avoid adding these instantiations.

@@ -16,7 +16,7 @@

#pragma once

#include <raft/core/bitset.cuh>
#include <raft/core/bitmap.hpp>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @cjnolet @benfred , inspired by the PR of @lowener #2295, I split the bitmap to avoid the compilation issue caused by *.hpp files. If we have a better method to resolve this, please let me know; many thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! Just make sure to keep all cuda stuff out of the hpp file.

Copy link
Contributor

@lowener lowener May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bitmap.cuh should include bitset.cuh since it is inheriting from the bitset class. This will avoid having undefined references for n_elements() for example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bitmap.cuh should include bitset.cuh since it is inheriting from the bitset class. This will avoid having undefined references for n_elements() for example

Thank you @lowener , very make sense, just fix it!

@cjnolet cjnolet removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 23, 2024
value_t* s_A = (value_t*)smem;
value_idx cur_row = -1;

#pragma unroll
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified in the underlying SASS that this pragma unroll is doing anything? This does a static analysis when it's compiled and relies on compile-time constants to pull the runtime loops out into independent compiled loops. When the number of iterations is a runtime value, this will be silently skipped.

Copy link
Member

@cjnolet cjnolet May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up question- why are we iterating through all of the rows in each block? Shouldn't each block or warp be launched on an independent row? Otherwise, every block is going to duplicate the work, isn't it?

Edit: I should have looked more closely where you are launching the kernel- I see this is actually chunks of rows. To ease future eyes from making this mistake, can you please rename nrows to be something more specific? Even something like nrows_in_chunk or nrows_for_block would be more readable.

Another thought- converting the CSR to a COO would be a more efficient way to do this, especially for CSR matrices of high sparsity- that way you could just have the kernel go through the edge list element-wise and perform the needed dots. Each item in the bitmap would be 1 dot product, thus you could be guaranteeing more uniformly distributed chunks by chunking over the COO instead of rows of the CSR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I will remove some unuseful unroll, and I did try a COO version, but it is a little low efficient because of this memory-intensive kernel. The COO will increase the memory reading quantity

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I did try a COO version, but it is a little low efficient because of this memory-intensive kernel. The COO will increase the memory reading quantity

That's not true, COO is for load balancing. Let's revisit this after 24.08.

Copy link
Member

@divyegala divyegala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ maintainability review

template <typename value_t, typename index_t>
void popc(const raft::resources& res,
device_vector_view<value_t, index_t> values,
index_t max_len,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a raft::host_scalar_view<index_t> instead for consistency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK for me, but not so sure, how to judge a primitive type need to be raft::host_scalar_view<index_t>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's just the API consistency that we are going for. If it's an array, use vector/matrix_view and if it's a scalar use scalar_view.

cc @cjnolet for his opinion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think primitive types are okay for values that might never need to originate or be returned on device, as opposed to host.

@@ -68,7 +68,7 @@ RAFT_KERNEL __launch_bounds__(calc_nnz_by_rows_tpb) calc_nnz_by_rows_kernel(cons

while (offset < num_cols) {
index_t bitmap_idx = lane_id + (s_bit + offset) / BITS_PER_BITMAP;
bitmap_t l_bitmap = bitmap_t(0);
typename std::remove_const<bitmap_t>::type l_bitmap = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typename std::remove_const<bitmap_t>::type l_bitmap = 0;
std::remove_const_t<bitmap_t> l_bitmap = 0;

cpp/include/raft/sparse/convert/detail/bitmap_to_csr.cuh Outdated Show resolved Hide resolved
@cjnolet
Copy link
Member

cjnolet commented May 24, 2024

/merged

@benfred
Copy link
Member

benfred commented May 24, 2024

/merge

@rapids-bot rapids-bot bot merged commit 5f0dfed into rapidsai:branch-24.06 May 24, 2024
69 checks passed
rapids-bot bot pushed a commit to rapidsai/cuvs that referenced this pull request May 29, 2024
- The PR is one part of prefiltered brute force and should work with the PR of raft: rapidsai/raft#2294

Authors:
  - rhdong (https://github.com/rhdong)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #146
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review 4 - Waiting on Reviewer Waiting for reviewer to review or respond CMake cpp feature request New feature or request non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

None yet

7 participants