New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Add MST optimization to improve the connectivity of CAGRA graphs. #2218

Draft

anaruse wants to merge 10 commits into rapidsai:branch-24.06 from anaruse:MST_optimization

Contributor

anaruse commented Mar 7, 2024

Close #2217

Improve connectivity of CAGRA graphs using approximate degree constrained MST (Minimum Spanning Tree).

anaruse added 6 commits

March 5, 2024 01:46


          Added CPU implementation of graph optimization with MST (WIP)

fd132e8


          [MST optimization] Added code to check number of incoming edges

97c62b2


          [MST optimization] Added code to reduce the number of edges in the ap…

f21abce

…proximate MST graph as much as possible


          [MST optimization] Added GPU support

fe98329


          Merge branch 'branch-24.04' into MST_optimization

9357a4e


          Satisfy pre-commit

180600e

anaruse requested a review from a team as a code owner

March 7, 2024 11:07

copy-pr-bot bot commented Mar 7, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions bot added the cpp label

anaruse marked this pull request as draft

March 7, 2024 11:07

anaruse added 3 commits

March 25, 2024 19:00


          [MST optimization] Made it buildable with older version of CUDA

1c53868


          Merge branch 'branch-24.04' into MST_optimization

7cd6f0e


          (Fix an issue of wrong variable name)

440d97d

anaruse force-pushed the MST_optimization branch from 4e787d4 to 440d97d Compare

March 26, 2024 12:46

cjnolet added improvement non-breaking labels

cjnolet assigned anaruse

cjnolet changed the base branch from branch-24.04 to branch-24.06

April 10, 2024 15:31


          Merge branch 'branch-24.06' into MST_optimization

99933cb

Member

cjnolet commented Apr 10, 2024

/ok to test

tfeher requested changes

View reviewed changes

Contributor

tfeher left a comment

Thanks Akira for this PR! Looks good overall, but it would be good to add some more comments and simplify at a few places, so that the code will be easier to maintain. I have some questions and suggestion along these lines.

cpp/include/raft/neighbors/cagra.cuh

@@ @@ -224,9 +224,10 @@ template <typename IdxT = uint32_t, @@
                           host_device_accessor<std::experimental::default_accessor<IdxT>, memory_type::host>>
               void optimize(raft::resources const& res,
                             mdspan<IdxT, matrix_extent<int64_t>, row_major, g_accessor> knn_graph,
-                            raft::host_matrix_view<IdxT, int64_t, row_major> new_graph)
+                            raft::host_matrix_view<IdxT, int64_t, row_major> new_graph,
+                            const bool use_MST = false)

Contributor

tfeher Apr 12, 2024 •

edited

Why not enable by default, is it costly to run MST, or can it lead to any issues?

Member

cjnolet Apr 17, 2024

Running an initial mst on a knn graph for the purposes of finding connected components should be really fast.

I also wonder if we should be exposing the fact that we are running an MST to guarantee connectivity- that's an implementation detail AFAIC. I suggest we instead name this argument something along the lines of "guarantee_connectivity".

cpp/include/raft/neighbors/detail/cagra/cagra_search.cuh

@@ @@ -90,12 +90,12 @@ void search_main_core( @@
                 CagraSampleFilterT sample_filter = CagraSampleFilterT())
               {
                 RAFT_LOG_DEBUG("# dataset size = %lu, dim = %lu\n",

Contributor

tfeher Apr 12, 2024

Suggested change

      
              RAFT_LOG_DEBUG("# dataset size = %lu, dim = %lu\n",
          
              RAFT_LOG_DEBUG("# graph size = %lu, dim = %lu\n",

cpp/include/raft/neighbors/detail/cagra/graph_core.cuh

Comment on lines +1061 to +1062

		raft::make_host_matrix_view<IdxT, int64_t>(input_graph_ptr, graph_size, input_graph_degree),
		raft::make_host_matrix_view<IdxT, int64_t>(output_graph_ptr, graph_size, output_graph_degree),

Contributor

tfeher Apr 12, 2024

Can't we just get a view directly?

Suggested change

      
                  raft::make_host_matrix_view<IdxT, int64_t>(input_graph_ptr, graph_size, input_graph_degree),
          
                  raft::make_host_matrix_view<IdxT, int64_t>(output_graph_ptr, graph_size, output_graph_degree),
          
                 knn_graph.view(),
          
                 new_graph.view(),

cpp/include/raft/neighbors/detail/cagra/graph_core.cuh

@@ @@ -316,12 +571,468 @@ void sort_knn_graph(raft::resources const& res, @@
                 RAFT_LOG_DEBUG("# Sorting kNN graph time: %.1lf sec\n", time_sort_end - time_sort_start);
               }
+              template <typename IdxT = uint32_t>
+              void mst_optimization(raft::resources const& res,

Contributor

tfeher Apr 12, 2024 •

edited

It would be nice to add a short docstring that describes what this optimization step does, and what approximations we have. Something along the lines:

We have a kNN graph as input
We have an approximate MST as output. Approximate, because the number of outgoing edges are limited.
Additionally the construction method is an approximation (seems similar to Kruskal's algorithm, but we solve in parallel which does not guarantee a global minimum).
If the input kNN graph is disconnected then random connection is added to the largest cluster.

The last point is confusing: why do we need an MST, if we just add random edges to the main cluster? (After reading further down how the MST is used, it becomes clearer: if we want to keep a set of edges that guarantees connectivity, then it is better to keep the MST.

Member

cjnolet Apr 17, 2024

Side note- we have an mst implementation in RAFT already but it does not preserve the fixed degree constraint. MST is inherently coupled to neighborhood methods and we are planning to move the existing MST over to cuVS. It would be nice if we could also expose a public API for this MST implementation, since it's generally useful for cases like this where we want to guarantee (or increase) connectedness.

We do this trick in our single-linkage clustering algorithm also, with the only caveat that we need to guarantee the smallest edges are added by the MST because the single-linkage has an exact solution.

cpp/include/raft/neighbors/detail/cagra/graph_core.cuh

+              #pragma omp parallel for
+                for (uint64_t i = 0; i < graph_size; i++) {
+                  for (uint64_t k = 0; k < mst_graph_degree; k++) {
+                    mst_graph_ptr[(mst_graph_degree * i) + k] = graph_size;

Contributor

tfeher Apr 12, 2024

We have an array access operator that makes this simpler

Suggested change

      
                  mst_graph_ptr[(mst_graph_degree * i) + k] = graph_size;
          
                  mst_graph(i, k) = graph_size;

cpp/include/raft/neighbors/detail/cagra/graph_core.cuh

+                    num_direct    = stats_ptr[1];
+                    num_alternate = stats_ptr[2];
+                    num_failure   = stats_ptr[3];
+                  } else {

Contributor

tfeher Apr 16, 2024

It would improve readability if all the use_gpu==false branches are factored into a separate function.

cpp/include/raft/neighbors/detail/cagra/graph_core.cuh

Comment on lines +848 to +863

+                  uint32_t flag_update = 1;
+                  while (flag_update) {
+                    flag_update = 0;
+                    if (use_gpu) {
+                      stats_ptr[0] = flag_update;
+                      raft::copy(d_stats_ptr, stats_ptr, 1, resource::get_cuda_stream(res));
+                      constexpr uint64_t n_threads = 256;
+                      const dim3 threads(n_threads, 1, 1);
+                      const dim3 blocks((graph_size + n_threads - 1) / n_threads, 1, 1);
+                      kern_mst_opt_labeling<<<blocks, threads, 0, resource::get_cuda_stream(res)>>>(
+                        d_label_ptr, d_mst_graph_ptr, graph_size, mst_graph_degree, d_stats_ptr);
+                      raft::copy(stats_ptr, d_stats_ptr, 1, resource::get_cuda_stream(res));
+                      resource::sync_stream(res);
+                      flag_update = stats_ptr[0];

Contributor

tfeher Apr 16, 2024

This is very similar to the existing weak_cc function

raft/cpp/include/raft/sparse/csr.hpp

Line 147 in e977d2e

* @brief Compute weakly connected components. Note that the resulting labels

cpp/include/raft/neighbors/detail/cagra/graph_core.cuh

    
                  for (uint64_t k = 0; k < output_graph_degree; k++) {

              #pragma omp parallel for

                    for (uint64_t i = 0; i < graph_size; i++) {

                      dest_nodes.data_handle()[i] = output_graph_ptr[k + (output_graph_degree * i)];

                      dest_nodes_ptr[i] = pruned_graph_ptr[k + (output_graph_degree * i)];

Contributor

tfeher Apr 16, 2024

Instead of introducing extra pointers, one could use the array access operator (once compiler is finished, it will boil down to the same pointer arithmetic)

Suggested change

      
                    dest_nodes_ptr[i] = pruned_graph_ptr[k + (output_graph_degree * i)];
          
                    dest_nodes(i) = pruned_graph(i, k);

cpp/include/raft/neighbors/detail/cagra/graph_core.cuh

Comment on lines +1276 to +1310

+                    auto my_fwd_graph = pruned_graph_ptr + (output_graph_degree * i);
+                    auto my_rev_graph = rev_graph_ptr + (output_graph_degree * i);
+                    auto my_out_graph = output_graph_ptr + (output_graph_degree * i);
+                    uint32_t kf       = 0;
+                    uint32_t kr       = 0;
+                    uint32_t k        = mst_graph_num_edges_ptr[i];
+                    while (kf < output_graph_degree || kr < output_graph_degree) {
+                      if (kf < output_graph_degree) {
+                        if (my_fwd_graph[kf] < graph_size) {
+                          auto flag_match = false;
+                          for (uint32_t kk = 0; kk < k; kk++) {
+                            if (my_out_graph[kk] == my_fwd_graph[kf]) {
+                              flag_match = true;
+                              break;
+                            }
+                          }
+                          if (!flag_match) {
+                            my_out_graph[k] = my_fwd_graph[kf];
+                            k += 1;
+                          }
+                        }
+                        kf += 1;
+                        if (k >= output_graph_degree) break;
+                      }
+                      if (kr < output_graph_degree) {
+                        if (my_rev_graph[kr] < graph_size) {
+                          auto flag_match = false;
+                          for (uint32_t kk = 0; kk < k; kk++) {
+                            if (my_out_graph[kk] == my_rev_graph[kr]) {
+                              flag_match = true;
+                              break;
+                            }
+                          }
+                          if (!flag_match) {
+                            my_out_graph[k] = my_rev_graph[kr];

Contributor

tfeher Apr 16, 2024

Would be more concise with mdspan array access operator, and a helper for finding duplicate edges:

Suggested change

      
                  auto my_fwd_graph = pruned_graph_ptr + (output_graph_degree * i);
          
                  auto my_rev_graph = rev_graph_ptr + (output_graph_degree * i);
          
                  auto my_out_graph = output_graph_ptr + (output_graph_degree * i);
          
                  uint32_t kf       = 0;
          
                  uint32_t kr       = 0;
          
                  uint32_t k        = mst_graph_num_edges_ptr[i];
          
                  while (kf < output_graph_degree || kr < output_graph_degree) {
          
                    if (kf < output_graph_degree) {
          
                      if (my_fwd_graph[kf] < graph_size) {
          
                        auto flag_match = false;
          
                        for (uint32_t kk = 0; kk < k; kk++) {
          
                          if (my_out_graph[kk] == my_fwd_graph[kf]) {
          
                            flag_match = true;
          
                            break;
          
                          }
          
                        }
          
                        if (!flag_match) {
          
                          my_out_graph[k] = my_fwd_graph[kf];
          
                          k += 1;
          
                        }
          
                      }
          
                      kf += 1;
          
                      if (k >= output_graph_degree) break;
          
                    }
          
                    if (kr < output_graph_degree) {
          
                      if (my_rev_graph[kr] < graph_size) {
          
                        auto flag_match = false;
          
                        for (uint32_t kk = 0; kk < k; kk++) {
          
                          if (my_out_graph[kk] == my_rev_graph[kr]) {
          
                            flag_match = true;
          
                            break;
          
                          }
          
                        }
          
                        if (!flag_match) {
          
                          my_out_graph[k] = my_rev_graph[kr];
          
            /*
          
             * template <typename IdxT>
          
             * RAFT_INLINE_FUNCTION bool check_duplicate_edge(IdxT* neighbors_ptr, uint32_t k, IdxT node_id)
          
             * {
          
             *   for (uint32_t i = 0; i < k; i++) 
          
             *     if (neighbors_ptr[i] == node_id)  return true;
          
             *   return false;
          
             * }
          
            */
          
                  uint32_t kf       = 0;
          
                  uint32_t kr       = 0;
          
                  uint32_t k        = mst_graph_num_edges_ptr[i];
          
                  while (kf < output_graph_degree || kr < output_graph_degree) {
          
                    if (kf < output_graph_degree) {
          
                      if (pruned_graph(i, kf) < graph_size) {
          
                        if (!check_duplicate_edges(&output_graph(i,0), k, pruned_graph(i, kf)) {
          
                          output_graph(i, k) = pruned_graph(i, kf);
          
                          k += 1;
          
                        }
          
                      }
          
                      kf += 1;
          
                      if (k >= output_graph_degree) break;
          
                    }
          
                    if (kr < output_graph_degree) {
          
                      if (rev_graph(i, kr) < graph_size) {
          
                        if (!check_duplicate_edges(&output_graph(i,0), k, rev_graph(i, kr) {
          
                          my_out_graph[k] = my_rev_graph[kr];

Contributor

tfeher Apr 16, 2024

Do I understand correctly, that we leave the MST in output_graph, and append edges from the pruned and reverse graph? Can we add this to the PR description, and possibly as a comment here as well.

cpp/include/raft/neighbors/detail/cagra/graph_core.cuh

+                    auto my_out_graph = output_graph_ptr + (output_graph_degree * i);
+                    uint32_t kf       = 0;
+                    uint32_t kr       = 0;
+                    uint32_t k        = mst_graph_num_edges_ptr[i];

Contributor

tfeher Apr 16, 2024

The code below keeps k edges from the MST, and fills the rest from the forward and reverse graph. Typically how large is k compared to output_graph_degree?

Member

cjnolet commented May 17, 2024

@anaruse now that CAGRA has been migrated to cuVS, this PR will also have to be moved to cuVS when ready (no rush, obviously)

cjnolet added the 5 - DO NOT MERGE label

Member

cjnolet commented May 21, 2024

@tfeher @anaruse this is definiely a welcome change but just a heads up that since we've moved CAGRA over to cuVS, we'll need to migrate these changes over as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment