
More repeatability questions for cugraph.leiden #4373

Closed
ChuckHastings opened this issue Apr 24, 2024 · 3 comments

@ChuckHastings (Collaborator)

Follow-up from issue #4072.

@ChuckHastings, thank you for working on this! I've just tested with version `24.06.00a43`. The issue is resolved completely when the adjacency matrix is `float64`, but there is still some variability, although much less, when it's `float32` and the graph is large (75,000 nodes). I'm attaching the zipped [npz file](https://github.com/rapidsai/cugraph/files/15093902/adjacency_75000.zip) of the adjacency matrix. Here are the results I'm getting from 10 runs, as before:
Min. modularity (float32): 0.8989160060882568
Mean modularity (float32): 0.8994227409362793
Max. modularity (float32): 0.9014488458633423

Min. partition count (float32): 67
Max. partition count (float32): 68

Pairwise adjusted Rand indices between the 10 runs:

[[1.0000000000 1.0000000000 0.8200713715 1.0000000000 1.0000000000 1.0000000000 0.8200713715 1.0000000000 1.0000000000]
 [         nan 1.0000000000 0.8200713715 1.0000000000 1.0000000000 1.0000000000 0.8200713715 1.0000000000 1.0000000000]
 [         nan          nan 0.8200713715 1.0000000000 1.0000000000 1.0000000000 0.8200713715 1.0000000000 1.0000000000]
 [         nan          nan          nan 0.8200713715 0.8200713715 0.8200713715 1.0000000000 0.8200713715 0.8200713715]
 [         nan          nan          nan          nan 1.0000000000 1.0000000000 0.8200713715 1.0000000000 1.0000000000]
 [         nan          nan          nan          nan          nan 1.0000000000 0.8200713715 1.0000000000 1.0000000000]
 [         nan          nan          nan          nan          nan          nan 0.8200713715 1.0000000000 1.0000000000]
 [         nan          nan          nan          nan          nan          nan          nan 0.8200713715 0.8200713715]
 [         nan          nan          nan          nan          nan          nan          nan          nan 1.0000000000]]

Min. adjusted Rand index (float32): 0.8200713715385887
Mean adjusted Rand index (float32): 0.9360253765470538
Max. adjusted Rand index (float32): 1.0

Since this is probably a different problem than the dendrogram flattening that you fixed, should I open a new issue? Or can you reopen this one?

Originally posted by @jpintar in #4072 (comment)
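
For reference, here is a minimal sketch of the repeatability check described above. It assumes the attached archive unzips to `adjacency_75000.npz` in SciPy sparse format; the file name and variable names are assumptions, not the original test code:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.metrics import adjusted_rand_score

import cudf
import cugraph

# Assumption: the attached zip extracts to a SciPy-sparse .npz file.
adj = sp.load_npz("adjacency_75000.npz").tocoo()

edges = cudf.DataFrame({
    "src": adj.row,
    "dst": adj.col,
    # Switch to np.float64 here to compare against the 32-bit behavior.
    "wgt": adj.data.astype(np.float32),
})

labelings, modularities = [], []
for _ in range(10):
    G = cugraph.Graph()
    G.from_cudf_edgelist(edges, source="src", destination="dst", edge_attr="wgt")
    parts, mod = cugraph.leiden(G)
    labelings.append(parts.sort_values("vertex")["partition"].to_numpy())
    modularities.append(mod)

print("Min/Mean/Max modularity:",
      min(modularities), sum(modularities) / len(modularities), max(modularities))

# Upper-triangular matrix of pairwise ARIs, as in the quoted output.
n = len(labelings)
ari = np.full((n - 1, n - 1), np.nan)
for i in range(n - 1):
    for j in range(i + 1, n):
        ari[i, j - 1] = adjusted_rand_score(labelings[i], labelings[j])
print(ari)
```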

@ChuckHastings (Collaborator, Author)

A quick first observation: I have run this several times. For 64-bit weights I always see everything align (as you indicate in your post). For 32-bit weights I see some variability: sometimes I get matching answers, sometimes I get differences. So this behavior (in 32-bit) is still non-deterministic.

I wonder whether floating point rounding errors are causing different choices in the clustering decisions. I will attempt to isolate why we are seeing the differences.

ChuckHastings self-assigned this Apr 29, 2024
@ChuckHastings (Collaborator, Author)

I have validated that numerical instability is causing the repeatability issues for 32-bit weights.

I would note that if your graph is large enough you will probably see the same issues with 64-bit weights as well, although the graph would need to be quite large.

Our implementation of Leiden/Louvain executes its computations in parallel, so there is no guarantee that floating point numbers will be added in the same order from call to call. If we have a cluster weight on the order of 1e-5 and an edge weight on the order of 1e-11, the order in which the additions occur subtly changes the result. As an example that I saw in this data (a small reproduction sketch follows the list):

  1. In one run, we computed the delta modularity for moving vertex v into cluster c1 as 5.08771e-5, and the delta modularity for moving it into cluster c2 as 5.08771e-5 (with c1 < c2): a tie (the difference between them was exactly 0). Our tie-breaking procedure is to pick the smaller cluster id, so we put vertex v into cluster c1.
  2. In a separate run, the computation of delta modularity for c2 still gives 5.08771e-5, but for c1 it gives 5.08770e-5. The additions must have included one of the larger values before some of the much smaller values, and because we were dealing with a 32-bit float the contribution of the smaller values was lost. In this case the delta modularity for c2 is larger than for c1, so we decide to move vertex v into cluster c2.
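
The effect is easy to reproduce outside cuGraph. Here is a small NumPy sketch showing that the order of float32 additions changes the accumulated value; the magnitudes are taken from the example above, and the count of 1000 tiny terms is an arbitrary illustration:

```python
import numpy as np

big = np.float32(5.08771e-5)                    # cluster-weight-scale term
tiny = np.full(1000, 1e-11, dtype=np.float32)   # edge-weight-scale terms

# Order 1: sum the tiny terms first (they accumulate almost exactly),
# then add the big term once.
s1 = np.float32(0.0)
for t in tiny:
    s1 += t
s1 += big

# Order 2: start from the big term; each tiny addition is now rounded
# against a value several orders of magnitude larger.
s2 = big
for t in tiny:
    s2 += t

print(s1, s2, s1 == s2)   # the two orders disagree in float32
```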

A single flipped decision like this changes the entire sequence of decisions in the greedy Louvain and Leiden algorithms, so we end up with a different (sometimes radically different) clustering... but still a pretty good modularity score.

We can look into changes to address this, but it would definitely be a longer-term research problem.

If repeatability is a priority for you, I would suggest using 64-bit doubles; that should dramatically reduce the chance of instability in the computation.
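
Concretely, a hedged sketch of forcing 64-bit weights before building the graph; the `edges` DataFrame and its column names are placeholders carried over from the sketch above, while `from_cudf_edgelist` and `cugraph.leiden` are the public APIs:

```python
import cudf
import cugraph

# Assumption: `edges` is a cudf.DataFrame with src/dst/wgt columns.
edges["wgt"] = edges["wgt"].astype("float64")   # force 64-bit weights

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst", edge_attr="wgt")
parts, modularity = cugraph.leiden(G)
```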

@jpintar commented May 3, 2024

Thank you for looking into this!
