Abstracting incremental algorithms and supporting numba #2778

jeromekelleher · 2023-07-06T16:43:59Z

jeromekelleher
Jul 6, 2023
Maintainer

Problem

Something that has bothered me for a while is that our incremental algorithms for computing stats and so on have a lot of duplicated code that's quite subtle. We don't use the edge_diffs in C very much because the API is quite clumsy, it involves allocating quite a bit of memory, and it touches memory that we may not need at all for a given calculation. Instead, we've ended up duplicating the tree iteration code, because doing so is fairly straightforward and it compiles well (e.g. here, here, here and more).

We also have the problem that none of these algorithms currently supports left-right bounds, and if we were to (say) add support for parallelism to the general stats API, we'd have to update three separate, slightly different versions of the incremental tree generation code.

Another problem is that we don't have a natural pattern for doing incremental algorithms using Numba (single tree algorithms work really well).

Proposed solution

We create a new C "class" that captures the state required to iterate over the trees. The crucial difference from the edge_diffs methods is that we only store the ranges of indexes into the ordering arrays for edges_out and edges_in, not the actual edges.

Here's a rough version done using numba to solve the problem in #2774:

WARNING: this code is not fully tested and probably has subtle logic problems!

import numpy as np
import time
import msprime
import numba
import pandas as pd # just for formatting results


spec = [
    ("num_edges", numba.int64),
    ("sequence_length", numba.float64),
    ("edges_left", numba.float64[:]),
    ("edges_right", numba.float64[:]),
    ("edge_insertion_order", numba.int32[:]),
    ("edge_removal_order", numba.int32[:]),
    ("edge_insertion_index", numba.int64),
    ("edge_removal_index", numba.int64),
    ("interval", numba.float64[:]),
    ("in_range", numba.int64[:]),
    ("out_range", numba.int64[:]),
]


@numba.experimental.jitclass(spec)
class TreePosition:
    def __init__(
        self,
        num_edges,
        sequence_length,
        edges_left,
        edges_right,
        edge_insertion_order,
        edge_removal_order,
    ):
        self.num_edges = num_edges
        self.sequence_length = sequence_length
        self.edges_left = edges_left
        self.edges_right = edges_right
        self.edge_insertion_order = edge_insertion_order
        self.edge_removal_order = edge_removal_order
        self.edge_insertion_index = 0
        self.edge_removal_index = 0
        self.interval = np.zeros(2)
        self.in_range = np.zeros(2, dtype=np.int64)
        self.out_range = np.zeros(2, dtype=np.int64)
    
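    def next(self):
        # Advance to the next tree. It starts where the previous tree ended,
        # and each index range starts where the previous range stopped.
        left = self.interval[1]
        j = self.in_range[1]
        k = self.out_range[1]
        self.in_range[0] = j
        self.out_range[0] = k
        M = self.num_edges
        edges_left = self.edges_left
        edges_right = self.edges_right
        out_order = self.edge_removal_order
        in_order = self.edge_insertion_order

        # Edges ending at `left` are removed; edges starting at `left` are inserted.
        while k < M and edges_right[out_order[k]] == left:
            k += 1
        while j < M and edges_left[in_order[j]] == left:
            j += 1
        self.out_range[1] = k
        self.in_range[1] = j

        # The tree ends at the nearest upcoming edge start or end.
        right = self.sequence_length
        if j < M:
            right = min(right, edges_left[in_order[j]])
        if k < M:
            right = min(right, edges_right[out_order[k]])
        self.interval[:] = [left, right]
        # Returns False once we have stepped past the last tree.
        return j < M or left < self.sequence_length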


# Helper function to make it easier to communicate with the numba class
def alloc_tree_position(ts):
    return TreePosition(
        num_edges=ts.num_edges,
        sequence_length=ts.sequence_length,
        edges_left=ts.edges_left,
        edges_right=ts.edges_right,
        edge_insertion_order=ts.indexes_edge_insertion_order,
        edge_removal_order=ts.indexes_edge_removal_order,
    )

@numba.njit
def _coalescent_nodes_numba(tree_pos, num_nodes, edges_parent):
    is_coalescent = np.zeros(num_nodes, dtype=np.int8)
    num_children = np.zeros(num_nodes, dtype=np.int64)
    while tree_pos.next():
        for j in range(tree_pos.out_range[0], tree_pos.out_range[1]):
            e = tree_pos.edge_removal_order[j]
            num_children[edges_parent[e]] -= 1
        for j in range(tree_pos.in_range[0], tree_pos.in_range[1]):
            e = tree_pos.edge_insertion_order[j]
            p = edges_parent[e]
            num_children[p] += 1
            if num_children[p] == 2:
                is_coalescent[p] = True
    return is_coalescent


def coalescent_nodes_numba(ts):
    tree_pos = alloc_tree_position(ts)
    return _coalescent_nodes_numba(tree_pos, ts.num_nodes, ts.edges_parent).astype(bool)


def coalescent_nodes_diffs(ts):
    is_coalescent = np.zeros(ts.num_nodes, dtype=bool)
    num_children = np.zeros(ts.num_nodes, dtype=int)
    for _, edges_out, edges_in in ts.edge_diffs():
        for e in edges_out:
            num_children[e.parent] -= 1
        for e in edges_in:
            num_children[e.parent] += 1
            if num_children[e.parent] == 2:
                # num_children passes through exactly two, even if arity is greater
                is_coalescent[e.parent] = True
    return is_coalescent


if __name__ == "__main__":
    data = []
    for n in [10, 100, 1000, 10_000, 100_000]:
        ts = msprime.sim_ancestry(
            n,
            sequence_length=10**7,
            recombination_rate=1e-8,
            population_size=10**4,
            random_seed=2,
        )
        # print(ts.draw_text())
        # test_diffs(ts)

        before = time.perf_counter()
        C1 = coalescent_nodes_diffs(ts)
        time_diffs = time.perf_counter() - before

        before = time.perf_counter()
        C2 = coalescent_nodes_numba(ts)
        time_numba = time.perf_counter() - before
        np.testing.assert_array_equal(C1, C2)

        before = time.perf_counter()
        ts.diversity()
        time_div = time.perf_counter() - before
        data.append({
            "n": n, "num_trees": ts.num_trees,
            "time_diffs": time_diffs,
            "time_numba": time_numba,
            "time_div": time_div})
        print(pd.DataFrame(data))

We get the following results:

        n  num_trees  time_diffs  time_numba  time_div
0      10      11234    0.115817    1.316377  0.000671
1     100      21074    0.245136    0.002887  0.001621
2    1000      29524    0.361683    0.004255  0.003054
3   10000      39512    0.567976    0.005947  0.005808
4  100000      48535    1.426510    0.010645  0.015748

It's fast! As we get to larger and larger sample sizes, the time spent in Python using the edge_diffs approach becomes more important. I've included the time required to compute ts.diversity() here as a benchmark to give us a sense of how the numba code competes with C code doing something comparable. We can see that for the largest sample size the numba code is even faster (this is a much simpler algorithm, after all).
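One caveat when reading the table: the 1.316s numba time in the n=10 row almost certainly includes one-off JIT compilation, since that is the first call to the compiled code. A minimal sketch of a timing helper that separates the two is below. The helper name `time_call` is hypothetical, and `sum` is only a plain-Python stand-in so that the sketch runs without numba installed.

```python
import time


def time_call(func, *args, warmup=True):
    """Return (result, seconds) for func(*args).

    If warmup is True, func is called once untimed first. For a numba
    function that first call is where compilation happens.
    """
    if warmup:
        func(*args)
    before = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - before


# Stand-in workload; in the benchmark above this would be
# time_call(coalescent_nodes_numba, ts).
result, seconds = time_call(sum, range(1000))
```

With a warm-up call like this, the n=10 row would be expected to fall in line with the other rows, though that has not been measured here.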

The key pattern that we'll have for client code will be this:

    while tree_pos.next():
        for j in range(tree_pos.out_range[0], tree_pos.out_range[1]):
            e = tree_pos.edge_removal_order[j]
            # Do something with out edge ID e
        for j in range(tree_pos.in_range[0], tree_pos.in_range[1]):
            e = tree_pos.edge_insertion_order[j]
            # Do something with in edge ID e
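To make the index ranges concrete, here is a minimal pure-Python sketch of the same pattern on a hand-built table of six edges. The generator `iter_tree_ranges`, the edge data and the `sorted`-based orderings are all illustrative assumptions; with a real tree sequence the orderings would come from `ts.indexes_edge_insertion_order` and `ts.indexes_edge_removal_order` as in the code above.

```python
def iter_tree_ranges(edges_left, edges_right, in_order, out_order, sequence_length):
    """Yield (interval, out_range, in_range) for each tree, left to right."""
    M = len(edges_left)
    left = 0.0
    j = k = 0
    while left < sequence_length:
        out_start, in_start = k, j
        # Edges ending at `left` leave the tree; edges starting there enter it.
        while k < M and edges_right[out_order[k]] == left:
            k += 1
        while j < M and edges_left[in_order[j]] == left:
            j += 1
        # The tree ends at the nearest upcoming edge start or end.
        right = sequence_length
        if j < M:
            right = min(right, edges_left[in_order[j]])
        if k < M:
            right = min(right, edges_right[out_order[k]])
        yield (left, right), (out_start, k), (in_start, j)
        left = right


# Hypothetical example: samples 0, 1, 2 and internal nodes 3, 4.
# Two trees over [0, 10) with a breakpoint at 5.
edges_left = [0.0, 0.0, 5.0, 0.0, 0.0, 5.0]
edges_right = [10.0, 5.0, 10.0, 10.0, 5.0, 10.0]
edges_parent = [3, 3, 3, 4, 4, 4]
sequence_length = 10.0
num_nodes = 5

# Stand-ins for the tskit index arrays (sorted by left and by right).
in_order = sorted(range(6), key=lambda e: edges_left[e])
out_order = sorted(range(6), key=lambda e: edges_right[e])

num_children = [0] * num_nodes
is_coalescent = [False] * num_nodes
trees = []
for interval, out_range, in_range in iter_tree_ranges(
    edges_left, edges_right, in_order, out_order, sequence_length
):
    trees.append((interval, out_range, in_range))
    for idx in range(*out_range):
        num_children[edges_parent[out_order[idx]]] -= 1
    for idx in range(*in_range):
        p = edges_parent[in_order[idx]]
        num_children[p] += 1
        if num_children[p] == 2:
            is_coalescent[p] = True
```

Nothing is copied per tree here: each step only moves two pairs of integers, and the client indexes into the shared ordering arrays itself.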

Note that we could add support for starting at a particular position quite straightforwardly here (and maybe even add support for "direction", so you can go backwards too).
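As a rough illustration of starting at a position, the sketch below shows one possible approach; it is an assumption for discussion, not something taken from the tskit code base. The hypothetical function `seek_from_start` finds where the two indexes would sit for the tree covering `x`. The trade-off is that the "in" range then begins at 0 and can include edges that have already ended, so the client has to filter on `edges_right[e] > x`.

```python
def seek_from_start(edges_left, edges_right, in_order, out_order, x):
    """Return (in_stop, out_stop) index positions for the tree covering x.

    The edges of that tree are those e in in_order[0:in_stop] that also
    satisfy edges_right[e] > x. The caller must apply that filter.
    """
    M = len(edges_left)
    j = 0
    while j < M and edges_left[in_order[j]] <= x:
        j += 1
    k = 0
    while k < M and edges_right[out_order[k]] <= x:
        k += 1
    return j, k


# Hypothetical six-edge example: two trees over [0, 10), breakpoint at 5.
edges_left = [0.0, 0.0, 5.0, 0.0, 0.0, 5.0]
edges_right = [10.0, 5.0, 10.0, 10.0, 5.0, 10.0]
in_order = sorted(range(6), key=lambda e: edges_left[e])
out_order = sorted(range(6), key=lambda e: edges_right[e])

x = 7.0
in_stop, out_stop = seek_from_start(edges_left, edges_right, in_order, out_order, x)
# Filter out edges that ended before x.
tree_edges = [e for e in in_order[:in_stop] if edges_right[e] > x]
```

In this example the returned positions match what stepping forward tree by tree would have reached, so ordinary forward iteration could resume from them.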

I think this should be as fast as the current fully compiled-in versions, because the call to tree_pos.next() should be inlineable and it should lead to quite good cache behaviour. The numba results above seem to support this anyway.

What about numba?

You can do efficient incremental algorithms now using numba by copying the code above into your project, but I guess it would be nice to make this generally available to people so we don't duplicate the code. Maybe we need a tskit-numba package or something?

molpopgen · 2023-07-06T18:18:13Z

molpopgen
Jul 6, 2023
Maintainer

This is the way to go. I've recently been stuck several times by "I can do this easily with trees but know that it won't be as fast as edge diffs".
