-
Is there an efficient way to find which nodes in a tree sequence are coalescent anywhere? My thought would be to (a) group edges by parent ID (b) sort within the group by left coord (c) see if any right coord is less then the previous left coord (if so, this must be a region where at least 2 edges point to the parent node). Will that work? Is there a better way? |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 5 replies
-
Here's a tree-by-tree way to do it which at least uses def node_is_coalescent(ts):
is_coalescent = np.zeros(ts.num_nodes, dtype=bool)
for tree in ts.trees():
is_coalescent[tree.num_children_array[:-1] > 1] = True
return np.where(is_coalescent)[0] |
Beta Was this translation helpful? Give feedback.
-
An edge_diffs based approach like this would be quick (WARNING: untested!) is_coalescent = np.zeros(ts.num_nodes, dtype=bool)
num_children = np.zeros(ts.num_nodes, dtype=int)
for _, edges_out, edges_in in ts.edges_diffs():
for e in edges_out:
num_children[e.parent] -= 1
for e in edges_in:
num_children[e.parent] += 1
if num_children[e.parent] == 2:
# Num_children will always be exactly two once, even arity is greater
is_coalescent[e.parent] = True |
Beta Was this translation helpful? Give feedback.
It's about the same timings for that simulation code changed to 20,000 samples. But with 200,000 samples (still only ~5000 trees) the num_children_array version takes 1.8 secs and the edge_diffs one 1.3 secs.