Performance testing find_objects #252

Open

GenevieveBuckley opened this issue Dec 17, 2021 · 1 comment

@GenevieveBuckley (Collaborator)

The find_objects functionality is still quite new, and it would be good to get some performance testing done.

Some previous discussion is in #240 (comment):

Second, I think it's better to avoid using the scipy.ndimage.find_objects function directly. If you have an image chunk with just one object with a really high integer label n, the scipy find_objects result will return n - 1 values of None, and then the single meaningful result. That seems bad for parallelized applications, so I think looping through only the unique integer values present in a given image chunk is a better way to go.
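
For illustration, here's a minimal reproduction of that behaviour with scipy (the array shape and label value are arbitrary):

import numpy as np
from scipy import ndimage

# A tiny chunk containing a single object with a high integer label
chunk = np.zeros((5, 5), dtype=int)
chunk[2, 2] = 1000

slices = ndimage.find_objects(chunk)
print(len(slices))   # 1000 -- one entry per label value from 1 to 1000
print(slices[0])     # None -- label 1 is not present in this chunk
print(slices[999])   # (slice(2, 3, None), slice(2, 3, None)) -- the object labelled 1000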

I've seen that scipy's find_objects uses a C implementation for speed, and of course it'd be nice to avoid maintaining a parallel implementation. How about calling the scipy function on a relabelled array to circumvent the problem you mention?

import numpy as np

unique_vals = np.unique(x)  # integer labels present in this chunk (assumes background 0 is among them)
relabel_ar = np.zeros(unique_vals.max() + 1, dtype=int)  # lookup table: old label -> new sequential label
relabel_dict = dict()  # dict for inverting relabelling afterwards
for il, l in enumerate(unique_vals):
    relabel_dict[il] = l
    relabel_ar[l] = il
x_relabelled = relabel_ar[x]

See also https://scikit-image.org/docs/dev/api/skimage.segmentation.html#relabel-sequential

And the reply:

We could potentially do that, as long as we kept track of the mapping between the old and new label integers.

Whether it's faster & worth it would depend on results from some performance testing. I'm inclined to get an implementation in, and then tinker with speed improvements (and anyone who'd like to jump in and try stuff is more than welcome!)
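
As a rough sketch of keeping that mapping, assuming the relabelling snippet above has been run on an integer label chunk x, the slices returned by scipy.ndimage.find_objects on the relabelled array can be translated back to the original label values via relabel_dict:

from scipy import ndimage

slices = ndimage.find_objects(x_relabelled)
# slices[i] corresponds to new label i + 1 (new label 0 is treated as background),
# so the original label for that bounding box is relabel_dict[i + 1]
boxes = {relabel_dict[i + 1]: sl for i, sl in enumerate(slices) if sl is not None}

skimage.segmentation.relabel_sequential (linked above) provides the same kind of forward and inverse maps out of the box, so it may be a drop-in alternative to the manual lookup table.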

@GenevieveBuckley (Collaborator, Author)

I've also said it would be good to trial this on a dataset of a big-ish size. Most development was done with small, toy datasets.

#240 (comment)

I'm hoping dask/dask#7851 isn't going to be a problem here (might not be, but it's a good idea to try this on something of a decent size)
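
As a starting point, a rough sketch of the kind of larger-scale timing run suggested here, assuming the find_objects added in #240 is exposed as dask_image.ndmeasure.find_objects and accepts a labelled dask array (the shape and chunk sizes are placeholders to adjust for available memory):

import time

import dask
import dask.array as da
from dask_image import ndmeasure

# Synthetic "big-ish" test volume: sparse random blobs, labelled with dask-image
shape = (2048, 2048, 512)   # placeholder size -- adjust to the memory available
chunks = (256, 256, 256)
binary = da.random.random(shape, chunks=chunks) > 0.999
labels, num_features = ndmeasure.label(binary)
labels = labels.persist()   # label once up front, so the timing below measures find_objects only

tic = time.perf_counter()
result = ndmeasure.find_objects(labels)
dask.compute(result)        # force evaluation in case find_objects returns a lazy collection
toc = time.perf_counter()
print(f"find_objects on {shape} with chunks {chunks}: {toc - tic:.1f} s")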
