-
Notifications
You must be signed in to change notification settings - Fork 686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize multiannotator.py for performance #1077
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1077 +/- ##
==========================================
+ Coverage 96.15% 96.21% +0.05%
==========================================
Files 74 74
Lines 5850 5861 +11
Branches 1044 1044
==========================================
+ Hits 5625 5639 +14
+ Misses 134 132 -2
+ Partials 91 90 -1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the amazing contribution and speedup on the multiannotator module!
I made some comments, mostly about code readability (many of these methods are doing many non-trivial operations, so it would be super helpful to write some comments documenting the steps).
Besides that, the only real issue I found is in the assert_valid_inputs_multiannotator
method, made a comment there with more details.
Please don't hesitate to ask for more clarification if anything isn't clear!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a small edit to make sure that the mask variable names are clear, but otherwise the changes look good to me!
Awesome work on the speedup @gogetron!
Co-authored-by: Hui Wen <45724323+huiwengoh@users.noreply.github.com>
Summary
This PR partially addresses #862
[ ✏️ Write your summary here. ]
After profiling this function, the parts that took most time were a few loops that were reimplemented using numpy operations and the
apply_along_axis
methods which were also reimplemented with faster numpy operations.I noticed there was a small issue with the warning in
assert_valid_inputs_multiannotator
. The warning was only raised when the labels were sorted. This can be reproduced with the code in the added test:The function will raise the warning correctly now and a new test was added to assert this behavior.
For memory I used the memory-profiler library. The code I used for benchmarking is copied below. In addition I sorted the imports in the modified files.
Code Setup
Current version
This PR
Testing
References
Reviewer Notes