You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello, I am trying to implement a feature selection algorithm (similar to mRMR) based on mutual information and ran into an issue
It seems that the numerical complexity of the MI computation algorithm is $O(n \log(n))$ for every pair of features ($n$ is the number of data points). Considering that my feature selection algorithm requires computing it for every pair of features, this computation is not scaling well with the number of features.
Note that the computation of MI is based on KDTree class for sklearn, which is written in cython. Specifically, the implementation uses KDTree.query_radius() and KDTree.query()
With this regard I have a question:
Can a pure cython implementation drastically (x20-x30) improve the performance of the current sklearn MI estimation function ? Note that I would need to run a loop calling this function with $O(m^2)$ calls. Here $m$ is the number of features and can be very large $O(10^3)$.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
Hello, I am trying to implement a feature selection algorithm (similar to mRMR) based on mutual information and ran into an issue
It seems that the numerical complexity of the MI computation algorithm is$O(n \log(n))$ for every pair of features ($n$ is the number of data points). Considering that my feature selection algorithm requires computing it for every pair of features, this computation is not scaling well with the number of features.
Note that the computation of MI is based on
KDTree
class for sklearn, which is written in cython. Specifically, the implementation usesKDTree.query_radius()
andKDTree.query()
With this regard I have a question:
Can a pure cython implementation drastically (x20-x30) improve the performance of the current sklearn MI estimation function ? Note that I would need to run a loop calling this function with$O(m^2)$ calls. Here $m$ is the number of features and can be very large $O(10^3)$ .
Beta Was this translation helpful? Give feedback.
All reactions