Mutual information estimation #28699

aivarsoo · 2024-03-26T10:59:08Z

aivarsoo
Mar 26, 2024

Hello, I am trying to implement a feature selection algorithm (similar to mRMR) based on mutual information and ran into an issue

It seems that the numerical complexity of the MI computation algorithm is $O(n \log(n))$ for every pair of features ($n$ is the number of data points). Considering that my feature selection algorithm requires computing it for every pair of features, this computation is not scaling well with the number of features.

Note that the computation of MI is based on KDTree class for sklearn, which is written in cython. Specifically, the implementation uses KDTree.query_radius() and KDTree.query()

With this regard I have a question:

Can a pure cython implementation drastically (x20-x30) improve the performance of the current sklearn MI estimation function ? Note that I would need to run a loop calling this function with $O(m^2)$ calls. Here $m$ is the number of features and can be very large $O(10^3)$.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mutual information estimation #28699

{{title}}

Replies: 0 comments

Select a reply

Mutual information estimation #28699

aivarsoo Mar 26, 2024

Replies: 0 comments

aivarsoo
Mar 26, 2024