[WIP] Kd tree, ball tree error #17400
Merging changes from the main repository
Thanks @arka204! Instead of correcting for it, we should raise a ValueError saying that it's invalid input. Normally check_array should do it with dtype='numeric', I think. The appropriate dtype needs to be determined; I haven't checked the code.
hi i feel u bro... i was confused once also. have a good day!
n_samples = data.shape[0]
n_features = data.shape[1]
Don't delete this line.
msg = ("Not all elements had the same number of dimensions"
       " - proceeding after extending those with zeros")
with pytest.warns(UserWarning, match=msg):
    BallTree(Y)
This will need to change if you raise an error instead.
No, individual commits in this PR don't matter; we will squash the PR when merging.
Instead of computing the length of each row, you can use the standard check_array from scikit-learn:
n_samples = data.shape[0]
n_features = data.shape[1]

self.data_arr = np.asarray(data, dtype=DTYPE, order='C')
I think you should rather just use sklearn.utils.check_array:
self.data_arr = check_array(data, dtype=DTYPE, order='C')
It should raise a ValueError with the message "setting an array element with a sequence.", which is good enough for me.
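As a rough illustration of the suggestion above, the following sketch runs ragged input through check_array and catches the resulting ValueError. The helper name validate_tree_data is hypothetical, and DTYPE is assumed here to be np.float64 as a stand-in for the DTYPE used in the tree code; the exact error text depends on the installed NumPy version.

```python
import numpy as np
from sklearn.utils import check_array

# Assumption: stands in for the DTYPE constant used by the tree code.
DTYPE = np.float64


def validate_tree_data(data):
    """Hypothetical helper: return data as a C-ordered 2D float array."""
    # check_array delegates the conversion to NumPy, which cannot build a
    # rectangular numeric array from rows of different lengths.
    return check_array(data, dtype=DTYPE, order='C')


# Rectangular input is accepted and converted.
print(validate_tree_data([(1, 2, 3), (4, 5, 6)]).shape)

# Ragged input is rejected with a ValueError instead of being padded.
try:
    validate_tree_data([(1, 2), (1, 2, 3)])
except ValueError as exc:
    print("ValueError:", exc)
```

With this approach the invalid input is reported to the caller before any tree construction starts, so no per-row length computation is needed.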
@KumarGanesha1996 please try not to implicitly assume the gender of other contributors when addressing them if they decide not to be explicit about it. Let's try to be as inclusive as possible.
Hi @arka204, are you still interested in working on this? Thanks for your work so far.
Reference Issues/PRs
Fixes #14650
What does this implement/fix? Explain your changes.
This PR pads rows with zeros when the input to BallTree or KDTree contains rows with different numbers of dimensions. For example: [(1, 2), (1, 2, 3)] -> [[1, 2, 0], [1, 2, 3]]
Not doing this resulted in a segmentation fault, as explained in the issue mentioned above.
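The padding described above can be sketched in plain Python as follows. This is only an illustration of the described behavior, not the code from this PR; the function name pad_rows_with_zeros is hypothetical.

```python
def pad_rows_with_zeros(rows):
    """Hypothetical helper: right-pad every row with zeros to the longest row."""
    rows = [list(row) for row in rows]
    # Width of the longest row; 0 when there are no rows at all.
    width = max((len(row) for row in rows), default=0)
    return [row + [0] * (width - len(row)) for row in rows]


print(pad_rows_with_zeros([(1, 2), (1, 2, 3)]))  # [[1, 2, 0], [1, 2, 3]]
```

Note that padding silently changes the data (a missing coordinate becomes 0), which is why the review comments favor raising a ValueError instead.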
Any other comments?
I'd like to add a test for KDTree as well, but I don't know which file it belongs in. Normally I'd add it to test_kd_tree.py, but it's almost empty (and contains no tests).