Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Kd tree, ball tree error #17400

Closed
wants to merge 4 commits into from

Conversation

arka204
Copy link
Contributor

@arka204 arka204 commented May 31, 2020

Reference Issues/PRs

Fixes #14650

What does this implement/fix? Explain your changes.

This PR adds zeros when input to BallTree or KDTree has different number of dimensions.
For example: [(1, 2), (1, 2, 3)] -> [[1, 2, 0], [1, 2, 3]]
Not doing this was resulting in segmentation fault, like explained in issue mentioned above.

Any other comments?

I'd like to add test for KDTree as well, but don't know to which file.
Normally I'd add it to test_kd_tree.py, but it's almost empty (and contains no tests).

@arka204 arka204 changed the title Kd tree, ball tree error [WIP] Kd tree, ball tree error May 31, 2020
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @arka204 ! Instead of correcting for it, we should raise ValueError saying that it's invalid input. Normally check_array should do it with dtype='numeric' I think. The appropriate dtype need to be determined -- I haven't checked the code.

Copy link

@KumarGanesha1996 KumarGanesha1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi i feel u bro... i was confused once also. have a good day!

n_samples = data.shape[0]
n_features = data.shape[1]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont delete it line...

sklearn/neighbors/_binary_tree.pxi Outdated Show resolved Hide resolved
msg = ("Not all elements had the same number of dimensions"
" - proceeding after extending those with zeros")
with pytest.warns(UserWarning, match=msg):
BallTree(Y)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it need change if u do raise error

@KumarGanesha1996
Copy link

i think you should delete these merge commits - Merge pull request #1 from scikit-learn/master, Merge pull request #2 from scikit-learn/master

@rth
Copy link
Member

rth commented Jun 1, 2020

i think you should delete these merge commits - Merge pull request #1 from scikit-learn/master,

No individual commits in this PR don't matter, we will squash the PR when merging.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of computing the length of each row you can use the standard check_array from scikit-learn:

n_samples = data.shape[0]
n_features = data.shape[1]

self.data_arr = np.asarray(data, dtype=DTYPE, order='C')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should rather just use sklearn.utils.check_array:

        self.data_arr = check_array(data, dtype=DTYPE, order='C')

It should display a ValueError with the message setting an array element with a sequence. which is good enough to me.

@ogrisel
Copy link
Member

ogrisel commented Jul 25, 2020

hi i feel u bro... i was confused once also. have a good day!

@KumarGanesha1996 please try not to implicitly assume the gender of other contributors when addressing them if they decide not to be explicit about it. Let's try to be as inclusive as possible.

@cmarmo
Copy link
Member

cmarmo commented Sep 24, 2020

Hi @arka204 , are you still interested in working on this? Thanks for your work so far.

@cmarmo cmarmo added Superseded PR has been replace by a newer PR and removed Stalled labels Oct 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:neighbors Superseded PR has been replace by a newer PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KDTree / BallTree seg fault
5 participants