[WIP] Kd tree, ball tree error #17400

Closed
wants to merge 4 commits into from
Changes from 3 commits
18 changes: 17 additions & 1 deletion sklearn/neighbors/_binary_tree.pxi
@@ -1053,9 +1053,25 @@ cdef class BinaryTree:
        if leaf_size < 1:
            raise ValueError("leaf_size must be greater than or equal to 1")

        longest_data = max(len(item) for item in data)
        padded_data = []
        padding = False

        for item in data:
            if len(item) < longest_data:
                item = np.asarray(item)
                padded_item = np.zeros(longest_data)
                padded_item[:item.shape[0]] = item
                padded_data.append(padded_item)
                padding = True
            else:
                padded_data.append(np.asarray(item))
        if padding:
            warnings.warn("Not all elements had the same number of dimensions"
                          " - proceeding after extending those with zeros")
        data = np.asarray(padded_data)
ogrisel marked this conversation as resolved.
        n_samples = data.shape[0]
        n_features = data.shape[1]


Don't delete this line...

        self.data_arr = np.asarray(data, dtype=DTYPE, order='C')
Member


I think you should rather just use sklearn.utils.check_array:

        self.data_arr = check_array(data, dtype=DTYPE, order='C')

It should raise a ValueError with the message "setting an array element with a sequence.", which is good enough for me.
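For illustration, here is a minimal sketch of what that suggestion would do with ragged input. It is not part of the PR: the `validate` helper and the `ragged` variable are hypothetical names, and `np.float64` is assumed to stand in for the `DTYPE` constant used in `_binary_tree.pxi`. The exact wording of the error message can differ between NumPy versions, so the sketch only captures it rather than matching on it.

```python
import numpy as np
from sklearn.utils import check_array

# Hypothetical ragged input, mirroring the rows used in the PR's test.
ragged = [(1, 2, 3), (2, 5), (5, 5, 1, 2)]


def validate(data):
    """Return a C-ordered float64 2-D array, or raise ValueError."""
    # Assumption: np.float64 stands in for DTYPE from _binary_tree.pxi.
    return check_array(data, dtype=np.float64, order="C")


# Rows of unequal length cannot be converted to a numeric 2-D array,
# so validation fails instead of padding the short rows with zeros.
try:
    validate(ragged)
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```

With this approach the padding loop and the warning become unnecessary, because malformed input is rejected before `n_samples` and `n_features` are read.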

        self.leaf_size = leaf_size
        self.dist_metric = DistanceMetric.get_metric(metric, **kwargs)
9 changes: 9 additions & 0 deletions sklearn/neighbors/tests/test_ball_tree.py
@@ -65,3 +65,12 @@ def test_query_haversine():

    assert_array_almost_equal(dist1, dist2)
    assert_array_almost_equal(ind1, ind2)


def test_different_dimension_size():
    X = [(1, 2, 3), (2, 5), (5, 5, 1, 2)]
    Y = np.array(X)
    msg = ("Not all elements had the same number of dimensions"
           " - proceeding after extending those with zeros")
    with pytest.warns(UserWarning, match=msg):
        BallTree(Y)


This test will need to change if you raise an error instead.
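A possible shape for the revised test, assuming the constructor ends up raising a `ValueError` for ragged input as suggested in the review. This is only a sketch, not the version that was merged anywhere. It passes the plain list rather than `np.array(X)`, because recent NumPy releases refuse to build an array from rows of unequal length unless `dtype=object` is given, and it does not match on the message text, since that wording varies across NumPy versions.

```python
import pytest
from sklearn.neighbors import BallTree


def test_different_dimension_size():
    # Rows of unequal length cannot form a 2-D array, so construction
    # is expected to fail rather than pad the short rows silently.
    X = [(1, 2, 3), (2, 5), (5, 5, 1, 2)]
    with pytest.raises(ValueError):
        BallTree(X)
```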