-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slice by dask array of ints #3407
Conversation
docs/source/array-slicing.rst
Outdated
* Slicing one ``dask.array`` with another ``x[x > 0]`` | ||
* Slicing one `~dask.array.Array` with a `~dask.array.Array` of bools ``x[x > 0]`` | ||
* Slicing one `~dask.array.Array` with a zero or one-dimensional `~dask.array.Array` | ||
of ints ``x[x.map_blocks(np.argsort)]`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a problematic example as argsort won't behave as expected if x is chunked on the axis it operates. Should be replaced with argtopk as soon as it is merged (#3405).
dask/array/tests/test_slicing.py
Outdated
# Unsupported: 0d numpy array slicers (#3406) | ||
# np.testing.assert_array_equal(x[idx0, idx0], d[idx0, idx0_d]) | ||
# np.testing.assert_array_equal(x[idx0, idx], d[idx0, idx_d]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncomment after fixing #3406
dask/array/slicing.py
Outdated
|
||
if np.isnan(x.chunks[axis]).any(): | ||
raise NotImplementedError("Slicing an array with unknown chunks with a " | ||
"dask.array of ints is not supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is feasible but would horribly complicate everything because the graph will have to figure out dynamically the offset and the chunks (see below). I could not think of any strong reason to break my head on it as I think it's a very marginal use case.
dask/array/slicing.py
Outdated
token = tokenize(x, idx, axis) | ||
name1 = 'slice_with_int_dask_array_chunk-' + token | ||
name2 = 'slice_with_int_dask_array_aggregate-' + token | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions are welcome to use some kind of wrapper function instead of manually building the graph.
I tried with atop but didn't have much success.
This looks pretty similar to PR ( #3210 ). Might be worth taking a look. |
Is it possible to do something like, |
@jakirkham yes.
[EDIT] to be more precise: the index can have NaN chunks, but the sliced array can't. |
Cool, thanks for checking. Could you please add a test along those lines? |
@jakirkham I added it before you even had time to ask :) |
@jakirkham any update on this? |
Sorry, haven't had time to give it another look. Am pretty busy for the next month. Though this isn't necessarily a bad thing. Would be a great time to get some other thoughts on this. Expect @shoyer's got some good ideas. |
Found a bug - I'll revert once it's ironed out |
@jakirkham fixed bug and enhanced tests - ready for merge... |
Sounds good. If you have any thoughts on |
@djhoese fixed uint index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only thing major thing to add here is handling negative indices, or at least being sure that we give a graceful error message?
docs/source/changelog.rst
Outdated
@@ -7,7 +7,7 @@ Changelog | |||
Array | |||
+++++ | |||
|
|||
- Allow slicing a Dask Array by another one-dimensional Dask Array of integers | |||
- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can probably add this back? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed to reduce merge collisions, in accordance to the new policy of the dask project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this looks good to me. I'll merge this in day or two unless anyone else has comments.
Does this work with |
@jakirkham no test, and I didn't check. I'll look into it ASAP. |
@jakirkham vindex doesn't work, in the sense that the index is silently computed at definition time. I wrote initial unit tests for it; feel free to grab them: crusaderky@9cf947a |
So I think you can steal the code out of PR ( #3210 ) to accomplish this. It already has the correct dispatch logic in |
@jakirkham @shoyer to me the functionality in vindex is a separate, albeit related, feature. Can we please merge this one and open a new PR if there aren't specific critiques?!? |
Agreed, vindex is something separate. Thanks! |
Thanks for your patience here @crusaderky !
Thanks for taking the time to review @shoyer !
…On Thu, Jul 5, 2018 at 1:02 PM, Stephan Hoyer ***@***.***> wrote:
Agreed, vindex is something separate.
Thanks!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3407 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AASszG4uv3cavNY-pWOhE9GYZxrMUqOfks5uDkakgaJpZM4TVgaw>
.
|
Thanks everybody for your help! |
Closes #3396 (together with #3405)
Strengths:
Limitations: