Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API to take back ownership of an ArrayRef #2901

Closed
alamb opened this issue Oct 20, 2022 · 9 comments · Fixed by #3117
Closed

API to take back ownership of an ArrayRef #2901

alamb opened this issue Oct 20, 2022 · 9 comments · Fixed by #3117
Assignees
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented Oct 20, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

@crepererum and I are using BooleanArray to work as a bitset but the api is slightly cumbersome as the compute kernels give back an ArrayRef so we have to do some clunky downcasting

Describe the solution you'd like
I would like some way to convert an ArrayRef to BooleanArray

I can go from ArrayRef to &BooleanArray with functions like https://docs.rs/arrow/25.0.0/arrow/array/fn.as_boolean_array.html but I can't get the owned version

Here is a specific example / reproducer:

use std::sync::Arc;

use arrow::{array::{BooleanArray, ArrayRef}, compute::concat};



fn main() {
    let b1: BooleanArray = [Some(true), Some(true), Some(false), Some(false), None].into_iter().collect();
    let b2: BooleanArray = [Some(true), Some(false), Some(true), Some(false), None].into_iter().collect();

    let output: ArrayRef = concat(&[&b1, &b2]).unwrap();

    // I want to get output as a BooleanArray (which it is) but I can't figure out how to downcast it

    // this almost works but gets an error:

//     --> src/main.rs:16:48
//     |
// 16  |     let output: BooleanArray = Arc::try_unwrap(output)
//     |                                --------------- ^^^^^^ doesn't have a size known at compile-time
//     |                                |
//     |                                required by a bound introduced by this call
//     |


    let output: BooleanArray = Arc::try_unwrap(output)
        .unwrap() // take ownership
        .as_any()
        .downcast::<BooleanArray>()
        .unwrap(); // verify it was actually BooleanArray
}

It doesn't compile

Describe alternatives you've considered
None yet

Additional context
You can see the context of what we are doing on https://github.com/influxdata/influxdb_iox/pull/5910 / https://github.com/influxdata/influxdb_iox/pull/5910#discussion_r1000479355

@alamb alamb added arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog labels Oct 20, 2022
@crepererum
Copy link
Contributor

I think Array needs a new method for this to work. In addition to fn as_any(&self) -> &dyn Any it should also provide fn as_any_arc(self: &Arc<Self>) -> &Arc<dyn Any + Send + Sync + 'static> (IIRC this is allowed by current Rust since Arc is special). From there you can downcast the Arc and (if the user wants to) unwrap it (however for most cases it's OK to keep the concrete array type within the Arc).

crepererum added a commit to crepererum/arrow-rs that referenced this issue Oct 20, 2022
@alamb
Copy link
Contributor Author

alamb commented Oct 20, 2022

Or we could maybe add a Sized constraint to the Array trait ? https://docs.rs/arrow/latest/arrow/array/trait.Array.html

@tustvold
Copy link
Contributor

tustvold commented Oct 20, 2022

Or we could maybe add a Sized constraint to the Array trait ? https://docs.rs/arrow/latest/arrow/array/trait.Array.html

It is complaining that &dyn Array isn't Sized, not Array.

I think Array needs a new method for this to work

I actually wrote up how to support this on #1981

In particular https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=abb7fcf97e33ba83d44b17f6c89e8f0b

I'll get this into a PR

Edit: Lifetimes cause issues 😢

@tustvold tustvold self-assigned this Oct 20, 2022
@alamb
Copy link
Contributor Author

alamb commented Oct 20, 2022

Note that @crepererum wrote #2902

@tustvold
Copy link
Contributor

Yeah, I was hoping to avoid leaking this into the public interface as that PR does... Perhaps it is unavoidable 😅

@crepererum
Copy link
Contributor

Would it be an option to remove impl Array for T for the two cases where T involves references? Looking at all the impls and how Array is usually used, these two look pretty special.

@tustvold
Copy link
Contributor

Would it be an option to remove impl Array for T for the two cases where T involves references

Yes, unfortunately ArrayAccessor relies on references as currently ArrayAccessor: Array. The main reason for this is because it avoided ambiguous method calls, as both need to provide fn data(&self). Not a good reason, but that is the way it currently is. I would support changing this though

@alamb
Copy link
Contributor Author

alamb commented Oct 27, 2022

I found a workaround in the DataFusion codebase -- namely create it from ArrayData:

BooleanArray::from(mask.data().clone())

https://github.com/apache/arrow-datafusion/blob/6e0097d35391fea0d57c1d2ecfdef18437f681f4/datafusion/core/src/physical_plan/file_format/row_filter.rs#L106-L114

@tustvold
Copy link
Contributor

Yeah, the downside is it isn't quite as cheap. But maybe that is fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants