Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayBuilder struct for safe/efficient dynamic array initialisation #3131

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

conradludgate
Copy link

@programmerjake
Copy link
Member

I think we should have a full-fledged ArrayVec instead, though having this as a stop-gap measure that is later deprecated could be a good idea.

@conradludgate
Copy link
Author

I think we should have a full-fledged ArrayVec instead, though having this as a stop-gap measure that is later deprecated could be a good idea.

There's already an RFC for that. Mentioned in the Prior Art. I would rather get something smaller scope in quickly so that it can start to be used

@burdges
Copy link

burdges commented May 26, 2021

There are four ArrayVec-like crates that provide this functionality now, so zero reason for "quickly" doing "stop-gaps". If anything, we need a fifth crate that unifies their various features, ala #2990 (comment) and can thus join std with minimal resistance.

@conradludgate
Copy link
Author

I was speaking with @BurntSushi earlier today, he wasn't so sure if it made sense to have such functionality in std just for the sake of it. But array initialisation is one of those basic things that does seem weird to not have.

While they would be similar implementations. I do think the semantics are different.

Thinking of their location too.
ArrayVec makes sense to live in a potential core::collections module
ArrayBuilder makes more sense directly in core::array
IMO

@burdges
Copy link

burdges commented May 26, 2021

Yes, it's possible they do not unify cleanly, which maybe his concern. I donno..

@BurntSushi
Copy link
Member

To clarify my current thoughts:

  • Safe array initialization for non-Copy non-const types seems like a primitive we should have, at minimum.
  • The need for ArrayVec in std is less clear. I am unsure of how widely used it is. Moreover, given the number of crates that provide ArrayVec implementations, there is a suggestion that the design space for it is large, and that it would perhaps be better for that process to be carried out in the ecosystem. I note that I do not have any special insight into that design space, I am merely making an observation and drawing a tentative conclusion.

@conradludgate
Copy link
Author

Safe array initialization for non-Copy non-const types seems like a primitive we should have, at minimum.

Thanks for putting this so succinctly. This is exactly what separates this RFC from the other. Having the minimum needed to enrich core with what should be core functionality.

I personally don't mind if a full ArrayVec makes it's way into std, I have no use for it. But that shouldn't prevent the minimum feature set described above being incorporated first.

@kornelski
Copy link
Contributor

kornelski commented May 27, 2021

The ArrayBuilder is pretty much the ArrayVec type: build() could have been try_into(). Giving it a more generic name (not builder) would make the type more useful. It derefs to a slice, so this type is already quite usable as a stack/inline-allocated Vec.

I disagree with @BurntSushi's assessment of ArrayVec. With 20 million downloads, arrayvec is one of the most popular Rust crates. The API is rather simple: it's a Vec. The existing ArrayVec API is fine.

There are multiple crates with seemingly overlapping functionality, because:

  1. there's a bunch of forks/alternatives that were created to add const fn and const generics support,
  2. there are variants that can seamlessly grow to be heap allocated, or are basically a small vector optimization.

The case 1 is not a problem any more. The case 2 can easily be declared out of scope. There are pros and cons of using the heap for extra growth, but for std — or rather core — it makes sense to focus in the array part more, and have a Copy-compatible nostd-compatible simple type.

ArrayVec solves the initialization problem pretty well. It's more flexible than a callback-based initialization method. Amount of boilerplate code is comparable to MaybeUninit, but safe. When used with a simple loop, it should be a zero-cost abstraction.

I'm not insisting that Rust needs have this type in std/core. Rust already requires 3rd party crates for a lot of basic functionality, and arrayvec works as a crate just fine. But if Rust wanted to have a built-in solution to initializing arrays, adopting arrayvec is a good way to do it. It happens to solve the initialization problem, and is generally useful as an on-stack-Vec too.

@BurntSushi
Copy link
Member

BurntSushi commented May 27, 2021

I disagree with @BurntSushi's assessment of ArrayVec. With 20 million downloads, arrayvec is one of the most popular Rust crates.

Just to be clear, I said I was unsure of how widely used it is, not that it isn't widely used. And download count is somewhat deceptive. For example, about 1/3 of arrayvec's downloads come from csv-core, where arrayvec is a dev-dependency. Its total dependents is about 300, which to me says that yeah it's fairly popular. I agree with that. The question is always whether that's enough to to push it over the top and into std.

I'm not insisting that Rust needs have this type in std/core. Rust already requires 3rd party crates for a lot of basic functionality, and arrayvec works as a crate just fine. But if Rust wanted to have a built-in solution to initializing arrays, adopting arrayvec is a good way to do it. It happens to solve the initialization problem, and is generally useful as an on-stack-Vec too.

OK, so I finally had a chance to actually look at this RFC and the API is a lot bigger than what I thought it would be.

When I said "Safe array initialization for non-Copy non-const types seems like a primitive we should have, at minimum." above, what I had in mind was something small and primitive like this: rust-lang/rust#75644

This RFC is a lot more than a simple primitive for building arrays.

Ideally, the primitive would/could be used by the various ArrayVec crates to reduce some unsafe code. So it might be good to get feedback from them on the from_fn API in that PR.

@conradludgate
Copy link
Author

conradludgate commented May 28, 2021

OK, so I finally had a chance to actually look at this RFC and the API is a lot bigger than what I thought it would be.

Ultimately, there's only 3 functions I care about here.

impl<T, const N: usize> ArrayBuilder<T, N> {
    pub fn new() -> Self;
    pub unsafe fn push_unchecked(&mut self, t: T); // UB if array is full
    pub unsafe fn build_unchecked(self) -> [T; N]; // UB if not full
}

With these primitive functions, you can build from_fn quite simply

fn array_from_fn<T, const N: usize>(f: impl FnMut(usize) -> T) -> [T; N] {
    let mut array = ArrayBuilder::<T, N>::new();
    for i in 0..N {
        unsafe { array.push_unchecked(f(i)); } // safe because array won't be full
    }
    unsafe { array.build_unchecked() } // safe because array will be full
}

Similarly, a FromIterator impl is trivial

impl<T, const N: usize> FromIterator<T> for [T; N] {
    fn from_iter<I>(iter: I) -> Self where I: IntoIterator<Item=T> {
        let mut iter = iter.into_iter();
        for _ in 0..N {
            unsafe { array.push_unchecked(iter.next().unwrap()); } // safe because array won't be full
        }
        unsafe { array.build_unchecked() } // safe because array will be full
    }
}

The rest of the functions presented in the RFC are just natural extensions. I'm impartial to the pop functions. I haven't had any use for them, so they can be removed if considered out of scope.

What this ultimately has over just having from_fn built from scratch, is that from_fn is all or nothing. With ArrayBuilder, I can stop half way through, or I can hit an error case, handle it and then extract the initialised values from the Deref slice and clone them into a vec if I were to want them

@conradludgate
Copy link
Author

I have a working implementation

@clarfonthey
Copy link
Contributor

clarfonthey commented May 29, 2021

I would like to reiterate support for ArrayVec in core over something like this. I wouldn't be opposed to having something like array_from_fn mentioned earlier in addition to an ArrayVec equivalent, but I would be opposed to some ArrayBuilder struct whose functionality wouldn't be extendable to ArrayVec in the future. To be clear, creating some form of ArrayVec while committing only to a very small subset of methods initially would be acceptable imho.

My main justification for ArrayVec is that, IMHO, the use case of "I want between 0 and N elements" is pretty standard, especially with small N. Even something as simple of a 0-2 item array seems extremely reasonable to have on the stack, whereas Vec would add an entire other layer of indirection.

Plus, the ability to have an equivalent to Vec that works in embedded and other no_std environments is a huge plus. I personally use ArrayVec in crates where I know that the upper bound size is small that otherwise would require alloc for such use cases.

Another big thing to point out is that we technically have a cursed equivalent to ArrayVecin libcore in the form of slice::IntoIter, where the use case of a list of 0 to N elements can be covered by a bit of messing around with IntoIter::new.

Plus, there are already a couple things in the compiler itself that should be using some equivalent of ArrayVec but instead use jank enums, like CaseMappingIter


Additional note rereading this: a lot of the time the case where I want "0 to N elements" is in return values, and without unsized rvalues, something like ArrayVec is necessary. IMHO, even with unsized rvalues, I feel like ArrayVec would be better in those cases.

@Luro02
Copy link

Luro02 commented Aug 11, 2021

If I understand this RFC correctly, the main motivation is to be able to safely initialize an array, which can already be done with the soon-to-be stabilized [T;N]::map function:

The examples from the RFC written with map:

use core::array;

let mut values = array::IntoIter::new(["a", "b", "c", "d"]);

let arr: [String; 4] = [(); 4].map(|_| {
    values.next().unwrap().to_string()
});
let mut values = 1..=10;

let arr: [usize; 10] = [(); 10].map(|_| {
    let i = values.next().unwrap();
    i * i
});

The iterator example is relatively difficult to replicate and requires the [T; N]::try_map, which is not yet implemented (seems to be blocked on a design decision regarding the try_trait). Therefore, an extension trait is used to accomplish this:

#![feature(
    try_trait_v2,
    maybe_uninit_array_assume_init,
    maybe_uninit_uninit_array
)]
use core::iter;
use core::mem::MaybeUninit;
use core::ops::Try;

pub trait ArrayExt<T, const N: usize> {
    fn try_map<F, U, R, X>(self, f: F) -> R
    where
        X: Try<Output = U>,
        R: Try<Output = [U; N], Residual = X::Residual>,
        F: FnMut(T) -> X;
}

impl<T, const N: usize> ArrayExt<T, N> for [T; N] {
    fn try_map<F, U, R, X>(self, mut f: F) -> R
    where
        X: Try<Output = U>,
        R: Try<Output = [U; N], Residual = X::Residual>,
        F: FnMut(T) -> X,
    {
        let mut array: [MaybeUninit<U>; N] = MaybeUninit::uninit_array();
        let mut iterator = core::array::IntoIter::new(self);

        for item in array.iter_mut() {
            // NOTE: it is guranteed that this will not panic
            let next = iterator.next().unwrap();
            *item = MaybeUninit::new(f(next)?);
        }

        // SAFETY: because of the previous loops all values are guranteed to be
        //         initialized
        let result: [U; N] = unsafe { MaybeUninit::array_assume_init(array) };

        R::from_output(result)
    }
}

// this should do almost the same thing as in the example, except for the "remaining" function
fn chunked_iterator<I, const N: usize>(mut iter: I) -> impl Iterator<Item = [I::Item; N]>
where
    I: Iterator,
{
    iter::from_fn(move || {
        let values: Option<[_; N]> = [(); N].try_map(|_| iter.next());
        values
    })
}

I also want to point out that people are currently planning an abstraction called Storage, with which it may be possible in the future to use the standard collections (like Vec) without an allocator or code duplication:

rust-lang/wg-allocators#79

@conradludgate
Copy link
Author

The only thing that the array map misses that this RFC aims to provide (as you pointed out) is partial array state. Being able to take a slice to the initialised values if it's not a complete array. Similarly, I imagined the proposed type to be how you would implement such a try_map.

impl<T, const N: usize> ArrayExt<T, N> for [T; N] {
    fn try_map<F, U, R, X>(self, mut f: F) -> R
    where
        X: Try<Output = U>,
        R: Try<Output = [U; N], Residual = X::Residual>,
        F: FnMut(T) -> X,
    {
        let mut array: ArrayBuilder<U, N> = ArrayBuilder::new();
        for item in core::array::IntoIter::new(self) {
            // SAFETY: N elements have not yet been pushed
            unsafe { array.push_unchecked(f(item)?); }
        }

        // SAFETY: previous loop guarantees N elements have been set
        let result: [U; N] = unsafe { array.build_unchecked() };

        R::from_output(result)
    }
}

@conradludgate
Copy link
Author

I also want to point out that people are currently planning an abstraction called Storage, with which it may be possible in the future to use the standard collections (like Vec) without an allocator or code duplication:

This is really interesting. I'm a big fan of the proposed storage api. If that gets decided on this RFC is indeed irrelevant as a the ArrayBuilder<T, N> would be superceded by Vec<T, [MaybeUninit<T>; N]> (given there's a way to extract the raw storage out of the vec)

@ehuss ehuss added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 5, 2021
@tgross35
Copy link
Contributor

Just linking here. - #2990 was closed, but I just opened #3316 which proposes to add an ArrayVec (and so far seems mildly positively reviewed), so perhaps it would be good to condense effort there.

However - as mentioned on that thread, the new storages API would likely supplant any need for a separately-implemented ArrayVec, which is a good thing.

@elichai
Copy link

elichai commented Mar 11, 2023

I really wanted to use this API, but the codegen is not great, I compared the following 3 implementations:

use core::mem::MaybeUninit;
use array_builder::ArrayBuilder;

pub fn write_array0(f: fn() -> u8) -> [u8; 1024] {
    core::array::from_fn(|_|f())
}

pub fn write_array1(f: fn() -> u8) -> [u8; 1024] {
    let mut uninit = [MaybeUninit::uninit(); 1024];
    for i in 0..1024 {
        uninit[i] = MaybeUninit::new(f());
    }
    unsafe { (uninit.as_ptr() as *const [u8; 1024]).read() }
}

pub fn write_array2(f: fn() -> u8) -> [u8; 1024] {
    let mut builder = ArrayBuilder::new();
    for _ in 0..1024 {
        builder.push(f());
    }
    builder.build().unwrap()
}

And the produced ASM is as follows:

write_array0
 push    r15
 push    r14
 push    rbx
 sub     rsp, 2064
 mov     r15, rsi
 mov     r14, rdi
 xor     ebx, ebx
.LBB9_1:
 call    r15
 mov     byte, ptr, [rsp, +, rbx, +, 9], al
 inc     rbx
 cmp     rbx, 1024
 jne     .LBB9_1
 lea     r15, [rsp, +, 1033]
 lea     rsi, [rsp, +, 9]
 mov     rbx, qword, ptr, [rip, +, memcpy@GOTPCREL]
 mov     edx, 1024
 mov     rdi, r15
 call    rbx
 mov     edx, 1024
 mov     rdi, r14
 mov     rsi, r15
 call    rbx
 mov     rax, r14
 add     rsp, 2064
 pop     rbx
 pop     r14
 pop     r15
 ret
.LBB9_4:
 mov     r14, rax
 lea     rsi, [rsp, +, 9]
 mov     rdi, rbx
 call    core::ptr::drop_in_place<core::array::Guard<u8,1024_usize>>
 mov     rdi, r14
 call    _Unwind_Resume
 ud2
write_array1
push    r15
push    r14
push    r12
push    rbx
sub     rsp, 1032
mov     r15, rsi
mov     r14, rdi
xor     ebx, ebx
.LBB10_1:
lea     r12, [rbx, +, 1]
call    r15
mov     byte, ptr, [rsp, +, rbx, +, 8], al
mov     rbx, r12
cmp     r12, 1024
jne     .LBB10_1
lea     rsi, [rsp, +, 8]
mov     edx, 1024
mov     rdi, r14
call    qword, ptr, [rip, +, memcpy@GOTPCREL]
mov     rax, r14
add     rsp, 1032
pop     rbx
pop     r12
pop     r14
pop     r15
ret
write_array2
 push    rbp
 push    r15
 push    r14
 push    rbx
 sub     rsp, 2072
 mov     r15, rsi
 mov     r14, rdi
 mov     qword, ptr, [rsp, +, 1032], 0
 mov     ebp, 1024
 xor     ebx, ebx
.LBB11_1:
 call    r15
 cmp     rbx, 1024
 jae     .LBB11_3
 mov     byte, ptr, [rsp, +, rbx, +, 8], al
 mov     rbx, qword, ptr, [rsp, +, 1032]
 inc     rbx
 mov     qword, ptr, [rsp, +, 1032], rbx
 dec     ebp
 jne     .LBB11_1
 cmp     rbx, 1024
 jne     .LBB11_7
 lea     rsi, [rsp, +, 8]
 mov     edx, 1024
 mov     rdi, r14
 call    qword, ptr, [rip, +, memcpy@GOTPCREL]
 mov     rax, r14
 add     rsp, 2072
 pop     rbx
 pop     r14
 pop     r15
 pop     rbp
 ret
.LBB11_3:
 lea     rdi, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.0]
 lea     rdx, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.2]
 mov     esi, 30
 call    qword, ptr, [rip, +, _ZN4core9panicking5panic17h90931f06a97cc5e0E@GOTPCREL]
 jmp     .LBB11_4
.LBB11_7:
 lea     r14, [rsp, +, 1040]
 lea     rsi, [rsp, +, 8]
 mov     edx, 1024
 mov     rdi, r14
 call    qword, ptr, [rip, +, memcpy@GOTPCREL]
 mov     qword, ptr, [rsp, +, 2064], rbx
 lea     rdi, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.4]
 lea     rcx, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.5]
 lea     r8, [rip, +, .Lanon.c820c69e724c4167816339e4a82ffdf6.14]
 mov     esi, 43
 mov     rdx, r14
 call    qword, ptr, [rip, +, _ZN4core6result13unwrap_failed17hdc73d4affce1d414E@GOTPCREL]
.LBB11_4:
 ud2
.LBB11_9:
 mov     rbx, rax
 jmp     .LBB11_10
.LBB11_11:
 jmp     .LBB11_12
.LBB11_13:
.LBB11_12:
 mov     rbx, rax
 lea     r14, [rsp, +, 8]
.LBB11_10:
 mov     rdi, r14
 call    core::ptr::drop_in_place<array_builder::ArrayBuilder<u8,1024_usize>>
 mov     rdi, rbx
 call    _Unwind_Resume
 ud2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants