Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient creation of all-ones bitset #101

Open
hniksic opened this issue Mar 13, 2024 · 8 comments
Open

Efficient creation of all-ones bitset #101

hniksic opened this issue Mar 13, 2024 · 8 comments

Comments

@hniksic
Copy link

hniksic commented Mar 13, 2024

I need to create a bitset of known size with all bits set. Currently I do it with:

fn ones_with_capacity(bits: usize) -> FixedBitSet {
    let mut everything = FixedBitSet::with_capacity(bits);
    everything.set_range(.., true);
    everything
}

This is not hard to write, but unlike Vec::with_capacity(), FixedBitSet::with_capacity() initializes the bitset to zeros, only for my code to immediately set it to ones. With largish bitsets I'd like to avoid the unnecessary initial resetting. Looking at assembly output in release mode, it seems that the compiler doesn't inline with_capacity(), from which I infer that it cannot eliminate the initial zeroing.

The question is, what is the most efficient way to create an all-ones bitset? Is it the above code, or rather something like:

fn ones_with_capacity(bits: usize) -> FixedBitSet {
    FixedBitSet::with_capacity_and_blocks(bits, std::iter::repeat(!0))
}

And should the crate include a constructor like FixedBitSet::ones_with_capacity() (possibly with a more optimized implementation)?

@james7132
Copy link
Collaborator

As of #86, your implementation of ones_with_capacity should be close to optimal.

@hniksic
Copy link
Author

hniksic commented Mar 19, 2024

As of #86, your implementation of ones_with_capacity should be close to optimal.

@james7132 Sorry if the answer is obvious, but are you referring to the first or the second implementation?

Looking at the implmentation of with_capacity_and_blocks(), the second one seems incorrect. with_capacity_and_blocks() starts off by collecting blocks, and ones_with_capacity() is passing it an infinite iterator. I think it should ideally collect blocks.take(bits.div_ceil(8)), but that's material for a separate PR.

@james7132
Copy link
Collaborator

Your second one is close to optimal now. There is no collect after #86. It preallocates a zeroed buffer of the right size then writes over it using the provided blocks up to the provided capacity. This does a second pass over the buffer, so we can probably make it more efficient by using a Vec<MaybeUninit<SimdBlock>> first. That said, the calloc from the first allocation should be pretty fast compared to an iterator collect, so I'm not sure exactly how much would be gained from that.

@hniksic
Copy link
Author

hniksic commented Mar 20, 2024

Thanks for the clarification! Would you accept a PR that adds a ones_with_capacity() constructor (possibly under another name)? Creating an all-1 bitset feels like a fairly fundamental operation, and I can imagine I'm not the first one to need it.

@james7132
Copy link
Collaborator

Yes go ahead! Should be pretty close to with_capacity, but some care needs to keep the last few bits beyond the fixed length are zeroed out.

@hniksic
Copy link
Author

hniksic commented Mar 21, 2024

@james7132 My idea was to literally use the second implementation. Do you think that would be suboptimal?

Ideally with_capacity_and_blocks() would avoid zeroing out the whole bitset before writing to it. (And it should stlil zero out the part not covered by blocks.) Perhaps that could be achieved by adding an unsafe with_capacity_uninit(), and replacing assignment to subblock with ptr::write(). Something like:

pub fn with_capacity_and_blocks<I: IntoIterator<Item = Block>>(bits: usize, blocks: I) -> Self {
    unsafe {
        // start off with uninitialized bitset for efficiency
        let mut bitset = Self::with_capacity_uninit(bits);
        let Range {
            start: mut subblock,
            end,
        } = bitset.as_mut_slice().as_mut_ptr_range();
        // copy data from `blocks`
        for value in blocks {
            if subblock == end {
                break;
            }
            subblock.write(value);
            subblock = subblock.add(1);
        }
        // zero out the rest
        while subblock != end {
            subblock.write(0);
            subblock = subblock.add(1);
        }
        bitset
    }
}

Then a ones constructor could be implemented as Self::with_capacity_and_blocks(bits, std::iter::repeat(!0)) with (as far as I can tell) no loss of performance.

@james7132
Copy link
Collaborator

The main gotcha with that implementation is that a Vec<MaybeUninit> needs to be used and we cannot convert it to a type that is initialized until the entire vec is populated for that implementation to be sound. This likely means many of the private utility functions you're using there are likely to be unusable.

@hniksic
Copy link
Author

hniksic commented Mar 22, 2024

Oh, right, thanks for explaining. I think we can avoid MaybeUninit as long as we only use pointers to write to uninitialized data, and never create references to it. The standard idiom is to allocate with Vec::with_capacity(), fill the allocated data, and then use set_len() to bless it. In our case we don't even need set_len(), it's enough to create the final FixedBitSet from the pointer:

pub fn with_capacity_and_blocks<I: IntoIterator<Item = Block>>(bits: usize, blocks: I) -> Self {
    if bits == 0 {
        return Self::new();
    }

    let simd_block_cnt = bits.div_ceil(SimdBlock::BITS);
    let block_cnt = bits.div_ceil(Block::BITS as usize);

    // SAFETY: We use Vec::with_capacity() to obtain uninitialized memory, and
    // initialize all of it before passing ownership to the returned FixedBitSet
    unsafe {
        let mut vec = Vec::<SimdBlock>::with_capacity(simd_block_cnt);
        let mut subblock = vec.as_mut_ptr().cast::<Block>();
        let subblock_end = subblock.add(block_cnt);
        assert!(subblock_end != subblock); // we handle bits == 0 at the beginning
        // copy as much as we can from blocks
        for value in blocks {
            subblock.write(value);
            subblock = subblock.add(1);
            if subblock == subblock_end {
                break;
            }
        }
        // zero out the remainder of the allocation
        let simd_block_end = vec.as_mut_ptr().add(simd_block_cnt).cast::<Block>();
        core::ptr::write_bytes(subblock, 0, simd_block_end.offset_from(subblock) as usize);
        let data = NonNull::new_unchecked(vec.as_mut_ptr());
        // FixedBitSet is taking over the ownership of vec's data
        core::mem::forget(vec);
        FixedBitSet {
            data,
            capacity: simd_block_cnt,
            length: bits,
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants