Add BitSet #235

clarfonthey · 2023-07-26T03:06:45Z

Implements #171.

I was also interested in this, so I took the time to write a reasonable wrapper that fits well into the library. It borrows its API from the most similar existing structure in libstd, BTreeSet.

This is an "MVP" implementation, meaning that it offers an API that is still useful but is missing a few operations that don't have implementations elsewhere in the crate. Specifically, it omits:

- is_disjoint, is_subset, is_superset checks: I feel like these kinds of checks should probably have a mirrored API for BitSlice before they're implemented here. Specifically, some way of representing binary folds like (x & y).not_all() without allocating.
- Difference, SymmetricDifference, Intersection, and Union: these also should have a similar API on BitSlice, except closer to (x & y).iter_ones() without allocating.
- IntoIterator: while iterators over references just use IterOnes, this would require an IterOnes-like iterator for BitBox to be efficient.
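To make the "without allocating" idea concrete, here is a rough sketch of what such binary folds could look like. It does not use bitvec at all: plain u64 word slices stand in for BitSlice, and the function names are only borrowed from BTreeSet for illustration, so treat all of it as an assumption rather than a proposed API.

```rust
/// Hypothetical stand-in for a bit-slice: bit `i` lives in word `i / 64`.
/// Returns true when no index is set in both `a` and `b`.
pub fn is_disjoint(a: &[u64], b: &[u64]) -> bool {
    // `zip` stops at the shorter slice; words past that point cannot overlap.
    a.iter().zip(b).all(|(&x, &y)| (x & y) == 0)
}

/// Returns true when every index set in `a` is also set in `b`.
pub fn is_subset(a: &[u64], b: &[u64]) -> bool {
    let common = a.len().min(b.len());
    a[..common]
        .iter()
        .zip(&b[..common])
        .all(|(&x, &y)| (x & !y) == 0)
        // Any set bit of `a` beyond the end of `b` breaks the subset relation.
        && a[common..].iter().all(|&x| x == 0)
}

/// Lazily yields the indices set in both inputs, word by word,
/// without building an intermediate `x & y` buffer.
pub fn intersection<'a>(a: &'a [u64], b: &'a [u64]) -> impl Iterator<Item = usize> + 'a {
    a.iter().zip(b).enumerate().flat_map(|(word_idx, (&x, &y))| {
        let mut word = x & y;
        std::iter::from_fn(move || {
            if word == 0 {
                return None;
            }
            let bit = word.trailing_zeros() as usize;
            word &= word - 1; // clear the lowest set bit
            Some(word_idx * 64 + bit)
        })
    })
}
```

The real version would have to respect bitvec's storage and ordering parameters, which this sketch ignores.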

I've added commented-out boilerplate for those omitted implementations so that someone who has the desire for them knows where to start. If this isn't reasonable to include in the released library, I'd be more than happy to remove them, but I figured I might as well include my work here.

The most relevant design decision here is that this is explicitly a wrapper around BitVec, which means it doesn't add affordances you'd get from something like BTreeSet, which keeps track of the number of elements in the set. This also means that you can freely access the underlying BitVec without disrupting any invariants.
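As an illustration of that trade-off, counting the elements of a set stored this way means scanning the storage each time. A minimal sketch, assuming u64 words as a stand-in for the underlying BitVec (the function name is hypothetical):

```rust
/// Counts the members of a set stored as packed u64 words.
/// Nothing is cached, so every call walks the whole storage.
pub fn count_members(words: &[u64]) -> usize {
    words.iter().map(|w| w.count_ones() as usize).sum()
}
```

The upside is that callers can mutate the storage directly and the count can never go stale.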

That said, there is one "invariant" that the internal utilities help maintain: in some cases, it's most useful to ensure that the set is "trimmed," i.e. the last bit is a one. Two sets can be compared by comparing their trimmed versions, which I call "shrunken" in the code since that lines up more closely with the existing shrink_to_fit method (even though it's not quite the same). I try to uphold this invariant in some of the methods by not leaving too much "slack" at the end of the set, but it's not explicitly required for the methods to work.
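Roughly, the trimming idea looks like the sketch below. This is not the code from the PR: BoolSet, from_bits, and stored_len are hypothetical names, and a Vec<bool> stands in for BitVec to keep the example self-contained.

```rust
/// Hypothetical stand-in for the wrapper: `bits[i] == true` means `i` is in the set.
#[derive(Debug, Clone, Default)]
pub struct BoolSet {
    bits: Vec<bool>,
}

impl BoolSet {
    pub fn from_bits(bits: Vec<bool>) -> Self {
        Self { bits }
    }

    /// Number of stored positions, including any trailing zeros.
    pub fn stored_len(&self) -> usize {
        self.bits.len()
    }

    /// The prefix that ends at the last set bit; empty if no bit is set.
    fn shrunken(&self) -> &[bool] {
        match self.bits.iter().rposition(|&b| b) {
            Some(last) => &self.bits[..=last],
            None => &[],
        }
    }

    /// Drops trailing zeros so that the last stored bit is a one.
    pub fn shrink_inner(&mut self) {
        let len = self.shrunken().len();
        self.bits.truncate(len);
    }
}

/// Two sets are equal when their trimmed views are equal,
/// regardless of how much slack either one carries.
impl PartialEq for BoolSet {
    fn eq(&self, other: &Self) -> bool {
        self.shrunken() == other.shrunken()
    }
}
```

Because equality only looks at the trimmed views, calling shrink_inner is an optimisation rather than a correctness requirement.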

clarfonthey · 2023-07-26T03:07:25Z

rustfmt-stable.toml

@@ -13,7 +13,7 @@
 # attr_fn_like_width = 70            # Leave implicit
 # chain_width = 60                   # Leave implicit
 edition = "2018"
-fn_args_layout = "Tall"
+fn_params_layout = "Tall"


These changes were made because the original name was deprecated in the latest nightly rustfmt, and I needed to make them for the just format recipe to work properly. I updated the stable version too since I figured it'd be a good idea.

clarfonthey · 2023-07-26T03:08:13Z

src/set.rs

+
+	/// Creates a new bit-set for a range of indices.
+	#[inline]
+	pub fn from_range(range: ops::Range<usize>) -> Self {


This felt like the best analogue for BitVec::repeat to me. Since only allowing RangeFrom seemed weird, I decided to accept a proper Range instead.
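For reference, the intended behaviour is roughly the following. This is a sketch on a Vec<bool> rather than the actual BitVec-backed implementation, and the handling of empty ranges is my assumption here.

```rust
use std::ops::Range;

/// Builds storage in which exactly the indices in `range` are set.
/// Assumption: an empty range produces empty storage.
pub fn from_range(range: Range<usize>) -> Vec<bool> {
    if range.is_empty() {
        return Vec::new();
    }
    // Everything below `range.start` stays false; the rest is filled with ones.
    let mut bits = vec![false; range.end];
    bits[range.start..].fill(true);
    bits
}
```

With this shape, from_range(0..n) behaves like repeating a one n times, which is where the comparison with BitVec::repeat comes from.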

clarfonthey · 2023-07-26T03:10:42Z

src/set/api.rs

+			self.inner.resize(value + 1, false);
+			self.inner.set(value, true);


Initially I was thinking of avoiding the "redundant" write to the final bit, but then realised that, due to the way fill works, doing so would probably just make the result slower, since it would always need a special write to the last element. If there's a better way of doing this, I would love to know about it.
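To spell out the pattern being discussed: grow the storage with zeros so that the new index exists, then set that one bit. The sketch below uses a Vec<bool> in place of BitVec, and the surrounding branch plus the BTreeSet-style return value are assumptions about context the hunk doesn't show.

```rust
/// Inserts `value`, returning true if it was not already present.
pub fn insert(bits: &mut Vec<bool>, value: usize) -> bool {
    if value >= bits.len() {
        // Zero-fill up to and including `value`, then overwrite the last bit.
        // The final position is written twice; this is the "redundant" write.
        bits.resize(value + 1, false);
        bits[value] = true;
        true
    } else {
        // In range: set the bit and report whether it was previously clear.
        !std::mem::replace(&mut bits[value], true)
    }
}
```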

clarfonthey · 2023-07-26T03:11:47Z

src/set/api.rs

+			// NOTE: it's unclear how this affects performance and if we should
+			// do this automatically, or require it manually only
+			self.shrink_inner();


I haven't put much thought into this at all, so I would love input on whether this is a good idea.

clarfonthey · 2023-07-26T03:12:44Z

src/set/api.rs

+		self.inner
+			.iter_mut()
+			.enumerate()


Fully aware that this is super inefficient, but iter_ones doesn't have a version that plays nicely with mutation. I would be open to not landing this method if it seems dishonest to include here (which it probably is).
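For anyone wondering where the cost comes from, this is roughly the shape of the loop, sketched on a plain slice of bools. The retain name and the closure signature are assumptions modelled on BTreeSet::retain, not the exact method in the diff.

```rust
/// Keeps only the members for which `keep` returns true.
pub fn retain<F>(bits: &mut [bool], mut keep: F)
where
    F: FnMut(usize) -> bool,
{
    // Every position is visited, set or not, which is what makes this slow
    // compared to an iterator that jumps straight from one set bit to the next.
    for (index, bit) in bits.iter_mut().enumerate() {
        if *bit && !keep(index) {
            *bit = false;
        }
    }
}
```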

clarfonthey · 2023-07-26T03:14:41Z

src/set/api.rs

+	/// assert_eq!(set_iter.next(), None);
+	/// ```
+	#[inline]
+	pub fn iter(&self) -> IterOnes<'_, T, O> {


I could change this to a wrapper if desired, but I figured reusing the original type was the best idea since any users here should be clear what API is being used.

Update rustfmt.toml

a553ab2

clarfonthey force-pushed the bitset branch from 9bb882a to 41926f3 Compare July 26, 2023 03:39

Add BitSet

8179b48

clarfonthey force-pushed the bitset branch from 41926f3 to 8179b48 Compare July 26, 2023 03:39
