How to calculate the size of the hashbrown::HashMap at runtime? #506

Open
iddm opened this issue Feb 19, 2024 · 1 comment

Comments

@iddm

iddm commented Feb 19, 2024

I need to calculate how much memory a hashbrown::HashMap occupies at runtime, including all of its contents. How could I do that?

@cuviper
Member

cuviper commented Mar 24, 2024

I doubt it will ever be exposed, but for a given version you can inspect the source. The immediate size is just std::mem::size_of::<HashMap<K, V>>(), and the heap size comes from the computed layout here:

hashbrown/src/raw/mod.rs

Lines 233 to 276 in 3741813

/// Helper which allows the max calculation for ctrl_align to be statically computed for each T
/// while keeping the rest of `calculate_layout_for` independent of `T`
#[derive(Copy, Clone)]
struct TableLayout {
    size: usize,
    ctrl_align: usize,
}

impl TableLayout {
    #[inline]
    const fn new<T>() -> Self {
        let layout = Layout::new::<T>();
        Self {
            size: layout.size(),
            ctrl_align: if layout.align() > Group::WIDTH {
                layout.align()
            } else {
                Group::WIDTH
            },
        }
    }

    #[inline]
    fn calculate_layout_for(self, buckets: usize) -> Option<(Layout, usize)> {
        debug_assert!(buckets.is_power_of_two());

        let TableLayout { size, ctrl_align } = self;
        // Manual layout calculation since Layout methods are not yet stable.
        let ctrl_offset =
            size.checked_mul(buckets)?.checked_add(ctrl_align - 1)? & !(ctrl_align - 1);
        let len = ctrl_offset.checked_add(buckets + Group::WIDTH)?;

        // We need an additional check to ensure that the allocation doesn't
        // exceed `isize::MAX` (https://github.com/rust-lang/rust/pull/95295).
        if len > isize::MAX as usize - (ctrl_align - 1) {
            return None;
        }

        Some((
            unsafe { Layout::from_size_align_unchecked(len, ctrl_align) },
            ctrl_offset,
        ))
    }
}

... where T = (K, V) for a HashMap<K, V>, and buckets can be determined from the reported capacity like:

hashbrown/src/raw/mod.rs

Lines 188 to 217 in 3741813

/// Returns the number of buckets needed to hold the given number of items,
/// taking the maximum load factor into account.
///
/// Returns `None` if an overflow occurs.
// Workaround for emscripten bug emscripten-core/emscripten-fastcomp#258
#[cfg_attr(target_os = "emscripten", inline(never))]
#[cfg_attr(not(target_os = "emscripten"), inline)]
fn capacity_to_buckets(cap: usize) -> Option<usize> {
    debug_assert_ne!(cap, 0);

    // For small tables we require at least 1 empty bucket so that lookups are
    // guaranteed to terminate if an element doesn't exist in the table.
    if cap < 8 {
        // We don't bother with a table size of 2 buckets since that can only
        // hold a single element. Instead we skip directly to a 4 bucket table
        // which can hold 3 elements.
        return Some(if cap < 4 { 4 } else { 8 });
    }

    // Otherwise require 1/8 buckets to be empty (87.5% load)
    //
    // Be careful when modifying this, calculate_layout relies on the
    // overflow check here.
    let adjusted_cap = cap.checked_mul(8)? / 7;

    // Any overflows will have been caught by the checked_mul. Also, any
    // rounding errors from the division above will be cleaned up by
    // next_power_of_two (which can't overflow because of the previous division).
    Some(adjusted_cap.next_power_of_two())
}
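
The two excerpts above can be combined into a rough estimator that needs only the map's reported capacity. The following is a minimal sketch, not hashbrown's API: `estimated_heap_bytes` and its `group_width` parameter are hypothetical names, and `group_width` stands in for the private `Group::WIDTH`, whose value depends on the target (16 is assumed here for an SSE2 build; other targets may use a smaller width). The arithmetic mirrors the quoted revision only, and a capacity of zero is assumed to mean no heap allocation.

```rust
use std::mem::{align_of, size_of};

/// Mirrors the quoted `capacity_to_buckets`; callers must pass `cap > 0`.
fn capacity_to_buckets(cap: usize) -> Option<usize> {
    if cap < 8 {
        return Some(if cap < 4 { 4 } else { 8 });
    }
    // Keep 1/8 of the buckets empty (87.5% maximum load).
    let adjusted_cap = cap.checked_mul(8)? / 7;
    Some(adjusted_cap.next_power_of_two())
}

/// Hypothetical helper: estimates the heap bytes of a table whose element
/// type is `T` (for a `HashMap<K, V>`, `T = (K, V)`).
///
/// `group_width` is an ASSUMPTION standing in for hashbrown's private
/// `Group::WIDTH`; it must be a power of two.
fn estimated_heap_bytes<T>(capacity: usize, group_width: usize) -> Option<usize> {
    debug_assert!(group_width.is_power_of_two());
    if capacity == 0 {
        // Assumed: an empty map with zero capacity has not allocated.
        return Some(0);
    }
    let buckets = capacity_to_buckets(capacity)?;
    let ctrl_align = align_of::<T>().max(group_width);
    // Element storage, rounded up to the control-byte alignment.
    let ctrl_offset =
        size_of::<T>().checked_mul(buckets)?.checked_add(ctrl_align - 1)? & !(ctrl_align - 1);
    // One control byte per bucket plus one trailing group.
    let len = ctrl_offset.checked_add(buckets + group_width)?;
    if len > isize::MAX as usize - (ctrl_align - 1) {
        return None;
    }
    Some(len)
}
```

For a map `m: HashMap<K, V>`, a call such as `estimated_heap_bytes::<(K, V)>(m.capacity(), 16)` gives the table allocation, and adding `std::mem::size_of::<HashMap<K, V>>()` gives the total excluding anything the keys and values point to.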

All of that is subject to change, and of course if your K or V types have additional indirect memory then you'll need to account for that too.
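
Accounting for indirect memory could look something like the sketch below. It is only an illustration under stated assumptions: the `IndirectBytes` trait and `contents_indirect_bytes` function are hypothetical names, not part of hashbrown or std, and `std::collections::HashMap` is used as a stand-in for `hashbrown::HashMap` because only `iter()` is needed. It counts the capacity each `String` or `Vec<u8>` has reserved, which may differ from what the allocator actually hands out.

```rust
use std::collections::HashMap;

/// Hypothetical helper trait: heap bytes a value owns beyond its inline size.
trait IndirectBytes {
    fn indirect_bytes(&self) -> usize;
}

// Plain integers own no heap memory.
impl IndirectBytes for u64 {
    fn indirect_bytes(&self) -> usize {
        0
    }
}

// A String owns a byte buffer of `capacity()` bytes.
impl IndirectBytes for String {
    fn indirect_bytes(&self) -> usize {
        self.capacity()
    }
}

// A Vec<u8> owns `capacity()` bytes (one byte per element).
impl IndirectBytes for Vec<u8> {
    fn indirect_bytes(&self) -> usize {
        self.capacity()
    }
}

/// Sums the indirect heap bytes of every key and value in the map.
/// This does not include the table allocation itself.
fn contents_indirect_bytes<K, V, S>(map: &HashMap<K, V, S>) -> usize
where
    K: IndirectBytes,
    V: IndirectBytes,
{
    map.iter()
        .map(|(k, v)| k.indirect_bytes() + v.indirect_bytes())
        .sum()
}
```

The result would be added to the immediate size and the table's heap size to approximate the total footprint. Nested containers need their own implementations that recurse into their elements.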
