Add Row size methods (#3160) #3163

Merged
merged 3 commits into from Nov 23, 2022
Changes from 1 commit
21 changes: 21 additions & 0 deletions arrow/src/row/interner.rs
@@ -157,6 +157,15 @@ impl OrderPreservingInterner {
    pub fn value(&self, key: Interned) -> &[u8] {
        self.values.index(key)
    }

    /// Returns the size of this instance in bytes including self
    pub fn size(&self) -> usize {
        std::mem::size_of::<Self>()
            + self.values.buffer_size()
            + self.values.buffer_size()
Contributor

twice?

            + self.bucket.size()
            + self.lookup.capacity() * std::mem::size_of::<Interned>()
    }
}
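
The "twice?" comment points at `self.values.buffer_size()` being added two times. The sketch below shows the accounting pattern with each owned buffer counted once. The types `Buffer` and `Interner` are hypothetical stand-ins, and the second buffer being named `keys` is an assumption, since the diff does not show the struct fields.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `InternBuffer`: raw bytes plus offsets into them.
struct Buffer {
    values: Vec<u8>,
    offsets: Vec<usize>,
}

impl Buffer {
    fn with_capacity(bytes: usize, entries: usize) -> Self {
        Self {
            values: Vec::with_capacity(bytes),
            offsets: Vec::with_capacity(entries),
        }
    }

    /// Heap bytes reserved by the two vectors.
    fn buffer_size(&self) -> usize {
        self.values.capacity() + self.offsets.capacity() * size_of::<usize>()
    }
}

/// Hypothetical interner owning two distinct buffers (field names assumed).
struct Interner {
    keys: Buffer,
    values: Buffer,
}

impl Interner {
    /// Inline size of `Self` plus each owned buffer, counted exactly once.
    fn size(&self) -> usize {
        size_of::<Self>() + self.keys.buffer_size() + self.values.buffer_size()
    }
}
```

If the real struct owns only one such buffer, the duplicated term would simply be dropped instead.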

/// A buffer of `[u8]` indexed by `[Interned]`
@@ -192,6 +201,11 @@ impl InternBuffer {
        self.offsets.push(self.values.len());
        key
    }

    /// Returns the byte size of the associated buffers
    fn buffer_size(&self) -> usize {
        self.values.capacity() + self.offsets.capacity() * std::mem::size_of::<usize>()
    }
}
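
`buffer_size` reports `capacity()` rather than `len()`, so it measures what the vectors have reserved, not what is currently filled. A minimal sketch of the difference, using two hypothetical free functions that are not part of the PR:

```rust
use std::mem::size_of;

/// Heap bytes reserved by a byte vector and an offsets vector,
/// mirroring the shape of `buffer_size` in the diff.
fn reserved_bytes(values: &Vec<u8>, offsets: &Vec<usize>) -> usize {
    values.capacity() + offsets.capacity() * size_of::<usize>()
}

/// Bytes actually in use; never larger than `reserved_bytes`.
fn used_bytes(values: &Vec<u8>, offsets: &Vec<usize>) -> usize {
    values.len() + offsets.len() * size_of::<usize>()
}
```

For a vector created with `Vec::with_capacity(1024)` that holds three bytes, `used_bytes` counts 3 while `reserved_bytes` counts at least 1024, which is the figure that matters for memory accounting.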

impl Index<Interned> for InternBuffer {
@@ -324,6 +338,13 @@ impl Bucket {
            }
        }
    }

    /// Returns the size of this instance in bytes
    fn size(&self) -> usize {
        std::mem::size_of::<Self>()
            + self.slots.capacity() * std::mem::size_of::<Slot>()
            + self.next.as_ref().map(|x| x.size()).unwrap_or_default()
    }
}
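
`Bucket::size` recurses through `next`, so a chain of buckets is summed link by link. A minimal sketch of that recursion on a hypothetical `Node` type (with `u64` standing in for `Slot`):

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `Bucket`: a slot vector and an optional successor.
struct Node {
    slots: Vec<u64>,
    next: Option<Box<Node>>,
}

impl Node {
    /// Inline size, plus the slot allocation, plus the boxed successor (if any).
    /// The successor's own `size_of::<Self>()` is its heap allocation, so it is
    /// not double-counted: the parent's inline size only holds the pointer.
    fn size(&self) -> usize {
        size_of::<Self>()
            + self.slots.capacity() * size_of::<u64>()
            + self.next.as_ref().map(|n| n.size()).unwrap_or_default()
    }
}
```

`unwrap_or_default()` yields `0` for a `None` successor, which terminates the recursion at the end of the chain.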

#[cfg(test)]
34 changes: 34 additions & 0 deletions arrow/src/row/mod.rs
@@ -358,6 +358,14 @@ impl SortField {
    pub fn new_with_options(data_type: DataType, options: SortOptions) -> Self {
        Self { options, data_type }
    }

    /// Return size of this instance in bytes.
    ///
    /// Includes the size of `Self`.
    pub fn size(&self) -> usize {
        self.data_type.size() + std::mem::size_of::<Self>()
            - std::mem::size_of::<DataType>()
Member

Why minus size of DataType?

Member

No need to account for options?

Contributor Author

Options is counted by size_of::<Self>(); we need to subtract size_of::<DataType>(), as otherwise it is counted twice.

Member

self.data_type.size() isn't equal to std::mem::size_of::<DataType>()?

Member? 

Member

Oh, the - std::mem::size_of::<DataType>() is for data_type: DataType, which is already counted by std::mem::size_of::<Self>()?

    }
}
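
The thread above comes down to one identity: the field's inline bytes are part of `size_of::<Self>()`, and `data_type.size()` reports those same inline bytes again plus any nested heap memory, so one copy has to be subtracted. A minimal sketch with hypothetical types (`Ty` standing in for `DataType`, `Field` for `SortField`, and a single `bool` for the options):

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `DataType`: may own nested heap data.
enum Ty {
    Int,
    List(Box<Ty>),
}

impl Ty {
    /// Inline size of `Self` plus heap memory owned by nested types.
    fn size(&self) -> usize {
        size_of::<Self>()
            + match self {
                Ty::Int => 0,
                Ty::List(inner) => inner.size(),
            }
    }
}

/// Hypothetical stand-in for `SortField`.
struct Field {
    descending: bool,
    ty: Ty,
}

impl Field {
    /// `size_of::<Self>()` already contains the inline bytes of `ty`, and
    /// `ty.size()` contains them too, so one copy is subtracted.
    fn size(&self) -> usize {
        self.ty.size() + size_of::<Self>() - size_of::<Ty>()
    }
}
```

For a flat type the result collapses to `size_of::<Field>()`; for `List(Box<Int>)` it is `size_of::<Field>()` plus one heap-allocated `Ty`.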

impl RowConverter {
@@ -480,6 +488,22 @@ impl RowConverter {
            })
            .collect()
    }

    /// Returns the size of this instance in bytes
    ///
    /// Includes the size of `Self`.
    pub fn size(&self) -> usize {
        std::mem::size_of::<Self>()
            + std::mem::size_of_val(&self.interners)
tustvold marked this conversation as resolved.
            + self.fields.iter().map(|x| x.size()).sum::<usize>()
            + self.interners.capacity()
                * std::mem::size_of::<Option<Box<OrderPreservingInterner>>>()
            + self
                .interners
                .iter()
                .filter_map(|x| x.as_ref().map(|x| x.size()))
                .sum::<usize>()
    }
}
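
The interner part of `RowConverter::size` has two layers: the vector's slot storage (`capacity` times the slot size) and the heap allocation behind each `Some` slot. A minimal sketch with hypothetical `Interner` and `Converter` types follows. It deliberately leaves out a separate `size_of_val` term for the vector, on the reasoning that the vector's inline header is already part of `size_of::<Self>()`; whether the resolved conversation on that line was about this is not visible here.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `OrderPreservingInterner`.
struct Interner {
    bytes: Vec<u8>,
}

impl Interner {
    fn size(&self) -> usize {
        size_of::<Self>() + self.bytes.capacity()
    }
}

/// Hypothetical stand-in for `RowConverter`, interner part only.
struct Converter {
    interners: Vec<Option<Box<Interner>>>,
}

impl Converter {
    fn size(&self) -> usize {
        size_of::<Self>()
            // Slot storage: one pointer-sized entry per reserved slot.
            + self.interners.capacity() * size_of::<Option<Box<Interner>>>()
            // Heap allocation behind each populated slot.
            + self
                .interners
                .iter()
                .filter_map(|x| x.as_ref().map(|x| x.size()))
                .sum::<usize>()
    }
}
```

`Option<Box<T>>` for a sized `T` is the same size as a plain pointer, so empty slots cost one machine word each and nothing more.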

/// A row-oriented representation of arrow data, that is normalized for comparison.
@@ -512,6 +536,16 @@ impl Rows {
    pub fn iter(&self) -> RowsIter<'_> {
        self.into_iter()
    }

    /// Returns the size of this instance in bytes
    ///
    /// Includes the size of `Self`.
    pub fn size(&self) -> usize {
        // Size of fields is accounted for as part of RowConverter
        std::mem::size_of::<Self>()
            + self.buffer.len()
            + self.offsets.len() * std::mem::size_of::<usize>()
    }
}
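
Unlike the interner methods, `Rows::size` uses `len()` rather than `capacity()`. That is exact when the storage is a boxed slice, which has no spare capacity. The diff does not show the field types, so the sketch below assumes boxed slices on a hypothetical `RowData` type; if the fields were `Vec`s, `len()` could under-report reserved memory.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `Rows`, assuming boxed-slice storage.
struct RowData {
    buffer: Box<[u8]>,
    offsets: Box<[usize]>,
}

impl RowData {
    fn from_vecs(buffer: Vec<u8>, offsets: Vec<usize>) -> Self {
        // `into_boxed_slice` drops any excess capacity, so afterwards
        // the length is the size of the allocation.
        Self {
            buffer: buffer.into_boxed_slice(),
            offsets: offsets.into_boxed_slice(),
        }
    }

    fn size(&self) -> usize {
        size_of::<Self>() + self.buffer.len() + self.offsets.len() * size_of::<usize>()
    }
}
```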

impl<'a> IntoIterator for &'a Rows {