Add Field::size and DataType::size #3147

Closed
crepererum opened this issue Nov 21, 2022 · 2 comments · Fixed by #3149
Labels
arrow: Changes to the arrow crate; enhancement: Any new improvement worthy of a entry in the changelog; parquet: Changes to the parquet crate

Comments

@crepererum
Contributor

crepererum commented Nov 21, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In DataFusion, it would be nice to know how much data is allocated so we can bail out early instead of OOMing (a slight over-allocation is OK, so this can be measured after the fact). For that purpose, it would be nice to know how much memory a specific instance of Field/DataType requires.

Describe the solution you'd like
Add:

impl Field {
    /// Return size of this instance in bytes.
    ///
    /// Includes the size of `Self`.
    pub fn size(&self) -> usize {...}
}

impl DataType {
    /// Return size of this instance in bytes.
    ///
    /// Includes the size of `Self`.
    pub fn size(&self) -> usize {...}
}

Note that fields and data types have a cyclic dependency, so we probably need to implement both in a single PR.
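To illustrate the cyclic dependency, here is a minimal sketch of how the two methods could call each other. The `Field` and `DataType` types below are simplified, hypothetical stand-ins (a handful of variants, no metadata), not the real arrow definitions, and the accounting rules (counting `Vec`/`String` capacity rather than length) are an assumption about what "size in bytes" should mean here.

```rust
use std::mem::size_of;

// Hypothetical, simplified stand-in for arrow's `DataType`.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum DataType {
    Int32,
    Utf8,
    List(Box<Field>),
    Struct(Vec<Field>),
}

// Hypothetical, simplified stand-in for arrow's `Field`.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl DataType {
    /// Size of this instance in bytes, including `Self` and owned heap data.
    fn size(&self) -> usize {
        size_of::<Self>()
            + match self {
                DataType::Int32 | DataType::Utf8 => 0,
                // The boxed field lives entirely on the heap.
                DataType::List(field) => field.size(),
                // The vector's buffer holds `capacity` inline `Field` slots;
                // add only the heap-owned part of each stored child on top.
                DataType::Struct(fields) => {
                    fields.capacity() * size_of::<Field>()
                        + fields
                            .iter()
                            .map(|f| f.size() - size_of::<Field>())
                            .sum::<usize>()
                }
            }
    }
}

impl Field {
    /// Size of this instance in bytes, including `Self` and owned heap data.
    fn size(&self) -> usize {
        // `size_of::<Self>()` already covers the inline `DataType`, so
        // subtract it before adding the data type's full size.
        size_of::<Self>() - size_of::<DataType>()
            + self.data_type.size()
            + self.name.capacity()
    }
}
```

Because `Field::size` needs `DataType::size` for its inline data type, and `DataType::size` needs `Field::size` for nested children, neither method can be complete without the other.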

Describe alternatives you've considered
Not adding this feature to arrow, and instead letting downstream users (e.g. DataFusion) implement it.

Additional context
-

@crepererum crepererum added the enhancement Any new improvement worthy of a entry in the changelog label Nov 21, 2022
@crepererum crepererum changed the title Add Field::size Add Field::size and DataType::size Nov 21, 2022
crepererum added a commit to crepererum/arrow-rs that referenced this issue Nov 21, 2022
Add a way to calculate in-memory size of `Field` and `DataType`.

Closes apache#3147.
tustvold pushed a commit that referenced this issue Nov 22, 2022
Add a way to calculate in-memory size of `Field` and `DataType`.

Closes #3147.
@alamb
Contributor

alamb commented Nov 25, 2022

label_issue.py automatically added labels {'parquet'} from #3148

@alamb alamb added parquet Changes to the parquet crate arrow Changes to the arrow crate labels Nov 25, 2022
@alamb
Contributor

alamb commented Nov 25, 2022

label_issue.py automatically added labels {'arrow'} from #3148


2 participants