Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DictionaryArray::occupancy #4415

Merged
merged 1 commit into from Jun 30, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Part of #4414
Relates to #506

Rationale for this change

Split out from #3558 this adds a DictionaryArray::occupancy which provides a mask of the "used" values. This is effectively a selection vector for the values (#4095)

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 14, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like nice addition to me -- thank you @tustvold

let len = self.values.len();
let mut builder = BooleanBufferBuilder::new(len);
builder.resize(len);
let slice = builder.as_slice_mut();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: calling this target or dest or dst I think would be a more descriptive name for what it is rather than slice

let mut builder = BooleanBufferBuilder::new(len);
builder.resize(len);
let slice = builder.as_slice_mut();
match self.keys.nulls().filter(|n| n.null_count() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fancy way to check for nulls -- TIL Option::filter 👍

Some(n) => {
let v = self.keys.values();
n.valid_indices()
.for_each(|idx| set_bit(slice, v[idx].as_usize()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any value in calling valid_slices here instead? I assume it would be a tradeoff for sparse dictionaries and mostly used dictionaries.

@alamb
Copy link
Contributor

alamb commented Jun 15, 2023

As an aside, I really like how the new ScalarBuffer / NullBuffer structures are so easy to work with ❤️

@alamb
Copy link
Contributor

alamb commented Jun 29, 2023

is this PR waiting on anything else? Or shall we merge it?

@tustvold tustvold merged commit 3354a4c into apache:master Jun 30, 2023
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants