Fix StructArrayReader handling nested lists (#1651) #1700

Merged
tustvold merged 2 commits into apache:master from the struct-array-nested-lists branch on May 19, 2022

@tustvold tustvold commented May 13, 2022

Which issue does this PR close?

Closes #1651.

Rationale for this change

See the linked ticket (#1651).

What changes are included in this PR?

Fixes handling of nested lists within StructArrayReader
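
For readers unfamiliar with the problem space, here is a minimal sketch of why nested lists complicate struct validity. It is an illustration under stated assumptions, not the code in this PR. In Parquet's level encoding, a list child emits one (repetition level, definition level) pair per list element, so a struct reader that counts one struct slot per definition level would produce too many slots. The function `struct_validity` and its parameters are hypothetical names, and the example assumes a top-level optional struct (definition level 1, repetition level 0) holding an optional list of required elements.

```rust
/// Hypothetical helper, not part of the arrow-rs API.
/// Builds one validity flag per struct slot from the level streams of a child column.
fn struct_validity(
    def_levels: &[i16],
    rep_levels: &[i16],
    struct_def_level: i16,
    struct_rep_level: i16,
) -> Vec<bool> {
    assert_eq!(def_levels.len(), rep_levels.len());
    def_levels
        .iter()
        .zip(rep_levels)
        // A repetition level above the struct's own level means this entry
        // continues an inner list, so it does not start a new struct slot.
        .filter(|(_, rep)| **rep <= struct_rep_level)
        // The struct is non-null when the definition level reaches its own level.
        .map(|(def, _)| *def >= struct_def_level)
        .collect()
}

fn main() {
    // Assumed rows: {list: [1, 2]}, null struct, {list: []}
    // Definition levels: 3 = element present, 2 = empty list, 0 = null struct.
    let def_levels = [3, 3, 0, 2];
    let rep_levels = [0, 1, 0, 0];
    let validity = struct_validity(&def_levels, &rep_levels, 1, 0);
    // Four level entries collapse to three struct slots.
    assert_eq!(validity, vec![true, false, true]);
    println!("{:?}", validity);
}
```

Without the repetition-level filter, the four level entries would yield four struct slots for three rows, which is the kind of mismatch a nested list can cause.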

Are there any user-facing changes?

Files that previously failed to parse now parse correctly.

@github-actions github-actions bot added the parquet Changes to the parquet crate label May 13, 2022
.map(|buf| unsafe { buf.typed_data() })
// Children definition levels should describe the same parent structure,
// so return key_reader only
self.key_reader.get_def_levels()
Contributor Author

Drive-by fix, part of #1699.

Contributor

I am a big fan of less unsafe 👍

@tustvold tustvold force-pushed the struct-array-nested-lists branch 2 times, most recently from c5a8d4b to 230eb38 Compare May 13, 2022 13:43
@tustvold tustvold marked this pull request as ready for review May 13, 2022 19:20
Contributor

@alamb alamb left a comment

Thank you @tustvold

I found this easier to review by ignoring whitespace: https://github.com/apache/arrow-rs/pull/1700/files?w=1

I am not an expert in this code, so I can't say I fully grok it, but if it fixes the file from @kesavkolla, it sounds good to me.

I had one question on the tests, but I suspect it is my own misunderstanding.

cc @paddyhoran @nevi-me


// Safety: the buffer is always treated as `u16` in the code below
let def_level_data = unsafe { def_level_data_buffer.typed_data_mut() };
Contributor

+1 for removing some unsafe

None,
]));

let nulls = Buffer::from([0b00000111]);
Contributor

Doesn't this set the first three elements in the struct array to NULL?

That doesn't seem consistent with the structure created in the comments above.

Contributor Author

Will rename to `validity`. It is an Arrow null mask, in which a set bit marks a valid element rather than a null one. I agree the naming is perpetually confusing.

Contributor

aaah!
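
For anyone else tripped up by the naming in the exchange above, this small sketch shows how a bitmap such as `0b00000111` reads under Arrow's convention, where a set bit means valid and bits are numbered least-significant first. The helper `is_valid` is a hypothetical stand-in written against plain byte slices, not an arrow-rs function, and the four-slot length is an assumption chosen only for illustration.

```rust
/// Hypothetical helper: reports whether slot `i` is valid (non-null)
/// in an Arrow-style validity bitmap. Bit 0 is the least significant
/// bit of the first byte.
fn is_valid(bitmap: &[u8], i: usize) -> bool {
    ((bitmap[i / 8] >> (i % 8)) & 1) == 1
}

fn main() {
    // Same byte as in the test under review.
    let validity = [0b00000111u8];
    // Inspect the first four slots (the slot count is an assumption).
    let flags: Vec<bool> = (0..4).map(|i| is_valid(&validity, i)).collect();
    // The three low bits are set, so slots 0..=2 are valid and slot 3 is null.
    assert_eq!(flags, vec![true, true, true, false]);
    println!("{:?}", flags);
}
```

Read this way, the three set bits mark the first three elements as present, not as NULL.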

@tustvold tustvold merged commit a30e787 into apache:master May 19, 2022
Successfully merging this pull request may close these issues.

StructArrayReader Cannot Handle Nested Lists