Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the length field for Buffer and use more Buffer in IPC reader to avoid memory copy. #2557

Merged
merged 3 commits into from Aug 23, 2022

Conversation

HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented Aug 23, 2022

Which issue does this PR close?

Closes #2524.
Related to #2437.

Rationale for this change

I put 2 things (Adding length field and updating IPC reader) together in this PR because I want to (informally) prove that
adding length field is valuable.

Besides, I don't lift the length out of Bytes, although @tustvold suggested it, because it will cause more API changes. We could do that in following PRs.

What changes are included in this PR?

Are there any user-facing changes?

Yes.

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@github-actions github-actions bot added arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels Aug 23, 2022
@@ -354,7 +355,7 @@ async fn save_uploaded_chunks(

let batch = record_batch_from_message(
message,
&data.data_body,
&Buffer::from(data.data_body),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find a way to avoid memory copy here. (as I'm not familiar with arrow-flight)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would likely need to move to using https://docs.rs/bytes/latest/bytes/ to avoid this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would likely need to move to using https://docs.rs/bytes/latest/bytes/ to avoid this.

Hi @tustvold. Could you please file an issue to track it, because you might know more context about it ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer::from_custom_allocation could avoid the copy.

Example / test case

Copy link
Contributor

@tustvold tustvold Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that API handle alignment?

Edit: I seem to remember something in the IPC format about buffer alignment, so perhaps this would be fine 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, forgot about alignment. That would only work if the ipc buffer is properly aligned for the type. I don't know what the requirements about that are, so the suggestion might be a bit fragile, or would need to check the alignment and do a copy as a fallback.

@HaoYang670 HaoYang670 changed the title Add the length field for Buffer and use more Buffer in IPC reader. Add the length field for Buffer and use more Buffer in IPC reader to avoid memory copy. Aug 23, 2022
@tustvold tustvold merged commit ad84176 into apache:master Aug 23, 2022
@ursabot
Copy link

ursabot commented Aug 23, 2022

Benchmark runs are scheduled for baseline = 3430537 and contender = ad84176. ad84176 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@HaoYang670 HaoYang670 deleted the add_buffer_length branch August 23, 2022 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the length field for Buffer
4 participants