Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSSION] Better borrow propagation (e.g. RecordBatch::schema() to return &SchemaRef vs SchemaRef) #5463

Closed
alamb opened this issue Mar 4, 2024 · 3 comments · Fixed by #5474
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate enhancement Any new improvement worthy of a entry in the changelog parquet Changes to the parquet crate

Comments

@alamb
Copy link
Contributor

alamb commented Mar 4, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

For reasons I am not sure of (likely historical), certain APIs of Schema/RecordBatch return owned instances of SchemaRef.

So for example

let schema: SchemaRef = record_batch.schema()

There are at least two challenges here:

  1. This often requires an extra copy when not needed (though it is an Arc::clone which is relatively inexpensive compared to the work done by most Arrow apis)
  2. It makes it hard for borrow propagation to work correctly.

An example of challenges with borrow propagation is illustrated on #5342, which observes, we can't change Schema::try_merge to take (&Schema) even when it doesn't actually need owned ownership of the Schema without creating a temporary Vec<SchemaRef> or something similiar

-    pub fn try_merge(schemas: impl IntoIterator<Item = Self>) -> Result<Self, ArrowError> {
+    pub fn try_merge<'a>(
+        schemas: impl IntoIterator<Item = &'a Schema>,
+    ) -> Result<Schema, ArrowError> {

Describe the solution you'd like
I recommend we add new functions that return a reference to the SchemaRef

So new functions like

    /// Returns a reference to the [`Schema`] of the record batch.
    pub fn schema_ref(&self) -> &SchemaRef {
        &self.schema
    }

Describe alternatives you've considered
One alternative is to change the existing APIs as proposed in #5448

    /// Returns a reference to the [`Schema`] of the record batch.
    pub fn schema(&self) -> &SchemaRef {
        &self.schema
    }

However I think this requires a substantial amount of downstream code change (as explained on #5448 (review))

Additional context
This came up recently on #5342

@tustvold notes we have made similar changes in the past, e.g. #2035 and #313

@tustvold
Copy link
Contributor

label_issue.py automatically added labels {'parquet'} from #5448

@tustvold tustvold added the arrow Changes to the arrow crate label Mar 15, 2024
@tustvold
Copy link
Contributor

label_issue.py automatically added labels {'arrow'} from #5448

@tustvold tustvold added the arrow-flight Changes to the arrow-flight crate label Mar 15, 2024
@tustvold
Copy link
Contributor

label_issue.py automatically added labels {'arrow-flight'} from #5448

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate enhancement Any new improvement worthy of a entry in the changelog parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants