Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory of concat kernel #347

Closed
ritchie46 opened this issue May 25, 2021 · 4 comments
Closed

Reduce memory of concat kernel #347

ritchie46 opened this issue May 25, 2021 · 4 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@ritchie46
Copy link
Contributor

The concat kernel concats multiple arrays to contiguous memory. It has the potential to only allocate the required memory. However for (large)-utf8 and for (large)-list it does not do so and relies on exponential allocation to get the required memory.

Precomputing the needed capacity of buffers would be faster and less memory heavy.

@ritchie46 ritchie46 added the enhancement Any new improvement worthy of a entry in the changelog label May 25, 2021
@Dandandan
Copy link
Contributor

@ritchie46 is this closed in your eyes by #348?

@ritchie46
Copy link
Contributor Author

I think we can apply the same logic to arrays with child data.

  • list
  • large-list
  • struct
  • dictionary

come to mind.

@jorgecarleitao
Copy link
Member

I agree that it would be great to have a method to specify all capacities.

The required capacity is usually dependent on the problem over which MutableArrayData is used for. Therefore, I suggest that the calculation of which capacity to use to be made by the user of MutableArrayData, and not inside MutableArrayData itself.

In this context, a way to address this is to allow something like

MutableArrayData::with_capacities(capacities: Capacities);

enum Capacities {
   Binary(usize, Option<usize>),  // Binary, Utf8
   List(usize, Option<Box<Capacities>>),
   ...
}

and let MutableArrayData panic if the capacity variant is incompatible with the arrays' DataType.

This gives users the freedom to pass other capacities.

@jhorstmann
Copy link
Contributor

Support for list capacities was added in #382 and #1885, adding support for at least single-level lists in the concat kernel should be relatively easy now by extending the match statement in

_ => MutableArrayData::new(arrays, false, capacity),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

5 participants