
AppendableRecordBatch #3142

Open
avantgardnerio opened this issue Nov 19, 2022 · 4 comments
Labels
enhancement Any new improvement worthy of an entry in the changelog

Comments

@avantgardnerio
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

At Space and Time, we would like to build an OLTP columnar DB based on Arrow (see NoisePage for precedent). Unfortunately, Arrow RecordBatches are immutable by design.

Describe the solution you'd like

Fortunately, append-only and immutable are not necessarily at odds, as long as one takes point-in-time immutable RecordBatch snapshots of the append-only version. See this gist for a working PoC implementation:

[Screenshot of the PoC implementation, 2022-11-19]
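The append-then-snapshot idea can be sketched without Arrow at all. The following is a hypothetical model only: `AppendableColumn` and `snapshot` are illustrative names, and a plain `Vec<i64>` stands in for Arrow buffers; a snapshot copies the current contents into a frozen, shared allocation, so later appends cannot affect it.

```rust
use std::sync::Arc;

/// Toy stand-in for an appendable Arrow column (illustrative, not arrow API).
struct AppendableColumn {
    values: Vec<i64>,
}

impl AppendableColumn {
    fn new() -> Self {
        Self { values: Vec::new() }
    }

    fn append(&mut self, v: i64) {
        self.values.push(v);
    }

    /// Take an immutable point-in-time snapshot. The data is copied into a
    /// shared, frozen buffer, so subsequent appends do not affect it.
    fn snapshot(&self) -> Arc<[i64]> {
        Arc::from(self.values.as_slice())
    }
}

fn main() {
    let mut col = AppendableColumn::new();
    col.append(1);
    col.append(2);
    let snap = col.snapshot();
    col.append(3); // the snapshot is unaffected by this append
    assert_eq!(snap.to_vec(), vec![1, 2]);
    assert_eq!(col.values, vec![1, 2, 3]);
}
```

The copy at snapshot time is what keeps readers and the writer from ever sharing mutable state, which is the crux of the later discussion in this thread.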

Describe alternatives you've considered

  1. using arrow2
  2. trying some trickery to make MutableBuffer work, vs. preallocating buffers from the immutable package
@avantgardnerio added the enhancement label Nov 19, 2022
@avantgardnerio
Contributor Author

Edit: cleaner API with .extend(other: RecordBatch): https://gist.github.com/avantgardnerio/48d977ea6bd28c790cfb6df09250336d

@tustvold
Contributor

tustvold commented Nov 21, 2022

Like the general idea, just a couple of comments/questions:

  • If the extend payload is a RecordBatch, what do you gain by concatenating them together? Why not just store them separately and periodically compact them? What do you gain from a single RecordBatch over, say, Vec<RecordBatch>?
  • Similar to the above, why is this an AppendableRecordBatch and not, say, an AppendablePrimitiveArray, etc.? This would be more flexible and avoid creating temporary arrays
  • I'm not sure how support for booleans would work, unless you can only append multiples of 8
  • You need to know the maximum buffer lengths up front, as you can't realloc the buffers
  • Your comment suggests arrow2 supports this, but I can't see how; could you point me to it?
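The Vec<RecordBatch> alternative from the first bullet (keep each frozen batch separate and only periodically concatenate them) can be sketched with a toy model. `Batch` stands in for an Arrow RecordBatch and `BatchStore` is an illustrative name; the compaction step plays the role of arrow's concat kernel.

```rust
/// Toy stand-in for an Arrow RecordBatch (illustrative only).
type Batch = Vec<i64>;

struct BatchStore {
    batches: Vec<Batch>,
    compact_threshold: usize,
}

impl BatchStore {
    fn new(compact_threshold: usize) -> Self {
        Self { batches: Vec::new(), compact_threshold }
    }

    /// Store each incoming immutable batch separately, compacting only
    /// once enough have accumulated.
    fn push(&mut self, batch: Batch) {
        self.batches.push(batch);
        if self.batches.len() >= self.compact_threshold {
            self.compact();
        }
    }

    /// Concatenate all stored batches into one, analogous to running a
    /// concat kernel over a Vec<RecordBatch>.
    fn compact(&mut self) {
        let merged: Batch = self.batches.drain(..).flatten().collect();
        self.batches.push(merged);
    }

    fn total_rows(&self) -> usize {
        self.batches.iter().map(|b| b.len()).sum()
    }
}

fn main() {
    let mut store = BatchStore::new(3);
    store.push(vec![1, 2]);
    store.push(vec![3]);
    assert_eq!(store.batches.len(), 2); // not yet compacted
    store.push(vec![4, 5]);             // reaching the threshold compacts
    assert_eq!(store.batches.len(), 1);
    assert_eq!(store.total_rows(), 5);
}
```

The appeal of this shape is that no batch is ever mutated after creation; the only cost is the periodic concatenation copy.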

One potentially simpler way to implement something similar to this would be to add a non-consuming finish method to the builders. This would entail copying the buffers, but in my experience of implementing the write path for IOx, this copy is insignificant in the grand scheme of query execution - even ignoring the heavy hitters like sorts and groups, all kernels involve copying values to an output array. This is especially true if after a certain number of rows you rotate the builders and just keep the immutable RecordBatch, thereby bounding the copy. What do you think?
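The non-consuming finish plus builder rotation described above can be sketched with a toy builder. `ToyBuilder`, `finish_cloned`, and `RotatingWriter` are illustrative names modeling an Arrow array builder with a plain `Vec`, not arrow crate APIs; the point is that the snapshot copy is bounded because the builder is rotated out after `max_rows` rows.

```rust
use std::sync::Arc;

/// Toy stand-in for an Arrow array builder (illustrative, not arrow API).
struct ToyBuilder {
    values: Vec<i64>,
}

impl ToyBuilder {
    fn new() -> Self {
        Self { values: Vec::new() }
    }

    fn append_value(&mut self, v: i64) {
        self.values.push(v);
    }

    /// Non-consuming finish: copy the current contents into a frozen
    /// array while leaving the builder usable for further appends.
    fn finish_cloned(&self) -> Arc<[i64]> {
        Arc::from(self.values.as_slice())
    }
}

struct RotatingWriter {
    builder: ToyBuilder,
    frozen: Vec<Arc<[i64]>>,
    max_rows: usize,
}

impl RotatingWriter {
    fn new(max_rows: usize) -> Self {
        Self { builder: ToyBuilder::new(), frozen: Vec::new(), max_rows }
    }

    fn append(&mut self, v: i64) {
        self.builder.append_value(v);
        if self.builder.values.len() >= self.max_rows {
            // Rotate: freeze the full builder and start a fresh one,
            // so no single snapshot copy ever exceeds max_rows rows.
            self.frozen.push(self.builder.finish_cloned());
            self.builder = ToyBuilder::new();
        }
    }
}

fn main() {
    let mut w = RotatingWriter::new(2);
    for v in 1..=5 {
        w.append(v);
    }
    assert_eq!(w.frozen.len(), 2);         // [1, 2] and [3, 4] are frozen
    assert_eq!(w.builder.values, vec![5]); // 5 is still being built
}
```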

Edit: I created #3154 for this

@alamb
Contributor

alamb commented Nov 21, 2022

Fortunately, append-only and immutable are not necessarily at odds, as long as one takes point-in-time immutable RecordBatch snapshots of the append-only version.

I think the general pattern of

  • Append data ....
  • Snapshot (by copying into a read only RecordBatch)
  • keep appending data

Makes sense with arrow. In fact this is what we do in IOx on the write path.

This pattern can be done with the various *Builders -- such as https://docs.rs/arrow/27.0.0/arrow/array/type.Int64Builder.html

So something like:

builder.append_value(1);

// get an array to use in datafusion, etc
let array = builder.finish();

// make a new builder and start building the second batch
builder = Int64Builder::new();
builder.append_value(2);
builder.append_value(3);

I think what @tustvold is suggesting with a "non-consuming finish method to the builders" is that you could do something like:

builder.append_value(1);

// get an array to use in datafusion, etc (by copying the underlying values)
let array = builder.snapshot();

// existing builder can be used to append 
builder.append_value(2);
builder.append_value(3);

I don't fully follow the proposal in https://gist.github.com/avantgardnerio/48d977ea6bd28c790cfb6df09250336d

As soon as you have this function:

    pub fn as_slice(&self) -> RecordBatch {
        self.record_batch.slice(0, self.len())
    }

This means that multiple locations may now be able to read that data, so trying to update it as well will likely be an exercise in frustration: you will fight the Rust compiler to convince it that the memory remains valid and that concurrent reads and modification are safe.

Maybe we could start with the "make a snapshot-based copy" approach, and if the copying is too expensive, figure out a different approach.

@avantgardnerio
Contributor Author

Others might still find it useful, but I've convinced myself that, given how we plan to handle MVCC, we'll need to make copies anyway, so the builder approach is probably fine (given a non-consuming .finish()).
