Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add table.createInsertStream for native streaming inserts #997

Merged
merged 21 commits into from Jan 17, 2022

Conversation

steffnay
Copy link
Contributor

@steffnay steffnay commented Aug 27, 2021

This feature adds the ability to use a native Duplex stream for inserting rows into BigQuery via the /insertAll endpoint and reading the API response. Implements batching of rows via RowBatch and RowQueue classes.

Adds:

  • Table.createInsertStream()

  • RowQueue

  • RowBatch

  • Ensure the tests and linter pass

  • Code coverage does not decrease (if any source code was changed)

  • Appropriate docs were updated (if necessary)

Fixes #506 🦕

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/nodejs-bigquery API. label Aug 27, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 27, 2021
@steffnay steffnay marked this pull request as ready for review August 30, 2021 17:22
@steffnay steffnay requested a review from a team August 30, 2021 17:22
@steffnay steffnay requested a review from a team as a code owner August 30, 2021 17:22
@steffnay steffnay added the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 12, 2021
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Oct 12, 2021
src/rowBatch.ts Outdated Show resolved Hide resolved
src/rowQueue.ts Outdated Show resolved Hide resolved
maxOutstandingBytes: 1 * 1024 * 1024,

// The maximum time we'll wait to send batched rows, in milliseconds.
maxDelayMillis: 10000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How was 10 seconds chosen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just based off similar batching of messages in nodejs-pubsub, do you have an idea of a timeframe that is better aligned for BigQuery inserts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. I think it has more to do with the customer's requirements than BigQuery limitations. Since it's configurable, aligning the default with pub/sub makes sense to me.

Copy link
Contributor

@feywind feywind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the flood of comments ^^; I'm happy to talk over any of them and/or help fix some of the any comments if you like.

src/rowBatch.ts Outdated Show resolved Hide resolved
src/rowBatch.ts Outdated Show resolved Hide resolved
src/rowBatch.ts Outdated Show resolved Hide resolved
src/rowBatch.ts Outdated Show resolved Hide resolved
src/rowBatch.ts Outdated Show resolved Hide resolved
src/table.ts Outdated Show resolved Hide resolved
test/rowBatch.ts Show resolved Hide resolved
test/rowBatch.ts Outdated Show resolved Hide resolved
test/rowQueue.ts Outdated Show resolved Hide resolved
test/rowQueue.ts Outdated Show resolved Hide resolved
@steffnay steffnay requested a review from tswast January 2, 2022 03:03
@steffnay steffnay requested a review from feywind January 2, 2022 03:03
@steffnay steffnay added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 3, 2022
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 3, 2022
Copy link
Contributor

@feywind feywind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember what all I commented on now, but overall it looks good :)

const opts = typeof options === 'object' ? options : {};

if (opts.insertRowsOptions) {
this.insertRowsOptions = opts.insertRowsOptions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

};
}),
// eslint-disable-next-line @typescript-eslint/no-explicit-any
row: rows[(insertError as any).index],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good :)

maxOutstandingBytes: 1 * 1024 * 1024,

// The maximum time we'll wait to send batched rows, in milliseconds.
maxDelayMillis: 10000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. I think it has more to do with the customer's requirements than BigQuery limitations. Since it's configurable, aligning the default with pub/sub makes sense to me.

src/rowQueue.ts Outdated Show resolved Hide resolved
@steffnay steffnay added the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 17, 2022
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jan 17, 2022
@steffnay steffnay merged commit 0ffe544 into googleapis:main Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/nodejs-bigquery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement writable streams for natively streaming inserts
3 participants