bigquery: build veneer for bigquery write client #4366

Closed
shollyman opened this issue Jul 1, 2021 · 3 comments
Labels: api: bigquery, type: feature request

Comments

@shollyman
Contributor

Tracks PRs related to the veneer work for the new streaming write client.

Internally: b/178808000

@shollyman added the "triage me" label Jul 1, 2021
@shollyman self-assigned this Jul 1, 2021
@product-auto-label bot added the "api: bigquery" label Jul 1, 2021
@shollyman added the "type: feature request" label and removed the "triage me" label Jul 1, 2021
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 1, 2021
This is the first of multiple PRs to build up the functionality of a new
thick client over the new BigQuery Storage API's write mechanism.

This PR exposes schema conversion between the main bigquery package and
the bigquery storage API.
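
As a rough illustration of what a schema conversion of this kind involves, here is a minimal sketch. The `Field` and `StorageField` types, the `typeMap` table, and `convertField` are hypothetical stand-ins written for this example, not the types or functions this PR adds; the sketch assumes the storage API names some types differently (for example INT64 rather than INTEGER, STRUCT rather than RECORD) and covers only a handful of types.

```go
package main

import "fmt"

// Field is a hypothetical, simplified stand-in for a bigquery schema field.
type Field struct {
	Name     string
	Type     string // e.g. "STRING", "INTEGER", "RECORD"
	Repeated bool
	Required bool
	Nested   []Field
}

// StorageField is a hypothetical stand-in for the storage API's field schema.
type StorageField struct {
	Name   string
	Type   string // e.g. "STRING", "INT64", "STRUCT"
	Mode   string // "NULLABLE", "REQUIRED", or "REPEATED"
	Fields []StorageField
}

// typeMap is an assumed, partial mapping between the two naming schemes.
var typeMap = map[string]string{
	"STRING":    "STRING",
	"BYTES":     "BYTES",
	"INTEGER":   "INT64",
	"FLOAT":     "DOUBLE",
	"BOOLEAN":   "BOOL",
	"TIMESTAMP": "TIMESTAMP",
	"RECORD":    "STRUCT",
}

// convertField converts one field, recursing into nested fields.
func convertField(f Field) (StorageField, error) {
	t, ok := typeMap[f.Type]
	if !ok {
		return StorageField{}, fmt.Errorf("field %q: unsupported type %q", f.Name, f.Type)
	}
	out := StorageField{Name: f.Name, Type: t, Mode: "NULLABLE"}
	switch {
	case f.Repeated:
		out.Mode = "REPEATED"
	case f.Required:
		out.Mode = "REQUIRED"
	}
	for _, n := range f.Nested {
		c, err := convertField(n)
		if err != nil {
			return StorageField{}, err
		}
		out.Fields = append(out.Fields, c)
	}
	return out, nil
}

func main() {
	in := Field{Name: "user", Type: "RECORD", Required: true,
		Nested: []Field{{Name: "id", Type: "INTEGER"}}}
	out, err := convertField(in)
	fmt.Println(out, err)
}
```

Returning an error for unknown types, rather than passing them through, keeps a mismatch visible at conversion time instead of at append time.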

Towards: #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 12, 2021
This PR introduces two new types:

AppendResult - tracks the progress of an individual row append to
completion, either success or error.  Successful appends _may_ have
an associated offset; failed appends will have an associated error.
The AppendResult exposes a blocking method that users can call to wait for the outcome.

pendingWrite - handles the state management for a set of rows appended
as a group.  There's a 1:many relationship between
pendingWrite:AppendResult(s), so as a pendingWrite completes all
associated AppendResult references should be updated.
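
The relationship between the two types can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the lowercase `appendResult` and `pendingWrite` types, the `noOffset` sentinel, and the method names are all made up for the example, and it follows the one-result-per-row design described above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// noOffset is a hypothetical sentinel for "success, but no offset reported".
const noOffset int64 = -1

// appendResult is a stand-in for the AppendResult described above.
type appendResult struct {
	ready  chan struct{} // closed once the outcome is final
	offset int64
	err    error
}

// getResult blocks until the append completes or ctx is done.
func (ar *appendResult) getResult(ctx context.Context) (int64, error) {
	select {
	case <-ctx.Done():
		return noOffset, ctx.Err()
	case <-ar.ready:
		return ar.offset, ar.err
	}
}

// pendingWrite tracks the results for a set of rows appended as a group.
type pendingWrite struct {
	results []*appendResult
}

func newPendingWrite(rows int) *pendingWrite {
	pw := &pendingWrite{}
	for i := 0; i < rows; i++ {
		pw.results = append(pw.results, &appendResult{ready: make(chan struct{}), offset: noOffset})
	}
	return pw
}

// markDone finalizes every associated result. It must be called exactly
// once per pendingWrite; a second call would close a closed channel.
func (pw *pendingWrite) markDone(startOffset int64, err error) {
	for i, ar := range pw.results {
		if err != nil {
			ar.err = err
		} else if startOffset != noOffset {
			ar.offset = startOffset + int64(i)
		}
		close(ar.ready)
	}
}

// example runs one successful and one failed group and reports what
// a caller holding the individual results would observe.
func example() (secondRowOffset int64, failure error) {
	ctx := context.Background()

	ok := newPendingWrite(3)
	go ok.markDone(10, nil) // completion arrives from another goroutine
	secondRowOffset, _ = ok.results[1].getResult(ctx)

	failed := newPendingWrite(1)
	failed.markDone(noOffset, errors.New("append rejected"))
	_, failure = failed.results[0].getResult(ctx)
	return secondRowOffset, failure
}

func main() {
	offset, err := example()
	fmt.Println(offset, err)
}
```

Closing a channel to signal completion lets any number of callers block on the same result, and the writes to `offset` and `err` happen before the close, so readers see them once the receive returns.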

Towards: #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 26, 2021
This PR adds enough of the wiring to the client to begin testing via integration tests.  It adopts a pattern similar to the pullstream in pubsub, in that it separates individual calls from stream state management.

There are two significant units of future work that may yield changes here:

* For the sake of traffic efficiency, we only want to add things like the stream ID, schema, and trace ID to the first append on any stream.

* For stream connection retry, we may want to re-send writes that were sent but never acknowledged.  For default/committed streams, this behavior may yield duplicate writes (at-least-once semantics).  For buffered/pending streams, it means either the library or the user should know to expect "data already present" for these re-sent writes.
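
The first of those two items could look something like the sketch below. Everything here is hypothetical (`appendRequest`, `streamSender`, and their fields are invented for illustration), and it assumes that a reconnect counts as a new stream connection whose first append needs the full metadata again.

```go
package main

import "fmt"

// appendRequest is a hypothetical, simplified append request.
type appendRequest struct {
	WriteStream string
	Schema      string // stands in for the writer schema descriptor
	TraceID     string
	Rows        [][]byte
}

// streamSender tracks whether the current connection has sent its first append.
type streamSender struct {
	sentFirst bool
	sent      []appendRequest // records what went on the wire in this sketch
}

// send strips the per-stream fields from every append after the first.
// req is passed by value, so the caller's copy is left untouched.
func (s *streamSender) send(req appendRequest) {
	if s.sentFirst {
		req.WriteStream, req.Schema, req.TraceID = "", "", ""
	}
	s.sentFirst = true
	s.sent = append(s.sent, req)
}

// reconnect models a stream reopen: the next append carries full metadata again.
func (s *streamSender) reconnect() { s.sentFirst = false }

func main() {
	s := &streamSender{}
	req := appendRequest{
		WriteStream: "projects/p/datasets/d/tables/t/streams/_default",
		Schema:      "schema",
		TraceID:     "trace",
		Rows:        [][]byte{[]byte("row")},
	}
	s.send(req)
	s.send(req)
	s.reconnect()
	s.send(req)
	for i, r := range s.sent {
		fmt.Printf("append %d carries stream metadata: %v\n", i, r.WriteStream != "")
	}
}
```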


Towards #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 28, 2021
#4502)

Writing tests picked up the error, so hooray.

The BufferedStream integration test exposed that, while the API surface is in preview without special requirements, advanced features such as the FlushRows RPC used by BufferedStream do have them.  The feature has been allowlisted for test projects, but we'll want to add this to doc.go when I start that PR.

Towards #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 28, 2021
)

Minor changes:

* WithTracePrefix -> WithTraceID for the option and accompanying downstream usage
* Exported TableParentFromStreamName to aid users of BatchCommit.

The rest of the PR represents docstring improvements.

Towards #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 30, 2021
…4512)

This starts to plumb in OpenCensus (oc) instrumentation, as we're already using it in
bigquery proper and other veneers.

Testing instrumentation helped catch another double-close in the recv processor, so this
addresses that as well.

Towards #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Jul 30, 2021
…tionality (#4517)

Soon we'll start tackling null values and complex schemas, so this PR augments the existing table validator with one that can be passed multiple validation constraints for evaluation (how many rows, how many nulls, cardinality, etc).  It updates the existing integration tests to use the new validator, and tightens validation to ensure that tests are propagating the appended values as expected.
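
A validator that accepts multiple constraints might be shaped roughly like this. It is a sketch under assumptions: the real validator checks a table, whereas this one works on in-memory rows, and `row`, `constraint`, `withRowCount`, `withNullCount`, and `withDistinctCount` are names invented for the example.

```go
package main

import "fmt"

// row is a hypothetical in-memory row; nil values stand for NULLs.
type row map[string]interface{}

// constraint checks one property of the data and returns an error on mismatch.
type constraint func(rows []row) error

// withRowCount expects exactly want rows.
func withRowCount(want int) constraint {
	return func(rows []row) error {
		if got := len(rows); got != want {
			return fmt.Errorf("row count: got %d, want %d", got, want)
		}
		return nil
	}
}

// withNullCount expects exactly want NULLs (or missing values) in col.
func withNullCount(col string, want int) constraint {
	return func(rows []row) error {
		got := 0
		for _, r := range rows {
			if r[col] == nil {
				got++
			}
		}
		if got != want {
			return fmt.Errorf("nulls in %q: got %d, want %d", col, got, want)
		}
		return nil
	}
}

// withDistinctCount expects exactly want distinct non-NULL values in col.
func withDistinctCount(col string, want int) constraint {
	return func(rows []row) error {
		seen := map[string]struct{}{}
		for _, r := range rows {
			if v := r[col]; v != nil {
				seen[fmt.Sprint(v)] = struct{}{}
			}
		}
		if got := len(seen); got != want {
			return fmt.Errorf("distinct values in %q: got %d, want %d", col, got, want)
		}
		return nil
	}
}

// validate evaluates every constraint and collects all failures.
func validate(rows []row, constraints ...constraint) []error {
	var errs []error
	for _, c := range constraints {
		if err := c(rows); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	rows := []row{{"name": "a", "n": 1}, {"name": "a", "n": nil}, {"name": "b", "n": 2}}
	errs := validate(rows, withRowCount(3), withNullCount("n", 1), withDistinctCount("name", 2))
	fmt.Println("failures:", len(errs))
}
```

Collecting every failure instead of stopping at the first tends to make a failed integration test easier to diagnose in one run.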

Towards: #4366
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Jul 30, 2021
We're now doing enough work that we caught a context deadline in the
managedwriter tests.  Bump the timeout to 30s.

Towards googleapis#4366
shollyman added a commit that referenced this issue Jul 30, 2021
…4527)

We're now doing enough work that we caught a context deadline in the
managedwriter tests.  Bump the timeout to 30s.

Towards #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Aug 5, 2021
#4555)

Stress testing caught this one:  responsibility for releasing flow-controlled
resources lies in the pendingWrite's markDone, and the extra release
in the recvProcessor was over-freeing resources.
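
To illustrate the single-owner idea, here is a minimal sketch in which only `markDone` releases the flow control slot. The `flowController` type is hypothetical, and the `sync.Once` guard is an extra safeguard added for the example rather than something this fix is known to use; the fix described above simply removes the second release.

```go
package main

import (
	"fmt"
	"sync"
)

// flowController is a hypothetical counter of in-flight writes.
type flowController struct {
	mu       sync.Mutex
	inFlight int
}

func (fc *flowController) acquire() {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	fc.inFlight++
}

func (fc *flowController) release() {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	fc.inFlight--
}

func (fc *flowController) count() int {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	return fc.inFlight
}

// pendingWrite holds one flow control slot for its lifetime.
type pendingWrite struct {
	fc   *flowController
	once sync.Once
}

// newPendingWrite acquires the slot that markDone later gives back.
func newPendingWrite(fc *flowController) *pendingWrite {
	fc.acquire()
	return &pendingWrite{fc: fc}
}

// markDone is the only place the slot is released. The receive loop
// calls markDone and never touches the flow controller directly.
func (pw *pendingWrite) markDone() {
	pw.once.Do(pw.fc.release)
}

func main() {
	fc := &flowController{}
	a := newPendingWrite(fc)
	b := newPendingWrite(fc)
	a.markDone()
	a.markDone() // a repeated call does not over-free
	fmt.Println("in flight:", fc.count())
	b.markDone()
	fmt.Println("in flight:", fc.count())
}
```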

Towards: #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Aug 9, 2021
Additional internal context: b/185842996

Request routing relies on a metadata header being present, but because the RPC is
bidirectional (bidi) streaming, library generators don't automatically attach the write
stream metadata to the x-goog-request-params header.

For this API, the stream ID is constant for the whole stream so we
inject it when opening the stream, which is when routing needs the
information.

This causes some minor changes to how we do stream (re)open because
we need to pass in the stream ID as part of the function.  This change
also updates integration testing so that we're testing in an explicit,
non-default region (us-east1).
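
A rough sketch of the open-time injection follows. It assumes the routing header is the `x-goog-request-params` header that Google API clients conventionally send, with a `write_stream=<escaped stream ID>` value; the `openFunc` type and both helper functions are hypothetical, and a plain map stands in for gRPC metadata so the example stays within the standard library.

```go
package main

import (
	"fmt"
	"net/url"
)

// routingHeader is the assumed name of the request routing header.
const routingHeader = "x-goog-request-params"

// routingValue builds the header value for a given write stream ID.
func routingValue(streamID string) string {
	return "write_stream=" + url.QueryEscape(streamID)
}

// openFunc models whatever opens the bidi connection. The headers map
// is a stand-in for outgoing request metadata.
type openFunc func(headers map[string]string) error

// openWithRouting takes the stream ID as a parameter so that every
// open and reopen attaches the routing header for that stream.
func openWithRouting(streamID string, open openFunc) error {
	return open(map[string]string{routingHeader: routingValue(streamID)})
}

func main() {
	stream := "projects/p/datasets/d/tables/t/streams/_default"
	err := openWithRouting(stream, func(h map[string]string) error {
		fmt.Println(routingHeader+":", h[routingHeader])
		return nil
	})
	if err != nil {
		fmt.Println("open failed:", err)
	}
}
```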

Towards: #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Aug 12, 2021
…4601)

This PR adds a bytes metric to the list of defined instrumentation
metrics, and adds an additional key to track data origin.  Users
can set the data origin via a new WithDataOrigin option
that can be passed to the managed stream constructor.

This also does some minor refactoring of how opencensus view creation
is handled.

Towards: #4366
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Aug 26, 2021
This functionality supports the "bring your own proto" case for writing
data.

Towards: googleapis#4366
shollyman added a commit that referenced this issue Aug 27, 2021
…4681)

* feat(bigquery/storage/managedwriter/adapt): add NormalizeDescriptor

This functionality supports the "bring your own proto" case for writing
data.

Towards: #4366
@shollyman
Contributor Author

Didn't link it during review, but #4729 changed the AppendRows contract to return a single AppendResult future rather than one per row.

shollyman added a commit to shollyman/google-cloud-go that referenced this issue Sep 21, 2021
This PR updates the managedwriter to use the v1 endpoint of
bigquerystorage for interactions with the BigQueryWriteClient.

Towards: googleapis#4366
gcf-merge-on-green bot pushed a commit that referenced this issue Sep 21, 2021
This PR updates the managedwriter to use the v1 endpoint of
bigquerystorage for interactions with the BigQueryWriteClient.

Towards: #4366
gcf-merge-on-green bot pushed a commit that referenced this issue Oct 21, 2021
…dwriter (#5007)

This PR exposes the raw methods for creating and committing streams to the wrapped managedwriter client.

It allows users to interact with all the methods of the underlying API using the managedwriter client (which itself wraps the raw v1 client).  The disadvantage is that it couples managedwriter directly to v1, as it accepts requests in the v1 namespace. The existing append interactions all use abstractions local to the managedwriter.

The PR also removes the utility method for batch committing write streams; it saved little over calling the underlying method directly.

Towards: #4366
shollyman added a commit to shollyman/google-cloud-go that referenced this issue Nov 3, 2021
This plumbs the ability to pass gax.CallOption opts to the
underlying client underpinning the ManagedStream.  It also
adds a WithAppendRowsCallOption option to the constructor,
as well as adding direct option passing for operations like
Finalize() and FlushRows().

Towards: googleapis#4366
gcf-merge-on-green bot pushed a commit that referenced this issue Nov 4, 2021
… call options (#5078)

BREAKING CHANGE:  changes function signatures to add variadic call options

This plumbs the ability to pass gax.CallOption opts to the
underlying client underpinning the ManagedStream.  It also
adds a WithAppendRowsCallOption option to the constructor,
as well as adding direct option passing for operations like
Finalize() and FlushRows().
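
The two ways of passing options described here (once at construction for appends, and per call for other operations) can be sketched with plain functional options. This is an illustration only: `callOption`, `callSettings`, and the lowercase option and method names are hypothetical stand-ins, not the gax or managedwriter types, and the default values are made up.

```go
package main

import "fmt"

// callSettings is a hypothetical bag of per-call settings.
type callSettings struct {
	timeoutMillis int
	maxRetries    int
}

// callOption is a hypothetical stand-in for a call option.
type callOption func(*callSettings)

func withTimeoutMillis(ms int) callOption {
	return func(s *callSettings) { s.timeoutMillis = ms }
}

func withMaxRetries(n int) callOption {
	return func(s *callSettings) { s.maxRetries = n }
}

// resolve applies options over assumed defaults.
func resolve(opts ...callOption) callSettings {
	s := callSettings{timeoutMillis: 1000, maxRetries: 0}
	for _, o := range opts {
		o(&s)
	}
	return s
}

// managedStream remembers the call options to use for appends.
type managedStream struct {
	appendOpts []callOption
}

// writerOption configures the stream at construction time.
type writerOption func(*managedStream)

// withAppendRowsCallOption stores a call option for later append calls.
func withAppendRowsCallOption(o callOption) writerOption {
	return func(ms *managedStream) { ms.appendOpts = append(ms.appendOpts, o) }
}

func newManagedStream(opts ...writerOption) *managedStream {
	ms := &managedStream{}
	for _, o := range opts {
		o(ms)
	}
	return ms
}

// appendRows uses the options captured by the constructor.
func (ms *managedStream) appendRows() callSettings {
	return resolve(ms.appendOpts...)
}

// finalize takes its call options directly, as a variadic parameter.
func (ms *managedStream) finalize(opts ...callOption) callSettings {
	return resolve(opts...)
}

func main() {
	ms := newManagedStream(withAppendRowsCallOption(withMaxRetries(3)))
	fmt.Printf("append settings: %+v\n", ms.appendRows())
	fmt.Printf("finalize settings: %+v\n", ms.finalize(withTimeoutMillis(250)))
}
```

Adding a variadic parameter changes the function's type, which is presumably why a change like this is flagged as breaking even though existing call sites that pass no options still compile.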

Towards: #4366
@shollyman
Contributor Author

Didn't get tagged at commit time, but #5102 added variadic appends

shollyman added a commit to shollyman/google-cloud-go that referenced this issue Dec 21, 2021
This PR introduces no functional changes.  It simply colocates the
two types of variadic options into the same file to aid readability.

Towards: googleapis#4366
shollyman added a commit that referenced this issue Dec 21, 2021
This PR introduces no functional changes.  It simply colocates the
two types of variadic options into the same file to aid readability.

Towards: #4366
BrennaEpp pushed a commit to BrennaEpp/google-cloud-go that referenced this issue Dec 23, 2021
…apis#5239)

This PR introduces no functional changes.  It simply colocates the
two types of variadic options into the same file to aid readability.

Towards: googleapis#4366
@shollyman
Contributor Author

Closing this FR at this point. Tagging the relevant PRs is becoming too noisy.
