Consider allowing docTypes with no BQ output #1114

Open
jklukas opened this issue Feb 6, 2020 · 0 comments
Labels
pipeline metadata: Should be solved by capturing new metadata in JSON schemas

Comments

@jklukas
Contributor

jklukas commented Feb 6, 2020

In discussion with @6a68, we've identified a subset of the FxA log data (amplitudeEvent messages) that we want to process in real time via Pub/Sub, but which don't need to be sent to BigQuery.

We plan to keep the current Stackdriver-to-BigQuery pipeline in place as the canonical source for the scheduled queries that create derived tables, and these amplitudeEvents will be covered by that pipeline. We also want Stackdriver's Pub/Sub output to send these amplitudeEvents through the Decoder. That gives us a chance to apply more rigorous schema validation to the events before routing them to Amplitude, and lets us take advantage of the existing error output support so that non-conforming messages are not dropped.

For a first pass, I think it's fine to simply let these amplitudeEvents also flow to a live table in BigQuery, even though they will duplicate the rows loaded via the Stackdriver BigQuery integration; if needed, we can apply a short retention period to the associated stable table to reduce cost. Longer-term, we may want to consider adding pipeline configuration that specifies a subset of docTypes destined for Pub/Sub output only.
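A minimal sketch of the longer-term idea, assuming the pipeline can look up a per-docType "Pub/Sub only" flag (for example, from new metadata captured in the JSON schemas). It is written in Python for brevity and shows only the routing decision: non-conforming messages still go to error output, and flagged docTypes skip the BigQuery live table. The namespace and docType strings, the `PUBSUB_ONLY_DOCTYPES` set, `DecodedMessage`, `route`, and the sink labels are all hypothetical illustrations, not gcp-ingestion's actual configuration or API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# HYPOTHETICAL config: (namespace, docType) pairs that should be published to
# Pub/Sub only and never loaded into a BigQuery live table. The names below are
# illustrative placeholders, not the real FxA namespace or docType identifiers.
PUBSUB_ONLY_DOCTYPES: FrozenSet[Tuple[str, str]] = frozenset(
    {("firefox-accounts", "amplitude-event")}
)


@dataclass(frozen=True)
class DecodedMessage:
    namespace: str
    doc_type: str
    payload: bytes


def route(
    msg: DecodedMessage,
    schema_valid: bool,
    pubsub_only: FrozenSet[Tuple[str, str]] = PUBSUB_ONLY_DOCTYPES,
) -> List[str]:
    """Return the (hypothetical) sinks a message should be sent to."""
    # Non-conforming messages always go to error output so they are not dropped.
    if not schema_valid:
        return ["error"]
    sinks = ["pubsub"]
    # Only docTypes not flagged as Pub/Sub-only continue on to BigQuery.
    if (msg.namespace, msg.doc_type) not in pubsub_only:
        sinks.append("bigquery_live")
    return sinks


if __name__ == "__main__":
    amp = DecodedMessage("firefox-accounts", "amplitude-event", b"{}")
    print(route(amp, schema_valid=True))   # ['pubsub']
    print(route(amp, schema_valid=False))  # ['error']
```

Keeping the flag as data (a set keyed by namespace and docType) rather than hard-coding docTypes in the routing logic is what would let it be driven by schema metadata instead of pipeline code changes.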

cc @whd @relud

jklukas added the "pipeline metadata" label (Should be solved by capturing new metadata in JSON schemas) on Mar 3, 2021