Consider allowing docTypes with no BQ output #1114

jklukas · 2020-02-06T14:20:49Z

In discussion with @6a68, we've identified a subset of the FxA log data (amplitudeEvent messages) that we want to process in real time via Pub/Sub but that doesn't need to be sent to BigQuery.

We are planning to keep the current Stackdriver BigQuery pipeline in place as the canonical source for scheduled queries that create derived tables, and these amplitudeEvents will be covered by that pipeline. But we also want Stackdriver's Pub/Sub output to send these amplitudeEvents through the Decoder, so that we get a chance to do more rigorous schema validation on the events before routing them to Amplitude. We also want to take advantage of the existing support for error output so that we don't drop non-conforming messages.
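As a rough illustration of the validate-then-route behavior described above, here is a minimal Python sketch. It is not the Decoder's actual code: the `route_event` helper and the required field names are hypothetical, and real validation would presumably run against the pipeline's JSON schemas rather than a hard-coded dict.

```python
import json

# Hypothetical required fields for an amplitudeEvent message. These names
# are assumptions for illustration only.
REQUIRED_FIELDS = {"event_type": str, "user_id": str}


def route_event(raw: bytes):
    """Return ("valid", payload) for conforming messages, else ("error", details).

    Non-conforming messages are never dropped; they are returned with a
    reason so the caller can send them to an error output.
    """
    try:
        payload = json.loads(raw)
    except ValueError as exc:  # covers malformed JSON and invalid UTF-8
        return "error", {"payload": raw, "reason": f"invalid JSON: {exc}"}
    if not isinstance(payload, dict):
        return "error", {"payload": raw, "reason": "top-level value is not an object"}
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(name), expected_type):
            return "error", {"payload": raw, "reason": f"bad or missing field: {name}"}
    return "valid", payload
```

A caller would publish `"valid"` results onward (for example toward Amplitude) and write `"error"` results to the error output for later inspection.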

For a first pass, I think it's fine to simply let these amplitudeEvents also flow to a live table in BigQuery, even though they'll duplicate rows loaded via the Stackdriver BQ integration; we can apply a short retention period to the associated stable table to reduce cost if needed. Longer-term, we may want to consider adding configuration in the pipeline to specify a subset of docTypes that are for Pub/Sub output only.
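One possible shape for the longer-term configuration, sketched in Python. Everything here is hypothetical: the `PUBSUB_ONLY_DOCTYPES` set, the `outputs_for` helper, the sink names, and the `"amplitudeEvent"` docType identifier are assumptions for illustration, not existing pipeline options.

```python
# Hypothetical configuration: docTypes that should skip BigQuery output.
PUBSUB_ONLY_DOCTYPES = frozenset({"amplitudeEvent"})


def outputs_for(doc_type: str, pubsub_only=PUBSUB_ONLY_DOCTYPES) -> list:
    """Return the sinks a decoded message of this docType should be written to."""
    if doc_type in pubsub_only:
        # Real-time consumers only; no BigQuery table is populated.
        return ["pubsub"]
    # Default behavior: decoded messages continue on to BigQuery as well.
    return ["pubsub", "bigquery"]
```

With this sketch, `outputs_for("amplitudeEvent")` yields only the Pub/Sub sink, while any docType not listed keeps both outputs. Whether such a list would live in pipeline options or in schema metadata is left open.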

cc @whd @relud

jklukas added the "pipeline metadata" label (Should be solved by capturing new metadata in JSON schemas) on Mar 3, 2021
