Add params to the event gather pipeline to allow long-running jobs and log errored / skipped events #195

Open
evamaxfield opened this issue Jun 27, 2022 · 0 comments
Labels: enhancement (New feature or request), event gather pipeline (A feature or bugfix relating to event processing)

Feature Description

Add parameters:

  • batch-size: an optional integer used to slice the gathered events into batches and run the pipeline on one batch at a time. For example, if the gather for the specified time range finds 50 events and the batch size is 10, the pipeline runs 5 independent times, each processing 10 events.
  • skip-errored-events-during-processing: ignore events that raise an error during processing. Enough debug info should be gathered and kept that the log printed after the pipeline finishes contains the event details and "the thing that errored".
  • skip-errored-events-during-gather: ignore events that fail to scrape / gather. As with the parameter above, enough debug info should be printed after scraping, for example "Found 20 events, skipping 2 due to errors".
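One way the three parameters above could fit together, as a minimal Python sketch. All names here (run_pipeline, batched, ErroredEvent, gather_one, process_one, and the fake_* stand-ins) are hypothetical and not existing functions in this project. A real implementation would more likely launch a separate pipeline run per batch so that downloaded videos can be cleaned up in between.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Optional, Sequence, Tuple


@dataclass
class ErroredEvent:
    """Hypothetical record of a skipped event, kept for the end-of-run log."""

    stage: str  # "gather" or "processing"
    event: Any  # the event (or raw item) that failed
    error: Exception  # "the thing that errored"


def batched(items: Sequence[Any], batch_size: Optional[int]) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most batch_size items (all items if None)."""
    if batch_size is None:
        yield items
        return
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def run_pipeline(
    raw_events: Sequence[Any],
    gather_one: Callable[[Any], Any],
    process_one: Callable[[Any], Any],
    batch_size: Optional[int] = None,
    skip_errored_events_during_gather: bool = False,
    skip_errored_events_during_processing: bool = False,
) -> Tuple[List[Any], List[ErroredEvent]]:
    """Hypothetical pipeline driver: gather everything, then process in batches."""
    errored: List[ErroredEvent] = []

    # Gather stage: drop events that fail to scrape / gather if asked to.
    gathered: List[Any] = []
    for raw in raw_events:
        try:
            gathered.append(gather_one(raw))
        except Exception as err:
            if not skip_errored_events_during_gather:
                raise
            errored.append(ErroredEvent("gather", raw, err))
    print(f"Found {len(raw_events)} events, skipping {len(errored)} due to errors")

    # Processing stage: each batch is processed to completion before the next
    # starts (a real pipeline could clean up that batch's downloads here).
    results: List[Any] = []
    for batch in batched(gathered, batch_size):
        for event in batch:
            try:
                results.append(process_one(event))
            except Exception as err:
                if not skip_errored_events_during_processing:
                    raise
                errored.append(ErroredEvent("processing", event, err))

    # End-of-run log: event details plus the error for every skipped event.
    for item in errored:
        print(f"[{item.stage}] skipped {item.event!r}: {type(item.error).__name__}: {item.error}")
    return results, errored


# Stand-in gather / process functions for the usage example (hypothetical).
def fake_gather(i: int) -> dict:
    if i == 3:
        raise ValueError("video page parsed incorrectly")
    return {"id": i}


def fake_process(event: dict) -> int:
    if event["id"] == 7:
        raise RuntimeError("processing failed")
    return event["id"] * 10


if __name__ == "__main__":
    run_pipeline(
        list(range(10)),
        fake_gather,
        fake_process,
        batch_size=4,
        skip_errored_events_during_gather=True,
        skip_errored_events_during_processing=True,
    )
```

With these stand-ins, the __main__ block gathers 9 of the 10 items, processes them in batches of 4, 4 and 1, and logs one gather error and one processing error. Leaving either skip flag at False keeps the current fail-fast behavior for that stage.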

It would also be really interesting to see whether certain errors can be allowed (retried rather than failing the run), e.g. retry-errors=[ConnectionError].
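A minimal sketch of what a retry-errors option might do: retry only the listed exception types and let everything else propagate to the normal fail or skip-errored-events handling. call_with_retries, max_attempts, delay_seconds and FlakyFetch are hypothetical names, and the linear backoff is an arbitrary choice.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_errors: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Call fn, retrying only when it raises one of retry_errors.

    Any other exception propagates immediately. The last retryable error is
    re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_errors:
            if attempt == max_attempts:
                raise
            # Linear backoff between attempts; exponential would also work.
            time.sleep(delay_seconds * attempt)
    raise AssertionError("unreachable")


class FlakyFetch:
    """Hypothetical stand-in that fails with ConnectionError a few times."""

    def __init__(self, failures: int) -> None:
        self.calls = 0
        self.failures = failures

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("temporary network failure")
        return "ok"
```

For example, call_with_retries(FlakyFetch(failures=2), delay_seconds=0) succeeds on the third attempt, while a ValueError from a mis-parsed video page would not be retried at all.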

Use Case

I am backfilling a lot of data for certain instances, and processing it week by week is becoming tedious. Processing in small chunks is currently needed for a couple of reasons:

  • storage space on the machine: GitHub Actions (GHA) runners only have 16 GB of disk, so they can't download and process more than ~4 meeting videos at a time. Hence batch-size.
  • fewer than 1% of events hit errors that aren't random connection errors, such as the video page being parsed incorrectly.
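The disk constraint above can be turned into a rough default for batch-size. A minimal sketch, assuming roughly 4 GB of working space per meeting video (inferred only from the 16 GB / ~4 videos figure, not measured); batch_size_for_disk and batch_size_for_current_disk are hypothetical helpers, while shutil.disk_usage is the standard-library call for free disk space.

```python
import shutil

GB = 1024**3


def batch_size_for_disk(free_bytes: int, bytes_per_event: int, reserve_bytes: int = 0) -> int:
    """Number of events whose working files fit in the free space.

    Returns at least 1 so the pipeline still makes progress, even if a single
    event may not fit.
    """
    if bytes_per_event <= 0:
        raise ValueError("bytes_per_event must be positive")
    usable = max(free_bytes - reserve_bytes, 0)
    return max(usable // bytes_per_event, 1)


def batch_size_for_current_disk(path: str = ".", bytes_per_event: int = 4 * GB) -> int:
    """Same calculation using the actual free space on the disk holding path.

    The 4 GB default per event is an assumption, not a measured value.
    """
    return batch_size_for_disk(shutil.disk_usage(path).free, bytes_per_event)
```

With 16 GB free and 4 GB per event this gives a batch size of 4; reserving 2 GB for the OS and tooling drops it to 3.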