Support declarative row and column filtering #23

JacobHayes · 2021-03-24T21:44:57Z

Support declarative row-wise filters (e.g. col X = "..." or X in (...)) on input partitions in the .map method. Filtering is applied per (input, output) pair, not per input alone, and can be driven by per-partition Statistics that the user defines for the Artifact.
- These are orthogonal to column-wise selections, which are defined in the .build method
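
To make the idea concrete, here is a minimal sketch of how declarative filters could be checked against per-partition statistics to skip partitions that cannot match. Everything in it is hypothetical and not an existing API of this project: the `Eq` and `In` filter classes, the `ColumnStats` min/max shape, and the `may_match` / `select_partitions` helpers are illustrative names only.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, List, Mapping, Sequence, Union


@dataclass(frozen=True)
class Eq:
    """Hypothetical declarative filter: column == value."""
    column: str
    value: Any


@dataclass(frozen=True)
class In:
    """Hypothetical declarative filter: column in values."""
    column: str
    values: FrozenSet[Any]


Filter = Union[Eq, In]


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-partition statistics: min and max of one column."""
    minimum: Any
    maximum: Any


def may_match(flt: Filter, stats: Mapping[str, ColumnStats]) -> bool:
    """Return False only when the statistics prove no row can match."""
    col = stats.get(flt.column)
    if col is None:
        return True  # no statistics for this column, so we cannot rule it out
    candidates = [flt.value] if isinstance(flt, Eq) else flt.values
    return any(col.minimum <= v <= col.maximum for v in candidates)


def select_partitions(
    filters: Sequence[Filter],
    partitions: Mapping[str, Mapping[str, ColumnStats]],
) -> List[str]:
    """Keep the partitions that may contain rows matching every filter."""
    return [
        name
        for name, stats in partitions.items()
        if all(may_match(f, stats) for f in filters)
    ]


partitions = {
    "2021-01": {"country": ColumnStats("AT", "DE")},
    "2021-02": {"country": ColumnStats("FR", "US")},
    "2021-03": {},  # no statistics recorded
}
selected = select_partitions([In("country", frozenset({"GB", "IT"}))], partitions)
```

With min/max statistics the check is conservative: a partition is dropped only when no requested value falls inside its range, and partitions without statistics are always kept. Rows inside the selected partitions would still need the actual row filter applied at load time.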
View loading logic would be expanded to apply these row and column filters as efficiently as each backend allows (e.g. loading from BigQuery with SELECT <subset> and a WHERE clause, or reading only a subset of Parquet columns with ddf filtering).
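
As a rough sketch of that pushdown, the same column selection and row filters could be rendered once as a parameterized SQL query and once as keyword arguments for a Parquet reader. The names here are hypothetical (`RowFilter`, `to_sql`, `to_parquet_args`), filters are written as plain `(column, op, value)` tuples for brevity, and the SQL uses generic `?` placeholders rather than BigQuery's own parameter syntax. The `columns` / `filters` shape follows the list-of-tuples convention accepted by `pyarrow` and `dask.dataframe.read_parquet`, assuming "ddf" refers to a Dask DataFrame.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical declarative form: (column, op, value) with op in {"=", "in"}.
RowFilter = Tuple[str, str, Any]


def to_sql(
    table: str, columns: Sequence[str], filters: Sequence[RowFilter]
) -> Tuple[str, List[Any]]:
    """Render a SELECT with a WHERE clause plus its positional parameters."""
    clauses: List[str] = []
    params: List[Any] = []
    for column, op, value in filters:
        if op == "=":
            clauses.append(f"{column} = ?")
            params.append(value)
        elif op == "in":
            values = list(value)
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def to_parquet_args(
    columns: Sequence[str], filters: Sequence[RowFilter]
) -> Dict[str, Any]:
    """Build `columns` and `filters` arguments in the list-of-tuples style."""
    converted = []
    for column, op, value in filters:
        if op == "=":
            converted.append((column, "==", value))
        elif op == "in":
            converted.append((column, "in", list(value)))
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return {"columns": list(columns), "filters": converted}


filters = [("country", "in", ["GB", "IT"]), ("year", "=", 2021)]
sql, params = to_sql("events", ["user_id", "country"], filters)
parquet_args = to_parquet_args(["user_id", "country"], filters)
```

Keeping the filters declarative is what lets one definition target several backends; a backend that cannot push a filter down could fall back to loading the selected columns and filtering in memory.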
Compared to very granular input partitioning, this:
- has less overhead (fewer upstream partitions to track)
- has less precise invalidation (less granular upstream partitions)
- maintains "small" inputs to the build steps

The # of build tasks is still upper-bounded by the # of output partitions or other concurrency limits.

JacobHayes added the enhancement (New feature or request) and design (Design and use cases required) labels on Mar 24, 2021
