Support declarative row and column filtering #23

Open
JacobHayes opened this issue Mar 24, 2021 · 0 comments
Labels
design (Design and use cases required), enhancement (New feature or request)

Comments

@JacobHayes
Member

  • Support declarative row-wise filters (e.g. col X = "..." or X in (...)) on input partitions in the .map method. Filtering is per (input, output) pair, not per input alone, and can be driven by per-partition Statistics the user defines for the Artifact.
    • These are orthogonal to column-wise selections, which are defined in the .build method.
  • Expand the View loading logic to apply these row and column filters as efficiently as each backend allows (e.g. loading from BigQuery with SELECT <subset> and a WHERE clause, or reading a subset of Parquet columns with ddf filtering).
  • Compared to very granular input partitioning, this:
    • has less overhead (fewer upstream partitions to track)
    • has less precise invalidation (less granular upstream partitions)
    • keeps the inputs to the build steps "small"

The number of build tasks is still upper-bounded by the number of output partitions or other concurrency limits.
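To make the proposal concrete, here is a minimal sketch of how a declarative row filter plus a column selection might be represented once and then applied in two ways: rendered into SQL for a backend that supports pushdown, or evaluated in memory as a fallback. Every name here (`Eq`, `In`, `render_select`, `filter_in_memory`) is hypothetical and chosen for illustration; none of it is an existing API of this project.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class RowFilter(Protocol):
    """Hypothetical interface: a filter can test a row or render itself as SQL."""

    def matches(self, row: dict) -> bool: ...
    def to_sql(self) -> str: ...


def _literal(value: Any) -> str:
    # Naive quoting for illustration only; real code should use query parameters.
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


@dataclass(frozen=True)
class Eq:
    """Row filter: column = value."""

    column: str
    value: Any

    def matches(self, row: dict) -> bool:
        return row[self.column] == self.value

    def to_sql(self) -> str:
        return f"{self.column} = {_literal(self.value)}"


@dataclass(frozen=True)
class In:
    """Row filter: column IN (values...)."""

    column: str
    values: tuple

    def matches(self, row: dict) -> bool:
        return row[self.column] in self.values

    def to_sql(self) -> str:
        rendered = ", ".join(_literal(v) for v in self.values)
        return f"{self.column} IN ({rendered})"


def render_select(table: str, columns: list, row_filter: RowFilter) -> str:
    """Push both the column subset and the row filter down into one query."""
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {row_filter.to_sql()}"


def filter_in_memory(rows: list, columns: list, row_filter: RowFilter) -> list:
    """Fallback for backends without pushdown: filter rows, then project columns."""
    return [{c: row[c] for c in columns} for row in rows if row_filter.matches(row)]
```

Keeping the filter as data rather than as an arbitrary callable is what lets a loader choose the cheapest way to apply it for a given storage format.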
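As a rough illustration of filtering per (input, output) pair, the sketch below uses per-partition statistics to decide which input partitions each output partition needs to read. It assumes a statistic that records the distinct values of one column in each input partition; `PartitionStats` and `plan_tasks` are hypothetical names, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionStats:
    """Hypothetical per-partition statistic: distinct values seen in one column."""

    partition: str
    values: frozenset


def plan_tasks(inputs: list, outputs: dict) -> dict:
    """Map each output partition to the input partitions it must read.

    `outputs` maps an output partition name to the set of column values it
    wants (an `X in (...)` filter). An input partition is kept for an output
    only if its statistics overlap that set, so pruning happens per
    (input, output) pair. One entry, and so one build task, per output.
    """
    plan = {}
    for output, wanted in outputs.items():
        plan[output] = [s.partition for s in inputs if s.values & wanted]
    return plan


inputs = [
    PartitionStats("2021-03-01", frozenset({"US", "CA"})),
    PartitionStats("2021-03-02", frozenset({"DE"})),
    PartitionStats("2021-03-03", frozenset({"US", "FR"})),
]
outputs = {"north_america": {"US", "CA", "MX"}, "europe": {"DE", "FR"}}
plan = plan_tasks(inputs, outputs)
```

In this example the plan has exactly one entry per output partition, and the third input partition appears under both outputs because each output applies its own row filter to it.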

@JacobHayes added the enhancement and design labels Mar 24, 2021