ParquetReader does not respect time range provided #3944

v0y4g3r · 2024-05-15T07:12:06Z

What type of enhancement is this?

Performance

What does the enhancement do?

ParquetReaderBuilder provides a time_range option to filter rows by timestamp, but the option is currently not used anywhere when reading parquet files. We need to respect it, which will boost scan performance when an exact time range is provided.

greptimedb/src/mito2/src/sst/parquet/reader.rs

Line 69 in 5a0629e

time_range: Option<TimestampRange>,

Implementation challenges

There's a workaround: we can transform the time range into predicates and insert them into ParquetReaderBuilder::predicate. These predicates will be applied to the batches read from parquet files.
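A minimal sketch of that workaround, to make the idea concrete. The types here (`TimeRange`, `TimePredicate`) and the functions `range_to_predicates` and `filter_batch` are hypothetical stand-ins, not the actual mito2 types, and the range is assumed to be half-open (`[start, end)`) with timestamps as plain `i64` values:

```rust
/// Hypothetical stand-in for a time range; assumed half-open: `[start, end)`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeRange {
    pub start: i64,
    pub end: i64,
}

/// Hypothetical comparison predicate on the time index column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimePredicate {
    /// Keep rows with `ts >= bound`.
    GtEq(i64),
    /// Keep rows with `ts < bound`.
    Lt(i64),
}

impl TimePredicate {
    /// Returns true if the timestamp satisfies this predicate.
    pub fn matches(&self, ts: i64) -> bool {
        match *self {
            TimePredicate::GtEq(bound) => ts >= bound,
            TimePredicate::Lt(bound) => ts < bound,
        }
    }
}

/// Turns a half-open time range into two comparison predicates.
pub fn range_to_predicates(range: TimeRange) -> Vec<TimePredicate> {
    vec![TimePredicate::GtEq(range.start), TimePredicate::Lt(range.end)]
}

/// Applies the predicates to timestamps that were already read and decoded.
/// Nothing is skipped at the IO level: every row is read first, then filtered.
pub fn filter_batch(timestamps: &[i64], predicates: &[TimePredicate]) -> Vec<i64> {
    timestamps
        .iter()
        .copied()
        .filter(|ts| predicates.iter().all(|p| p.matches(*ts)))
        .collect()
}
```

For example, `filter_batch(&[5, 10, 15, 20, 25], &range_to_predicates(TimeRange { start: 10, end: 20 }))` keeps only `10` and `15`, but all five values had to be read to get there.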

But this workaround does not skip pages and row groups, so it still pays the IO cost of reading data that is then filtered out. We need to use the time range to prune row groups and data pages according to the file metadata.
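Row group pruning could look roughly like the sketch below. It assumes each row group exposes inclusive min/max statistics for the time index column and that the time range is half-open (`[start, end)`). `TimeRange`, `RowGroupStats`, `overlaps` and `prune_row_groups` are hypothetical names for illustration, not the actual mito2 or parquet types:

```rust
/// Hypothetical stand-in for a time range; assumed half-open: `[start, end)`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeRange {
    pub start: i64,
    pub end: i64,
}

/// Hypothetical min/max statistics of the time index column in one row group.
/// Both bounds are assumed to be inclusive.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowGroupStats {
    pub min_ts: i64,
    pub max_ts: i64,
}

/// Returns true if the row group may contain rows inside the range.
pub fn overlaps(range: TimeRange, stats: &RowGroupStats) -> bool {
    stats.max_ts >= range.start && stats.min_ts < range.end
}

/// Returns the indices of row groups that have to be read.
/// Row groups that cannot overlap the range are skipped without any IO.
pub fn prune_row_groups(range: TimeRange, row_groups: &[RowGroupStats]) -> Vec<usize> {
    row_groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| overlaps(range, stats))
        .map(|(idx, _)| idx)
        .collect()
}
```

Statistics-based pruning is conservative: a selected row group may still contain rows outside the range, so the row-level predicates remain necessary for the row groups that survive. The same overlap check could in principle be applied to page-level statistics to skip data pages.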


v0y4g3r mentioned this issue May 15, 2024

feat: respect time range when building parquet reader #3947

Merged

3 tasks

killme2008 closed this as completed May 28, 2024
