What type of enhancement is this?
Performance
What does the enhancement do?
`ParquetReaderBuilder` provides a `time_range` option to filter rows by timestamp. However, this option is currently not used anywhere when reading parquet files. We need to respect it, which will improve scan performance when an exact time range is provided.

See greptimedb/src/mito2/src/sst/parquet/reader.rs, line 69 in 5a0629e.
Implementation challenges
There's a workaround: we can transform the time range into predicates and insert them into `ParquetReaderBuilder::predicate`. These predicates will be applied to the batches read from parquet files.

But this workaround does not skip pages and row groups, so the reader may still perform unnecessary IO. We need to use the time range to prune row groups and data pages according to the file metadata.
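To illustrate the difference between the two approaches, here is a minimal, self-contained sketch of pruning row groups by the min/max timestamps in the file metadata. It is not the mito2 implementation: `TimeRange`, `RowGroupTimeStats`, `prune_row_groups`, and `in_range` are hypothetical names, and the range is assumed to be half-open (`[start, end)`) with timestamps as plain `i64` values.

```rust
/// Hypothetical half-open time range `[start, end)`; an assumption for this
/// sketch, not the type used by `ParquetReaderBuilder`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeRange {
    pub start: i64,
    pub end: i64,
}

/// Hypothetical min/max statistics of the time index column for one row
/// group, as they might be extracted from the parquet file metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowGroupTimeStats {
    pub min_ts: i64,
    pub max_ts: i64,
}

/// Returns the indices of row groups that may contain rows inside `range`.
/// A row group without statistics (`None`) is always kept, because nothing
/// proves that it lies outside the range.
pub fn prune_row_groups(
    stats: &[Option<RowGroupTimeStats>],
    range: TimeRange,
) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter_map(|(idx, stat)| {
            let may_overlap = match stat {
                None => true,
                // The closed interval [min_ts, max_ts] overlaps [start, end).
                Some(s) => s.max_ts >= range.start && s.min_ts < range.end,
            };
            if may_overlap {
                Some(idx)
            } else {
                None
            }
        })
        .collect()
}

/// Row-level check that still has to run on the surviving row groups,
/// since min/max statistics only bound the timestamps they contain.
pub fn in_range(ts: i64, range: TimeRange) -> bool {
    range.start <= ts && ts < range.end
}
```

Row groups that are pruned this way are never fetched, whereas the predicate workaround reads every row group and only discards rows afterwards. The same overlap test could in principle be applied to per-page statistics to skip data pages, if the file carries a page index.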