This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

What's the recommended way to read parquet files with offset + len #1231

Answered by jorgecarleitao
wooden-worm asked this question in Q&A

Great question. :)

We currently do not support offset pushdown for Parquet files in general.

With that said, we do support it in either of the following situations:

  1. The pages are written in V2. In this case, we can use each page header to skip pages based on the page's number of values. Essentially, create a new page iterator that skips pages until the requested number of values has been passed.
  2. The Parquet file has page offsets. This is a feature that allows us to know which page contains which intervals of rows.
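
To make situation 1 more concrete, here is a minimal sketch of the skipping logic. It is not the library's API: `PageMeta` and `first_page_for_offset` are hypothetical names, and `PageMeta` is a simplified stand-in for the row count that a V2 data page header carries. It assumes the page headers have already been read into a slice.

```rust
/// Hypothetical, simplified stand-in for a V2 data page header.
/// Only the row count matters for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PageMeta {
    num_rows: usize,
}

/// Given the pages of one column chunk and a row `offset`, returns
/// `(index of the first page that must be decoded, rows to skip inside it)`.
/// Returns `None` when `offset` is at or beyond the total number of rows.
fn first_page_for_offset(pages: &[PageMeta], offset: usize) -> Option<(usize, usize)> {
    let mut remaining = offset;
    for (index, page) in pages.iter().enumerate() {
        if remaining < page.num_rows {
            // The offset lands inside this page: decode from here.
            return Some((index, remaining));
        }
        // The whole page lies before the offset: skip it without decoding.
        remaining -= page.num_rows;
    }
    None
}
```

With pages of 100, 100 and 50 rows and an offset of 150, the first page is skipped entirely and decoding would start at the second page, with 50 rows still to skip inside it. Those leftover rows are the part that would need offset support in the deserializer.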

Do you know whether you are in one of these situations?

Regardless of this, there is a use case for adding offset pushdown support to the deserializer.

Answer selected by wooden-worm