Support bloom filter when reading/writing parquet files #1830

Open
v0y4g3r opened this issue Jun 26, 2023 · 3 comments
Labels
Enhancement New feature or request performance Performance issue
Milestone

Comments

@v0y4g3r
Contributor

v0y4g3r commented Jun 26, 2023

What type of enhancement is this?

Performance

What does the enhancement do?

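ParquetWriter already supports writing bloom filters, but the read path does not use them yet: during a table scan we still have to evaluate query predicates against the bloom filters so that data which cannot match is skipped.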

Once we can build an external index file, we may also switch to an xor filter and its Rust implementation for better performance.
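
To make the scan-side idea concrete, here is a minimal sketch of pruning with per-row-group bloom filters for an equality predicate. Everything in it is an assumption for illustration: `BloomFilter`, `RowGroup`, and `row_groups_to_scan` are hypothetical names, the filter is a toy bit vector with double hashing rather than Parquet's split-block bloom filter format, and none of it reflects the actual ParquetWriter or scan code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bloom filter (hypothetical; NOT the Parquet split-block layout).
pub struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Hash `value` together with a seed to get independent-looking hashes.
    fn hash_with_seed(value: &str, seed: u64) -> u64 {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        value.hash(&mut hasher);
        hasher.finish()
    }

    /// Derive `num_hashes` bit positions via double hashing: h1 + i * h2.
    fn positions(&self, value: &str) -> Vec<usize> {
        let h1 = Self::hash_with_seed(value, 0);
        let h2 = Self::hash_with_seed(value, 1) | 1; // force a non-zero step
        let m = self.bits.len() as u64;
        (0..self.num_hashes)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
            .collect()
    }

    pub fn insert(&mut self, value: &str) {
        for pos in self.positions(value) {
            self.bits[pos] = true;
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    pub fn might_contain(&self, value: &str) -> bool {
        self.positions(value).into_iter().all(|pos| self.bits[pos])
    }
}

/// Hypothetical row group: one string column plus its bloom filter.
pub struct RowGroup {
    pub values: Vec<String>,
    pub filter: BloomFilter,
}

impl RowGroup {
    pub fn from_values(values: Vec<String>) -> Self {
        // Filter size and hash count are arbitrary choices for this sketch.
        let mut filter = BloomFilter::new(1024, 3);
        for v in &values {
            filter.insert(v);
        }
        Self { values, filter }
    }
}

/// Indices of row groups that still have to be read for `column = target`.
/// Groups whose filter reports "definitely absent" are skipped.
pub fn row_groups_to_scan(groups: &[RowGroup], target: &str) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter(|(_, g)| g.filter.might_contain(target))
        .map(|(i, _)| i)
        .collect()
}
```

A bloom filter never produces false negatives, so a row group that really contains the value is always scanned. False positives are possible, so the reader must still apply the predicate to the rows of every group it does read.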

@v0y4g3r v0y4g3r added the Enhancement New feature or request label Jun 26, 2023
@v0y4g3r v0y4g3r self-assigned this Jun 26, 2023
@killme2008 killme2008 added the performance Performance issue label Jul 26, 2023
@killme2008 killme2008 added this to the v0.4 milestone Jul 26, 2023
@killme2008
Contributor

@v0y4g3r Any progress?

@killme2008 killme2008 modified the milestones: v0.4, v0.5 Oct 11, 2023
@killme2008
Contributor

@v0y4g3r What's the plan for this issue? I am not sure if we still need it.

@evenyag
Contributor

evenyag commented Jan 2, 2024

IMO, we should run some benchmarks later to compare this with the inverted index, since Parquet already supports bloom filters.

@fengjiachun fengjiachun modified the milestones: v0.5, v0.8 Feb 28, 2024
Projects
Status: Todo
Development

No branches or pull requests

4 participants