alamb/datafusion-duckdb-benchmark

ClickBench: DataFusion / DuckDB comparison scripts

This benchmark compares the performance of DataFusion and DuckDB on the ClickBench queries, run against the unmodified ClickBench parquet files.

Results

Result Chart

Versions

  • DataFusion 27.0.0
  • DataFusion 28.0.0
  • DuckDB 0.8.1

Scenarios

  • Single parquet file (hits.parquet)

Download Data:

bash setup.sh

Install DataFusion-CLI

Install from crates.io:

cargo install datafusion-cli --version 28.0.0

Or build from source:

git clone https://github.com/apache/arrow-datafusion.git
cd arrow-datafusion
cargo install --path datafusion-cli

Install DuckDB

python3 -m venv `pwd`/venv
source venv/bin/activate
pip install duckdb psutil

Run queries

Queries are run with run-datafusion.sh or run-duckdb.sh.

DuckDB:

CREATE=create-single-duckdb.sql bash run-duckdb.sh

DataFusion:

DATAFUSION_CLI=./datafusion-cli.413eba1 CREATE=create-single-datafusion.sql bash run-datafusion.sh

More examples are in benchmark.sh.
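
The two commands above differ only in the script they call and the environment variables they set. As a rough sketch (not part of this repository), the same runs could be driven from Python with the standard library. The helper names build_env and run_benchmark are hypothetical, and the sketch assumes the scripts read CREATE and DATAFUSION_CLI from the environment exactly as in the commands above.

```python
import os
import subprocess


def build_env(create, datafusion_cli=None, base=None):
    """Return a copy of the environment with the benchmark variables set.

    `base` defaults to the current process environment; it is a parameter
    only so the function can be tried out without touching os.environ.
    """
    env = dict(os.environ if base is None else base)
    env["CREATE"] = create  # which create-*.sql file the script should use
    if datafusion_cli is not None:
        env["DATAFUSION_CLI"] = datafusion_cli  # path to a datafusion-cli binary
    return env


def run_benchmark(script, create, datafusion_cli=None):
    """Run one benchmark script with bash; raises if it exits non-zero."""
    env = build_env(create, datafusion_cli)
    subprocess.run(["bash", script], env=env, check=True)
```

Under these assumptions, run_benchmark("run-duckdb.sh", "create-single-duckdb.sql") corresponds to the DuckDB command above, and passing datafusion_cli="./datafusion-cli.413eba1" with run-datafusion.sh corresponds to the DataFusion one.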

Results

Results are written to result.csv.
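
The layout of result.csv is not described here, so the following is only a sketch for taking a first look at the file with Python's csv module. It makes no assumption about column names or their meaning; load_rows and preview are hypothetical helper names.

```python
import csv


def load_rows(path):
    """Read every non-blank row of a CSV file as a list of strings."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f) if row]


def preview(path, limit=5):
    """Print the row count and the first few rows of the file."""
    rows = load_rows(path)
    print(f"{len(rows)} rows")
    for row in rows[:limit]:
        print(row)
```

After a benchmark run, preview("result.csv") prints the number of rows and the first five of them.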

Python Example

The example Python script is hash.py:

python3 hash.py
