Using DataFusion as a library

Create a new project

```shell
$ cargo new hello_datafusion
$ cd hello_datafusion
$ tree .
.
├── Cargo.toml
└── src
    └── main.rs

1 directory, 2 files
```

Default Configuration

DataFusion is published on crates.io, and is well documented on docs.rs.

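To get started, add the following to your `Cargo.toml` file. The example in `main.rs` uses the `#[tokio::main]` attribute, so `tokio` must also be listed as a direct dependency:

```toml
[dependencies]
datafusion = "11.0"
tokio = { version = "1.0", features = ["rt-multi-thread", "macros"] }
```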

Create a main function

Update the `main.rs` file with your first DataFusion application, based on Example usage in the DataFusion documentation:

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // create a session context and register the CSV file as table "test"
    let ctx = SessionContext::new();
    ctx.register_csv("test", "<PATH_TO_YOUR_CSV_FILE>", CsvReadOptions::new()).await?;

    // create a plan to run a SQL query
    let df = ctx.sql("SELECT * FROM test").await?;

    // execute and print results
    df.show().await?;
    Ok(())
}
```

Optimized Configuration

An optimized build requires a few more steps. First, use the following in your `Cargo.toml`. Note that the settings in the `[profile.release]` section significantly increase build time. `lto = true` enables link-time optimization, and `codegen-units = 1` compiles each crate as a single code generation unit.

```toml
[dependencies]
datafusion = { version = "11.0", features = ["simd"] }
tokio = { version = "1.0", features = ["rt-multi-thread", "macros"] }
snmalloc-rs = "0.2"

[profile.release]
lto = true
codegen-units = 1
```

Then, in `main.rs`, make snmalloc the global memory allocator by adding the following after your imports:

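```rust
use datafusion::prelude::*;

// route every heap allocation in this binary through snmalloc
#[global_allocator]
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // ... your DataFusion code, e.g. the example above ...
    Ok(())
}
```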
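The following minimal, std-only sketch shows what the `#[global_allocator]` attribute does. It registers a hypothetical `CountingAlloc` wrapper around `std::alloc::System` in the same position where `SnMalloc` is registered above. This is an illustration of the mechanism only, not part of DataFusion or snmalloc-rs. A binary can have exactly one global allocator, so in a real project this would replace `SnMalloc` rather than sit alongside it.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical allocator for illustration only: it forwards every request
// to the system allocator and counts how many allocations it has served.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds GlobalAlloc's contract; we forward it unchanged.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` came from `System.alloc` with this same `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Same attribute as in the snmalloc example: every Box, Vec, String, etc.
// in this binary is now allocated through CountingAlloc.
#[global_allocator]
static ALLOC: CountingAlloc = CountingAlloc;

fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations();
    // black_box keeps the optimizer from eliding the otherwise-unused Vec.
    let v = black_box(Vec::<u64>::with_capacity(1024));
    let after = allocations();
    println!("allocations made for the Vec: {}", after - before);
    drop(v);
}
```

Swapping `System` for a faster allocator such as snmalloc changes nothing in the rest of the program. Every heap allocation, including those made inside DataFusion and Arrow, goes through whichever type is marked `#[global_allocator]`.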

Finally, building with the `simd` feature requires the nightly Rust toolchain:

```shell
rustup toolchain install nightly
```

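You will also want to configure code generation for the instruction set architecture you are building on. The best option is `-C target-cpu=native`, which targets the CPU of the build machine. Otherwise, at least enable AVX2 with `-C target-feature=+avx2`: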

```shell
RUSTFLAGS='-C target-cpu=native' cargo +nightly run --release
```
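A binary built with `-C target-cpu=native` or `+avx2` may crash with an illegal-instruction error on a CPU that lacks those instructions, so it can help to check both sides. This std-only sketch reports two things. `cfg!(target_feature = "avx2")` shows whether AVX2 was enabled at compile time, and `is_x86_feature_detected!` shows whether the running CPU supports it. The two functions are hypothetical helpers, not DataFusion APIs.

```rust
/// True if this binary was compiled with AVX2 code generation enabled,
/// e.g. via RUSTFLAGS='-C target-cpu=native' on an AVX2-capable machine.
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU currently running this binary supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_supports_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

/// AVX2 is an x86 extension, so other architectures never support it.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_supports_avx2() -> bool {
    false
}

fn main() {
    println!("compiled with AVX2: {}", compiled_with_avx2());
    println!("CPU supports AVX2:  {}", cpu_supports_avx2());
    if !compiled_with_avx2() && cpu_supports_avx2() {
        println!("hint: rebuild with RUSTFLAGS='-C target-cpu=native' to use AVX2");
    }
}
```

Try running it once with `cargo run --release` and once with `RUSTFLAGS='-C target-cpu=native' cargo run --release`. On an AVX2-capable x86-64 machine, only the second build should report `compiled with AVX2: true`.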