add parquet-fromcsv (#1) #1798

Merged
merged 14 commits into from
Jun 10, 2022
7 changes: 6 additions & 1 deletion parquet/Cargo.toml
@@ -49,6 +49,7 @@ serde_json = { version = "1.0", features = ["preserve_order"], optional = true }
rand = "0.8"
futures = { version = "0.3", optional = true }
tokio = { version = "1.0", optional = true, default-features = false, features = ["macros", "fs", "rt", "io-util"] }
anyhow = { version = "1.0.57", optional = true }

[dev-dependencies]
criterion = "0.3"
@@ -66,7 +67,7 @@ all-features = true

[features]
default = ["arrow", "snap", "brotli", "flate2", "lz4", "zstd", "base64"]
-cli = ["serde_json", "base64", "clap"]
+cli = ["serde_json", "base64", "clap", "anyhow", "arrow/csv"]
test_common = []
# Experimental, unstable functionality primarily used for testing
experimental = []
@@ -85,6 +86,10 @@ required-features = ["cli"]
name = "parquet-rowcount"
required-features = ["cli"]

[[bin]]
name = "parquet-fromcsv"
required-features = ["cli"]

[[bench]]
name = "arrow_writer"
harness = false
61 changes: 61 additions & 0 deletions parquet/src/bin/parquet-fromcsv-help.txt
@@ -0,0 +1,61 @@
parquet 15.0.0
Apache Arrow <dev@arrow.apache.org>
Binary to convert CSV to Parquet

USAGE:
parquet [OPTIONS] --schema <SCHEMA> --input-file <INPUT_FILE> --output-file <OUTPUT_FILE>

OPTIONS:
-b, --batch-size <BATCH_SIZE>
batch size

[env: PARQUET_FROM_CSV_BATCHSIZE=]
[default: 1000]

-c, --parquet-compression <PARQUET_COMPRESSION>
compression mode

[default: SNAPPY]

-d, --delimiter <DELIMITER>
field delimiter

default value: ',' when input_format==CSV, 'TAB' when input_format==TSV

-D, --double-quote <DOUBLE_QUOTE>
double quote

-e, --escape-char <ESCAPE_CHAR>
escape character

-f, --input-format <INPUT_FORMAT>
input file format

[default: csv]
[possible values: csv, tsv]

-h, --has-header
has header

--help
Print help information

-i, --input-file <INPUT_FILE>
input CSV file

-o, --output-file <OUTPUT_FILE>
output Parquet file

-q, --quote-char <QUOTE_CHAR>
quote character

-r, --record-terminator <RECORD_TERMINATOR>
record terminator

[possible values: lf, crlf, cr]

-s, --schema <SCHEMA>
message schema for output Parquet

-V, --version
Print version information
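
The help text above says the default delimiter depends on the input format (',' for CSV, TAB for TSV) and that the record terminator is one of lf, crlf, or cr. A minimal sketch of how those rules could be resolved is shown below. It is not the code from this PR: the type and function names (`InputFormat`, `RecordTerminator`, `parse_input_format`, `parse_record_terminator`, `effective_delimiter`) are hypothetical, it uses only the standard library, and it assumes the option values are accepted only in the lowercase spellings listed in the help.

```rust
/// Input formats listed under `--input-format` in the help text.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputFormat {
    Csv,
    Tsv,
}

/// Record terminators listed under `--record-terminator`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecordTerminator {
    Lf,
    Crlf,
    Cr,
}

impl RecordTerminator {
    /// The byte sequence each terminator name stands for.
    fn as_str(self) -> &'static str {
        match self {
            RecordTerminator::Lf => "\n",
            RecordTerminator::Crlf => "\r\n",
            RecordTerminator::Cr => "\r",
        }
    }
}

/// Parse an `--input-format` value.
/// Assumption: only the lowercase spellings from the help are accepted.
fn parse_input_format(s: &str) -> Result<InputFormat, String> {
    match s {
        "csv" => Ok(InputFormat::Csv),
        "tsv" => Ok(InputFormat::Tsv),
        other => Err(format!("unknown input format: {}", other)),
    }
}

/// Parse a `--record-terminator` value (lf, crlf, or cr).
fn parse_record_terminator(s: &str) -> Result<RecordTerminator, String> {
    match s {
        "lf" => Ok(RecordTerminator::Lf),
        "crlf" => Ok(RecordTerminator::Crlf),
        "cr" => Ok(RecordTerminator::Cr),
        other => Err(format!("unknown record terminator: {}", other)),
    }
}

/// Pick the field delimiter: an explicit `--delimiter` wins; otherwise
/// fall back to ',' for CSV and TAB for TSV, as the help describes.
fn effective_delimiter(format: InputFormat, explicit: Option<char>) -> char {
    explicit.unwrap_or(match format {
        InputFormat::Csv => ',',
        InputFormat::Tsv => '\t',
    })
}
```

With these helpers, `effective_delimiter(parse_input_format("tsv")?, None)` yields a tab, while passing `Some(';')` overrides the format-based default. Keeping the fallback in one function means the delimiter rule is stated once rather than repeated wherever the reader is configured.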