
Update read parquet example in parquet/arrow home #2730

Merged
merged 2 commits on Sep 15, 2022
Changes from 1 commit
34 changes: 10 additions & 24 deletions parquet/src/arrow/mod.rs
@@ -66,26 +66,24 @@
 //! # Example of reading parquet file into arrow record batch
 //!
 //! ```rust
-//! use arrow::record_batch::RecordBatchReader;
-//! use parquet::file::reader::{FileReader, SerializedFileReader};
-//! use parquet::arrow::{ParquetFileArrowReader, ArrowReader, ProjectionMask};
-//! use std::sync::Arc;
 //! use std::fs::File;
+//! use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
 //!
+//! # use std::sync::Arc;
 //! # use arrow::array::Int32Array;
 //! # use arrow::datatypes::{DataType, Field, Schema};
 //! # use arrow::record_batch::RecordBatch;
 //! # use parquet::arrow::arrow_writer::ArrowWriter;
 //! #
 //! # let ids = Int32Array::from(vec![1, 2, 3, 4]);
 //! # let schema = Arc::new(Schema::new(vec![
 //! #     Field::new("id", DataType::Int32, false),
 //! # ]));
 //! #
+//! # // Write to a memory buffer (can also write to a File)
Contributor:

This comment isn't actually correct, although I appreciate it was incorrect before

Contributor Author:

Thanks for the heads up, I didn't realize. I just removed it, I don't think it adds much value to say we're writing to the file here. But let me know if you've got a better idea.

 //! # let file = File::create("data.parquet").unwrap();
 //! #
-//! # let batch =
-//! #     RecordBatch::try_new(Arc::clone(&schema), vec![Arc::new(ids)]).unwrap();
+//! # let batch = RecordBatch::try_new(Arc::clone(&schema), vec![Arc::new(ids)]).unwrap();
 //! # let batches = vec![batch];
 //! #
 //! # let mut writer = ArrowWriter::try_new(file, Arc::clone(&schema), None).unwrap();
@@ -97,26 +95,14 @@
 //!
 //! let file = File::open("data.parquet").unwrap();
 //!
-//! let mut arrow_reader = ParquetFileArrowReader::try_new(file).unwrap();
-//! let mask = ProjectionMask::leaves(arrow_reader.parquet_schema(), [0]);
-//!
-//! println!("Converted arrow schema is: {}", arrow_reader.get_schema().unwrap());
-//! println!("Arrow schema after projection is: {}",
-//!     arrow_reader.get_schema_by_columns(mask.clone()).unwrap());
+//! let builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
+//! println!("Converted arrow schema is: {}", builder.schema());
 //!
-//! let mut unprojected = arrow_reader.get_record_reader(2048).unwrap();
-//! println!("Unprojected reader schema: {}", unprojected.schema());
+//! let mut reader = builder.build().unwrap();
 //!
-//! let mut record_batch_reader = arrow_reader.get_record_reader_by_columns(mask, 2048).unwrap();
+//! let record_batch = reader.next().unwrap().unwrap();
 //!
-//! for maybe_record_batch in record_batch_reader {
-//!     let record_batch = maybe_record_batch.unwrap();
-//!     if record_batch.num_rows() > 0 {
-//!         println!("Read {} records.", record_batch.num_rows());
-//!     } else {
-//!         println!("End of file!");
-//!     }
-//! }
+//! println!("Read {} records.", record_batch.num_rows());
 //! ```

experimental!(mod array_reader);