Skip to content

Commit

Permalink
add test file for page index filter. (#25)
Browse files Browse the repository at this point in the history
* add test file for page index filter.

* add link
  • Loading branch information
Ted-Jiang committed Jul 4, 2022
1 parent b76cde4 commit aafd3fc
Show file tree
Hide file tree
Showing 3 changed files with 12 additions and 10 deletions.
22 changes: 12 additions & 10 deletions data/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,16 +19,18 @@

# Test data files for Parquet compatibility and regression testing

| File | Description |
|----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| delta_byte_array.parquet | string columns with DELTA_BYTE_ARRAY encoding. See [delta_byte_array.md](delta_byte_array.md) for details. |
| delta_length_byte_array.parquet | string columns with DELTA_LENGTH_BYTE_ARRAY encoding. |
| delta_binary_packed.parquet | INT32 and INT64 columns with DELTA_BINARY_PACKED encoding. See [delta_binary_packed.md](delta_binary_packed.md) for details. |
| delta_encoding_required_column.parquet | required INT32 and STRING columns with delta encoding. See [delta_encoding_required_column.md](delta_encoding_required_column.md) for details. |
| delta_encoding_optional_column.parquet | optional INT64 and STRING columns with delta encoding. See [delta_encoding_optional_column.md](delta_encoding_optional_column.md) for details. |
| nested_structs.rust.parquet | Used to test that the Rust Arrow reader can lookup the correct field from a nested struct. See [ARROW-11452](https://issues.apache.org/jira/browse/ARROW-11452) |
| data_index_bloom_encoding_stats.parquet | optional STRING column. Contains optional metadata: bloom filters, column index, offset index and encoding stats. |
|null_list.parquet | an empty list. Generated from this json `{"emptylist":[]}` and for the purposes of testing correct read/write behaviour of this base case. |
| File | Description |
|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| delta_byte_array.parquet | string columns with DELTA_BYTE_ARRAY encoding. See [delta_byte_array.md](delta_byte_array.md) for details. |
| delta_length_byte_array.parquet | string columns with DELTA_LENGTH_BYTE_ARRAY encoding. |
| delta_binary_packed.parquet | INT32 and INT64 columns with DELTA_BINARY_PACKED encoding. See [delta_binary_packed.md](delta_binary_packed.md) for details. |
| delta_encoding_required_column.parquet | required INT32 and STRING columns with delta encoding. See [delta_encoding_required_column.md](delta_encoding_required_column.md) for details. |
| delta_encoding_optional_column.parquet | optional INT64 and STRING columns with delta encoding. See [delta_encoding_optional_column.md](delta_encoding_optional_column.md) for details. |
| nested_structs.rust.parquet | Used to test that the Rust Arrow reader can lookup the correct field from a nested struct. See [ARROW-11452](https://issues.apache.org/jira/browse/ARROW-11452) |
| data_index_bloom_encoding_stats.parquet | optional STRING column. Contains optional metadata: bloom filters, column index, offset index and encoding stats. |
| null_list.parquet | an empty list. Generated from this json `{"emptylist":[]}` and for the purposes of testing correct read/write behaviour of this base case. |
| alltypes_tiny_pages.parquet | small page sizes with dictionary encoding with page index from [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |
| alltypes_tiny_pages_plain.parquet | small page sizes with plain encoding with page index [impala](https://github.com/apache/impala/tree/master/testdata/data/alltypes_tiny_pages.parquet). |

TODO: Document what each file is in the table above.

Expand Down
Binary file added data/alltypes_tiny_pages.parquet
Binary file not shown.
Binary file added data/alltypes_tiny_pages_plain.parquet
Binary file not shown.

0 comments on commit aafd3fc

Please sign in to comment.