Cannot query some parquet files in S3, but they work locally #3633

Closed
andygrove opened this issue Sep 28, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@andygrove
Member

Describe the bug
I am trying to query parquet files in S3 from the CLI. Some work, and some do not.

To Reproduce

DataFusion CLI v12.0.0
❯ create external table test stored as parquet location 's3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet';
ObjectStore(Generic { store: "S3", source: MissingLastModified })

However, if I download the file locally it works.

$ aws s3 cp "s3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet" /tmp/yellow_tripdata_2022-06.parquet
download: s3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet to ../../../../../../tmp/yellow_tripdata_2022-06.parquet
DataFusion CLI v12.0.0
❯ create external table test stored as parquet location '/tmp/yellow_tripdata_2022-06.parquet';
0 rows in set. Query took 0.006 seconds.
❯ select * from test limit 10;
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| 1        | 2022-06-01 00:25:41  | 2022-06-01 00:48:22   | 1               | 11            | 1          | N                  | 70           | 48           | 1            | 32          | 3     | 0.5     | 2          | 6.55         | 0.3                   | 44.35        | 2.5                  | 0           |
| 1        | 2022-06-01 00:44:40  | 2022-06-01 01:01:48   | 1               | 4.2           | 1          | N                  | 170          | 226          | 1            | 14          | 3     | 0.5     | 0          | 0            | 0.3                   | 17.8         | 2.5                  | 0           |
| 2        | 2022-06-01 00:23:07  | 2022-06-01 00:39:50   | 1               | 9.49          | 1          | N                  | 264          | 113          | 1            | 26          | 0.5   | 0.5     | 5          | 6.55         | 0.3                   | 42.6         | 2.5                  | 1.25        |
| 1        | 2022-06-01 00:25:53  | 2022-06-01 00:57:06   | 2               | 12.1          | 1          | N                  | 132          | 17           | 2            | 37          | 1.75  | 0.5     | 0          | 0            | 0.3                   | 39.55        | 0                    | 1.25        |
| 1        | 2022-06-01 00:23:58  | 2022-06-01 00:33:43   | 0               | 1.8           | 1          | N                  | 140          | 163          | 1            | 9           | 3     | 0.5     | 2.55       | 0            | 0.3                   | 15.35        | 2.5                  | 0           |
| 2        | 2022-06-01 00:01:27  | 2022-06-01 00:10:53   | 1               | 2.02          | 1          | N                  | 148          | 158          | 1            | 9           | 0.5   | 0.5     | 0.64       | 0            | 0.3                   | 13.44        | 2.5                  | 0           |
| 2        | 2022-06-01 00:16:25  | 2022-06-01 00:40:45   | 1               | 8.08          | 1          | N                  | 158          | 116          | 1            | 26.5        | 0.5   | 0.5     | 7.58       | 0            | 0.3                   | 37.88        | 2.5                  | 0           |
| 1        | 2022-06-01 00:11:08  | 2022-06-01 00:27:02   | 1               | 4.3           | 1          | N                  | 246          | 262          | 1            | 15          | 3     | 0.5     | 3.75       | 0            | 0.3                   | 22.55        | 2.5                  | 0           |
| 2        | 2022-06-01 00:21:42  | 2022-06-01 00:42:01   | 1               | 8.78          | 1          | N                  | 197          | 191          | 1            | 26.5        | 0.5   | 0.5     | 5.56       | 0            | 0.3                   | 33.36        | 0                    | 0           |
| 2        | 2022-06-01 00:23:05  | 2022-06-01 00:30:45   | 1               | 1.76          | 1          | N                  | 48           | 186          | 1            | 7.5         | 0.5   | 0.5     | 2.26       | 0            | 0.3                   | 13.56        | 2.5                  | 0           |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
10 rows in set. Query took 1.792 seconds.
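The error `ObjectStore(Generic { store: "S3", source: MissingLastModified })` comes from the `object_store` crate's S3 backend. Judging by its name, it suggests that a response used to build the object's metadata lacked a `Last-Modified` header. The Rust sketch below is a hypothetical, simplified illustration of that kind of check. `meta_from_head`, `ObjectMetaSketch`, and `HeadError` are invented names, not the crate's API, and the header values are made up.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the metadata a store builds for an object.
// (The real `object_store::ObjectMeta` parses the timestamp; this keeps the raw header.)
#[derive(Debug, PartialEq)]
struct ObjectMetaSketch {
    location: String,
    last_modified: String,
    size: usize,
}

// Hypothetical error type; only the variant name `MissingLastModified`
// mirrors the error reported in this issue.
#[derive(Debug, PartialEq)]
enum HeadError {
    MissingLastModified,
    MissingContentLength,
    InvalidContentLength(String),
}

/// Builds metadata from HEAD response headers (keys assumed to be lowercase).
/// A missing `last-modified` header is treated as a hard error instead of
/// being defaulted, which is what the error message in the issue suggests.
fn meta_from_head(
    location: &str,
    headers: &HashMap<String, String>,
) -> Result<ObjectMetaSketch, HeadError> {
    let last_modified = headers
        .get("last-modified")
        .ok_or(HeadError::MissingLastModified)?;
    let raw_len = headers
        .get("content-length")
        .ok_or(HeadError::MissingContentLength)?;
    let size = raw_len
        .parse::<usize>()
        .map_err(|_| HeadError::InvalidContentLength(raw_len.clone()))?;
    Ok(ObjectMetaSketch {
        location: location.to_string(),
        last_modified: last_modified.clone(),
        size,
    })
}

fn main() {
    let path = "trip data/yellow_tripdata_2022-06.parquet";

    // Made-up header values, for illustration only.
    let mut complete = HashMap::new();
    complete.insert("last-modified".to_string(), "Mon, 01 Jan 2024 00:00:00 GMT".to_string());
    complete.insert("content-length".to_string(), "1024".to_string());

    let mut no_last_modified = HashMap::new();
    no_last_modified.insert("content-length".to_string(), "1024".to_string());

    for headers in [&complete, &no_last_modified] {
        match meta_from_head(path, headers) {
            Ok(meta) => println!(
                "ok: {} ({} bytes, last modified {})",
                meta.location, meta.size, meta.last_modified
            ),
            Err(e) => println!("error: {:?}", e),
        }
    }
}
```

Running it prints metadata for the first header set and `MissingLastModified` for the second, which is the same failure mode the CLI surfaces for the S3 location.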

Expected behavior
Creating the external table from the S3 location should succeed, as it does for the local copy of the file.

Additional context
None

@andygrove
Member Author

If I move the file into my own bucket, I can query it, so this seems to be an issue with authentication.

@andygrove
Member Author

The root cause is apache/arrow-rs#2795.

@andygrove
Member Author

This was resolved.
