Cannot query s3 data from datafusion-cli #4239

Closed
Jefffrey opened this issue Nov 16, 2022 · 1 comment · Fixed by #4329
Labels: bug (Something isn't working)


@Jefffrey
Contributor

Describe the bug

When following the steps in the datafusion-cli user guide:

https://github.com/apache/arrow-datafusion/blob/fc669d5892954cbd2612a272314785758a7cb176/docs/source/user-guide/cli.md?plain=1#L154-L187

I cannot create a table as the guide describes.

To Reproduce

jeffrey:~/Code/arrow-datafusion/datafusion-cli$ export AWS_ACCESS_KEY_ID=****************
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ export AWS_SECRET_ACCESS_KEY=****************
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ export AWS_REGION=ap-southeast-2
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.10s
     Running `/media/jeffrey/1tb_860evo_ssd/.cargo_target_cache/debug/datafusion-cli`
DataFusion CLI v14.0.0
❯ create external table test stored as parquet location 's3://tremendous-orange-fork/customer/part-0.parquet';
Execution("Generic S3 error: Missing region")
❯

Expected behavior

I should be able to create a table from S3 and query it, following the instructions in the user guide.

Additional context

The user guide also states that AWS_REGION must be set:

https://github.com/apache/arrow-datafusion/blob/fc669d5892954cbd2612a272314785758a7cb176/docs/source/user-guide/cli.md?plain=1#L162-L164

However, apache/arrow-rs#2795 appears to have been resolved and its fix merged a while ago, yet repeating the steps above with AWS_REGION set to an empty value still produces the same error. Is the user guide wrong, or is there another bug related to AWS_REGION?

jeffrey:~/Code/arrow-datafusion/datafusion-cli$ export AWS_ACCESS_KEY_ID=****************
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ export AWS_SECRET_ACCESS_KEY=****************
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ export AWS_REGION=
jeffrey:~/Code/arrow-datafusion/datafusion-cli$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.10s
     Running `/media/jeffrey/1tb_860evo_ssd/.cargo_target_cache/debug/datafusion-cli`
DataFusion CLI v14.0.0
❯ create external table test stored as parquet location 's3://tremendous-orange-fork/customer/part-0.parquet';
Execution("Generic S3 error: Missing region")
❯
Jefffrey added the bug label on Nov 16, 2022
psvri mentioned this issue on Nov 22, 2022
@psvri
Contributor

psvri commented Nov 22, 2022

The region is specified via the AWS_DEFAULT_REGION environment variable.
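Following that comment, a possible workaround is to export AWS_DEFAULT_REGION before launching the CLI. The snippet below is a minimal sketch, assuming (as the comment states) that this datafusion-cli version reads the region only from AWS_DEFAULT_REGION; it reuses the ap-southeast-2 region from the reproduction and keeps any AWS_DEFAULT_REGION that is already set.

```shell
# Assumption: datafusion-cli v14 reads the S3 region from AWS_DEFAULT_REGION,
# not AWS_REGION, so mirror AWS_REGION into it.
export AWS_REGION=ap-southeast-2

# Keep an existing AWS_DEFAULT_REGION if present; otherwise fall back to AWS_REGION.
export AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION:-$AWS_REGION}"

echo "Using region: $AWS_DEFAULT_REGION"
```

With the variable exported, `cargo run` in the datafusion-cli directory and the same `create external table` statement should no longer fail with "Missing region", provided the credentials and bucket are otherwise valid.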
