Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to connect to duckdb database residing on S3 #249

Open
fRoDdYy opened this issue Sep 13, 2023 · 9 comments
Open

Unable to connect to duckdb database residing on S3 #249

fRoDdYy opened this issue Sep 13, 2023 · 9 comments

Comments

@fRoDdYy
Copy link

fRoDdYy commented Sep 13, 2023

Referring to the documentation when I am trying to connect to my duckdb database by attaching it

outputs:
dev:
type: duckdb
path: tmp/dbt.duckdb
extensions:
- httpfs
- parquet
settings:
s3_region: ap-south-1
s3_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
s3_session_token: "{{ env_var('AWS_SESSION_TOKEN') }}"
attach:
- path: "s3://bucket-name/file-path/database.duckdb
read_only: true
threads: 1
external_root: s3://bucket-name/output
target: dev

I get the following error:

Runtime Error
Catalog Error: Cannot open database "s3://bucket-name/file-path/database.duckdb" in read-only mode: database does not exist

@jwills
Copy link
Collaborator

jwills commented Sep 13, 2023

Huh; the path in the attach statement doesn't match the path in the error and it's not totally obvious to me why-- will take a look.

@Mause
Copy link
Member

Mause commented Sep 13, 2023

The s3 attach via httpfs story is a bit of a strange one right now - I think you can only set auth via environment variables or query parameters

@jwills
Copy link
Collaborator

jwills commented Sep 13, 2023

ack, thank you @Mause! I swear I've made it work before, let me see if I can sort out how.

@fRoDdYy
Copy link
Author

fRoDdYy commented Sep 13, 2023

@Mause I tried it with environment variable too but the issue persists.

@jwills Yeah sorry I made that mistake while typing have corrected it though.

@Mause
Copy link
Member

Mause commented Sep 13, 2023

Maybe double check the environment variable names? https://duckdb.org/docs/extensions/httpfs#configuration-1

@NatElkins
Copy link

I am also interested in this use case. I'm trying to perform a query on a DuckDB database in S3. I'm pretty sure my env vars are correct because I'm able to run a query like:

with t as (
	SELECT *
	FROM iceberg_scan('s3a://path/to/files', allow_moved_paths = true)
)
SELECT *
from t;

without issue.

I've set my env vars like this:

SET s3_region = 'us-east-1';
SET s3_access_key_id = 'access_key_id';
SET s3_secret_access_key = 'secret_access_key';

and I've also executed the following:

INSTALL httpfs;
INSTALL iceberg;

LOAD httpfs;
LOAD iceberg;

(Iceberg not related directly to this problem, just noting it).

I also get the error:

Catalog Error: Cannot open database "s3://path/to/test.duckdb" in read-only mode: database does not exist

I know my repro isn't using dbt-duckdb, but I'm also interested in this use case with dbt-duckdb (was just testing with SQL to verify behavior outside of dbt-duckdb).

@jwills
Copy link
Collaborator

jwills commented Jan 31, 2024

Ah appreciate that @NatElkins -- I wonder if it works if you use the fsspec stuff? So as to treat S3 like a filesystem as opposed to trying to use the httpfs route?

@NatElkins
Copy link

NatElkins commented Feb 1, 2024

@jwills It doesn't seem to work, although I may be doing something wrong. I have the creds defined in my ~/.aws/credentials file.

import duckdb
from fsspec import filesystem

duckdb.register_filesystem(filesystem('s3', anon=False))
duckdb.connect("s3://bucket/path/to/test.duckdb'")
r1 = duckdb.sql("select * from test")
print(r1)

The error I get is:

duckdb.connect("s3://bucket/path/to/test.duckdb")
duckdb.duckdb.IOException: IO Error: Cannot open file "/Users/nathanielelkins/Projects/dbt_test/s3://bucket/path/to/test.duckdb": No such file or directory

@jwills
Copy link
Collaborator

jwills commented Feb 1, 2024

Yep, right there with you-- I cannot figure out how to get this to work right now. 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants