
Intermediate Relation Fails with External Materialization with Partitions #232

Open
jesseginsberg opened this issue Aug 9, 2023 · 6 comments

Comments

@jesseginsberg

Hello, I'm running into an issue when using the external materialization with the "PARTITION_BY" option.

The dbt run succeeds through the step that writes the partitioned files out to the file system, but it fails when it reaches the step that creates the intermediate_relation view within DuckDB. I believe it is this section: https://github.com/jwills/dbt-duckdb/blob/master/dbt/include/duckdb/macros/materializations/external.sql#L51-L55.

My dbt model, called test_model, looks something like this:

{{ config(
    materialized='external', 
    location="~/dbt_duckdb/local_partitioned_out",
    options={"PARTITION_BY": "date"}
) }}

select 
  date,
  other_columns 
from {{ ref('upstream_model') }}

Looking at the logs, after writing out the partitioned files externally, dbt tries to create a view like this one:

create or replace view "db"."main"."test_model__dbt_int" as (
        select * from '~/dbt_duckdb/local_partitioned_out'
    );

But that CREATE VIEW query fails. I believe this intermediate relation expects to point at a single file. Because the output consists of multiple files broken out into multiple directories, I think the query should instead be something like:

create or replace view "db"."main"."test_model__dbt_int" as (
        select * from read_parquet('~/dbt_duckdb/local_partitioned_out/**')
    );
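For context, DuckDB can read a partitioned output directory with a glob pattern, and the hive_partitioning flag reconstructs the partition column from directory names such as date=2023-08-09/. A minimal sketch of querying such a directory directly (paths are illustrative):

```sql
-- Read every parquet file under the partitioned output directory;
-- hive_partitioning = 1 recovers the partition column ("date" here)
-- from the directory names rather than from the file contents.
select *
from read_parquet(
    '~/dbt_duckdb/local_partitioned_out/**/*.parquet',
    hive_partitioning = 1
);
```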

Please let me know if my description is clear or if there's any additional info that would be helpful. Thanks!

@jwills
Collaborator

jwills commented Aug 9, 2023

@jesseginsberg this might seem stupid, but does it work if you write the option as lowercase partition_by instead of uppercase PARTITION_BY?

The logic we use for constructing the external read location is defined here, and it's case-sensitive about the config option name it looks for: https://github.com/jwills/dbt-duckdb/blob/master/dbt/adapters/duckdb/impl.py#L136

@jesseginsberg
Author

@jwills well that was an easy solve 😄. Yes, when I use partition_by instead of PARTITION_BY everything runs successfully for me.

Thanks for the quick response!
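To spell it out for anyone landing here later: the only change from the model in the issue description is the case of the option key (paths are illustrative):

```sql
{{ config(
    materialized='external',
    location="~/dbt_duckdb/local_partitioned_out",
    options={"partition_by": "date"}  -- lowercase key, not PARTITION_BY
) }}

select
  date,
  other_columns
from {{ ref('upstream_model') }}
```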

@harshil4076

Hi!
Stumbled upon this issue while searching for partition_by, and it helped me solve my problem as well. Thanks!
Now that I can create new partitioned parquet files, I'd also like to update them periodically. I am interested in using COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN "orders_{i}").
Does the current dbt-duckdb version support that?

@jwills
Collaborator

jwills commented Jan 22, 2024

IIRC you can specify any set of key-value pairs you like in the options dict, and we copy them over to the clause we append to the COPY <tbl> TO '<filepath>' ... expression. You might need to be a bit careful with the filename_pattern one, though, because it's not in the set of known options that we quote for you automatically.
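In practice, given the quoting caveat, that would mean including the quotes around the filename_pattern value yourself. An untested sketch of such a config (values are illustrative):

```sql
{{ config(
    materialized='external',
    location="orders",
    options={
        "partition_by": "year, month",
        -- not auto-quoted by the adapter, so quote the value manually:
        "filename_pattern": "'orders_{i}'"
    }
) }}

select * from {{ ref('stg_orders') }}
```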

@jwills
Collaborator

jwills commented Jan 22, 2024

Any boolean options (like overwrite_or_ignore) will also need a dummy true value, like 1, to work correctly. You can see the "logic" we employ for this here: https://github.com/duckdb/dbt-duckdb/blob/master/dbt/adapters/duckdb/impl.py#L108
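Concretely, an options dict with a dummy true value would end up rendered into the COPY statement roughly like this (a sketch, not the adapter's exact output):

```sql
-- For options={"partition_by": "date", "overwrite_or_ignore": 1},
-- the materialization issues, roughly, a COPY statement like:
COPY (select * from test_model)
TO '~/dbt_duckdb/local_partitioned_out'
(FORMAT PARQUET, PARTITION_BY (date), OVERWRITE_OR_IGNORE 1);
```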

@harshil4076

Thanks @jwills.
options={"partition_by": "date","overwrite_or_ignore": 1} worked.
