Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Optimizer rule 'common_sub_expression_eliminate' failed #2920

Open
talagluck opened this issue Apr 19, 2024 · 1 comment
Open

Bug: Optimizer rule 'common_sub_expression_eliminate' failed #2920

talagluck opened this issue Apr 19, 2024 · 1 comment
Labels
bug 🐛 Something isn't working

Comments

@talagluck
Copy link
Contributor

talagluck commented Apr 19, 2024

Description

I'm hitting this error as part of a dbt run in GlareDB Cloud

Database Error in model nyc_sales_jan_2019_feb_2019 (models/demo/nyc_sales_jan_2019_feb_2019.sql)
  Optimizer rule 'common_sub_expression_eliminate' failed
  caused by
  Schema error

Steps to reproduce:

  1. Create initial table:
 CREATE TABLE IF NOT EXISTS dbt_nyc_sales AS 
    SELECT * FROM 
read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/nyc_sales/sale_year=2019/sale_month=1/f55363e2587849bcb25c057be706c69d-0.parquet')"
  1. Create secondary model selecting from that table:
create table staging_nyc_sales_with_prehook 
as
select
    borough,
    neighborhood,
    building_class_category,
    residential_units,
    commercial_units,
    total_units,
    land_square_feet,
    gross_square_feet,
    year_built,
    sale_price,
    sale_date,
    latitude,
    longitude,
    bin,
    bbl
from dbt_nyc_sales
  1. Create a table unioning the previous table with another parquet file:
  create  table "rough_glitter"."public"."nyc_sales_jan_2019_feb_2019__dbt_tmp"
    as
  (
SELECT * FROM "rough_glitter"."public"."staging_nyc_sales_with_prehook"
UNION
SELECT
    borough,
    neighborhood,
    building_class_category,
    residential_units,
    commercial_units,
    total_units,
    land_square_feet,
    gross_square_feet,
    year_built,
    sale_price,
    sale_date,
    latitude,
    longitude,
    bin,
    bbl
FROM read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/nyc_sales/sale_year=2019/sale_month=2/f55363e2587849bcb25c057be706c69d-0.parquet')
  );

Oddly enough, in that last query, unioning the SELECT * FROM "rough_glitter"."public"."staging_nyc_sales_with_prehook" works fine. It also works fine if you create that as a view instead of a table.

@talagluck talagluck added the bug 🐛 Something isn't working label Apr 19, 2024
@tychoish
Copy link
Collaborator

tychoish commented May 9, 2024

Should have followed up on this:

  • does this still happen?
  • is there a workaround?
  • Schema error makes me think that somehow the types in delta as we write them, or the case of the field names has changed. is there a case difference issue in our table vs the upstream type?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants