
Spline Support of Statement Create table as select #784

Open · zacayd opened this issue Jan 28, 2024 · 11 comments
Labels: bug (Something isn't working), dependency: Spark 3.0+

zacayd commented Jan 28, 2024

I have a statement in a Databricks notebook:

CREATE TABLE lineage_data.lineagedemo.dinner_1 AS
SELECT
  recipe_id,
  concat(app, " + ", main, " + ", dessert) AS full_menu
FROM
  lineage_data.lineagedemo.menu

The lineage is not shown; it shows only on INSERT INTO. Is this by design?
I use version spark_3_3_spline_agent_bundle_2_12_2_1_0_SNAPSHOT.jar.
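
For comparison, a sketch of the INSERT INTO form that does show lineage (hypothetical, assuming the target table already exists):

# Hypothetical comparison: the same statement written as INSERT INTO, which the
# reporter says does produce lineage (table and column names from the issue)
spark.sql("""
    INSERT INTO lineage_data.lineagedemo.dinner_1
    SELECT recipe_id, concat(app, " + ", main, " + ", dessert) AS full_menu
    FROM lineage_data.lineagedemo.menu
""")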

wajda (Contributor) commented Jan 30, 2024

It should be supported.
Is there any error in the logs?
Try USING DELTA (not sure if it will make any difference, but that's how we use it).

zacayd (Author) commented Jan 31, 2024

Hi, I have run this code:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Assuming SparkSession is already available as 'spark' in Databricks
# spark = SparkSession.builder.appName("example").getOrCreate()

# Define the schema for the table
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Create a DataFrame with the schema
data = [(1, "Alice", 30), (2, "Bob", 35), (3, "Charlie", 40)]
df = spark.createDataFrame(data, schema)

# Define the table name and path (replace with your desired table name and path)
silver_table_name = "silver_table_name1"
path = "/mnt/delta/" + silver_table_name

# Write the DataFrame to the table
df.write.format("delta").mode("overwrite").save(path)

# Register the table in the metastore
spark.sql(f"CREATE TABLE {silver_table_name} USING DELTA LOCATION '{path}'")

# Create a view over the silver layer table (a plain CREATE VIEW, despite the "materialized" naming)
materialized_view_name = "your_materialized_view"
create_mv_sql = f"""
CREATE VIEW {materialized_view_name}
AS SELECT * FROM {silver_table_name}
"""
spark.sql(create_mv_sql)

# Optionally, verify the contents of the table and materialized view
spark.sql(f"SELECT * FROM {silver_table_name}").show()
spark.sql(f"SELECT * FROM {materialized_view_name}").show()

According to the execution plan I got in Spline, it shows only the target /mnt/delta/silver_table_name1 with its columns.
It seems the CREATE VIEW statement is not registered in the lineage.

zacayd (Author) commented Jan 31, 2024

It should be supported. Is there any error in the logs? Try USING DELTA (not sure if it will make any difference, but that's how we use it).

@wajda, also, when I added USING DELTA it didn't solve it:

-- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS lineage_data.lineagedemo.Accounts1;
-- Drop the table 'V_Accounts1' if it exists
DROP TABLE IF EXISTS lineage_data.lineagedemo.V_Accounts1;

-- Creating the table 'Accounts1' using Delta format
CREATE TABLE lineage_data.lineagedemo.Accounts1(
    Id INT,
    Name STRING
) USING DELTA;

-- Creating the table 'V_Accounts1' from the 'Accounts1' table via CTAS, using Delta format
CREATE TABLE lineage_data.lineagedemo.V_Accounts1
USING DELTA AS
SELECT *
FROM lineage_data.lineagedemo.Accounts1;

zacayd (Author) commented Jan 31, 2024

@wajda, if I add USING DELTA it also works for CREATE TABLE AS SELECT:

-- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS Accounts1;
-- Drop the table 'V_Accounts1' if it exists
DROP TABLE IF EXISTS V_Accounts1;

-- Creating the table 'Accounts1' using Delta format
CREATE TABLE Accounts1(
    Id INT,
    Name STRING
) USING DELTA;

-- Creating the table 'V_Accounts1' from the 'Accounts1' table via CTAS, using Delta format
CREATE TABLE V_Accounts1
USING DELTA AS
SELECT *
FROM Accounts1;

wajda (Contributor) commented Jan 31, 2024

It might be that CREATE VIEW is either not supported by Spline, or is not trackable at all in Spark. A more thorough investigation is needed. Try enabling DEBUG logging in Spark and see if there is a message like

XXX was not recognized as a write-command

or Write extraction failed, followed by an object structure dump.

Another small thing to try is to enable capturing of non-persistent actions like this:

spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true

If you still don't see any mention of your VIEW anywhere (in logs, errors, or lineage), then it's very likely that Spark simply does not provide information about that action in the first place. Please try it and let us know what you find.
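
For reference, Spline agent options like this are commonly passed as Spark configuration entries with a spark. prefix (an assumption about the setup; adjust to however you configure the agent), set before lineage tracking is enabled:

# Assumption: the Spline agent reads its options from the Spark conf under a
# "spark." prefix; set the toggle before calling enableLineageTracking
spark.conf.set(
    "spark.spline.plugins.za.co.absa.spline.harvester.plugin.embedded."
    "NonPersistentActionsCapturePlugin.enabled",
    "true",
)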

zacayd (Author) commented Feb 1, 2024

I will try
spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true
But how can I enable DEBUG logging on Databricks?

zacayd (Author) commented Feb 1, 2024

I have added spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true to the Advanced Settings.
It caused additional execution plans to be captured, but none of them showed the view and its columns.
@wajda, do you know how I can enable DEBUG logging in Databricks?

zacayd (Author) commented Feb 1, 2024

By the way, Databricks Unity Catalog supports CREATE VIEW AS SELECT * FROM table.

wajda (Contributor) commented Feb 1, 2024

Do you know how I can enable DEBUG logging in Databricks?

spark.sparkContext.setLogLevel("DEBUG")
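
Since full DEBUG output on a Databricks driver is very noisy, a narrower alternative is raising the level only for the Spline agent's loggers (a sketch, assuming Spark's log4j 1.x compatibility API is reachable from PySpark):

# Sketch: enable DEBUG only for the Spline agent's namespace instead of the
# whole driver; assumes Spark ships the log4j 1.x bridge API on the classpath
log4j = spark._jvm.org.apache.log4j
log4j.LogManager.getLogger("za.co.absa.spline").setLevel(log4j.Level.DEBUG)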

zacayd (Author) commented Feb 1, 2024

I put this at the top of my Databricks notebook:

%python
spark.sparkContext.setLogLevel("DEBUG")
%python
sc._jvm.za.co.absa.spline.harvester.SparkLineageInitializer.enableLineageTracking(spark._jsparkSession)
%sql

-- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS Accounts1;

-- Creating the table 'Accounts1' using Delta format
CREATE TABLE Accounts1(
    Id INT,
    Name STRING
) USING DELTA;


-- Creating or replacing the view 'V_Accounts1' based on the 'Accounts1' table
CREATE OR REPLACE view V_Accounts1
 AS
SELECT *
FROM Accounts1;

But I don't see any error in the driver logs.

uday1409 (Contributor) commented

@wajda I faced a similar issue and was able to see the plan. Although I don't have the details right now of what Spline generates for CTAS, I identified that the Spark change below caused this issue. Although this change applies starting with Spark 3.3, Databricks usually ports changes back to older versions as well; this was also true for the ListQuery expression I fixed earlier in Spline.

The logical plan class needs to be extracted correctly for Spline to work:

apache/spark@aa1b16b#diff-594addaa7ed3dd43fd0dad5fa81ade5ec2e570adf7c982c0570e3358a08a8e0a
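
To check which logical plan class a given runtime actually builds for a CTAS (the thing the agent has to recognize), a quick probe along these lines may help. This is a debugging sketch only, with a hypothetical table name; the printed class names depend on the Spark/Databricks version:

# Debugging sketch: print the logical plan node class that Spark builds for a
# CTAS, to compare against what the Spline agent expects
qe = spark.sql(
    "CREATE TABLE ctas_probe USING DELTA AS SELECT 1 AS id"
)._jdf.queryExecution()
print(qe.logical().getClass().getName())  # exact class varies by Spark/Databricks version
print(qe.logical().treeString())          # full tree, including the SELECT source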

wajda added the bug (Something isn't working) and dependency: Spark 3.0+ labels on Feb 26, 2024