Can Spline support lineage for AWS glue Spark dynamic frames? #786

Closed
rushabh1995 opened this issue Feb 5, 2024 · 1 comment

rushabh1995 commented Feb 5, 2024

Hello everyone,
I'm using the Spline agent bundle (spark-3.3-spline-agent-bundle_2.12, version 2.0.0) to extract lineage from Glue jobs. Lineage is captured for Spark DataFrames, but not for Glue DynamicFrames.
Is there any functionality in the Spline JAR, or anything else, that can capture lineage for a Glue DynamicFrame?

We are currently using Glue 4.0 for our use case.

Code:

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("CSV to DynamicFrame").getOrCreate()

# Read the CSV file into a DataFrame
df = spark.read.format("csv").option("header", "true").load("s3://test/Employee/london_emp.csv")

# Perform transformations on the DataFrame if needed
df_transformed = df.withColumn("salary", df["salary"] * 1.10)

# Create a GlueContext
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# Convert the transformed Spark DataFrame to a DynamicFrame
dynamic_frame = DynamicFrame.fromDF(df_transformed, glueContext, "dynamic_frame")

# Write the DynamicFrame to S3
glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": "s3://test/netflix"},
    format="parquet"
)
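
Since the report above says lineage is captured when only Spark DataFrames are involved, one possible workaround is to convert the DynamicFrame back to a DataFrame and write it with the regular DataFrame writer instead of `glueContext.write_dynamic_frame`. The sketch below is untested against Glue 4.0 and Spline 2.0.0; the helper name `write_via_dataframe` and its `mode` default are assumptions made for illustration, and whether the resulting lineage includes upstream DynamicFrame sources is not confirmed here.

```python
def write_via_dataframe(dynamic_frame, path, mode="append"):
    """Write a Glue DynamicFrame to Parquet through the Spark DataFrame API.

    Hypothetical helper (not part of Glue or Spline). Assumption: the Spline
    agent observes DataFrame writes, as described in the report above, so
    routing the write through DataFrameWriter may make it visible to the agent.
    """
    # DynamicFrame.toDF() returns a regular Spark DataFrame.
    df = dynamic_frame.toDF()
    # Use the standard DataFrameWriter rather than the Glue sink.
    df.write.mode(mode).parquet(path)
    return df
```

In the job above, this would replace the `glueContext.write_dynamic_frame.from_options(...)` call, for example `write_via_dataframe(dynamic_frame, "s3://test/netflix")`. Glue-specific sink options are not carried over by this approach.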

wajda commented Feb 6, 2024

Closing as a duplicate of #781.

wajda closed this as completed Feb 6, 2024