
Support of setting the ArangoDB name on the configuration #772

Open · zacayd opened this issue Dec 5, 2023 · 7 comments

Comments

zacayd commented Dec 5, 2023

Hi,
I am using Spline to capture lineage from Databricks notebooks.
On the cluster, in the advanced settings, I put:

spark.spline.mode ENABLED
spark.spline.lineageDispatcher.http.producer.url http://10.0.19.4:8080/producer
spark.spline.lineageDispatcher http

Since I have several customers, I don't want to keep the data of all of them in the same ArangoDB, so I want a way for the lineage to be kept in a separate database per customer.

Can we also send the ArangoDB database name as a parameter, so that the execution plan lineage data is kept in a different database for each cluster I use?

Thanks in advance.

wajda commented Dec 5, 2023

No, this isn't possible. The database is an internal part of the system and is not something you can easily select on a per-request basis.

My recommendation for your use case would be to simply augment your execution plan and event objects with the DBR cluster name, stored as an extra parameter or a tag, and filter on that in the UI (a beta of this feature is available in the develop versions of the server and the UI).
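For illustration, here is a minimal sketch of how that tagging could be configured, assuming the agent's userExtraMeta post-processing filter described in the spline-spark-agent README; the exact property names, the $env directive, and the DATABRICKS_CLUSTER_NAME variable are assumptions to verify against your agent version:

spark.spline.postProcessingFilter userExtraMeta
spark.spline.postProcessingFilter.userExtraMeta.rules {"executionPlan":{"extra":{"clusterName":{"$env":"DATABRICKS_CLUSTER_NAME"}}}}

Every captured execution plan would then carry the cluster name under extra.clusterName, which you could filter on in the UI.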

Alternatively, you may augment the URIs of the input/output sources to include the cluster name as part of the name (e.g. s3://bucket/customerA/table instead of s3://bucket/table). That is another way to logically separate the lineage data.

If you absolutely want to use different DBs, then you can run separate Spline instances, put a custom proxy gateway in front of the Spline Producer REST API (or implement a custom LineageDispatcher wrapper), and route your requests to different Spline instances based on your custom conditions.
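For the dispatcher option, here is a rough, unverified sketch of what such a wrapper might look like. It assumes the agent's LineageDispatcher trait exposes send(plan) and send(event) methods and that a custom implementation can be registered by class name; check both against the agent sources of the version you build:

```scala
// Hypothetical sketch only, not the agent's confirmed API.
package com.example.lineage

import za.co.absa.spline.harvester.dispatcher.LineageDispatcher
import za.co.absa.spline.producer.model.{ExecutionEvent, ExecutionPlan}

class RoutingLineageDispatcher(
    delegatesByCluster: Map[String, LineageDispatcher], // cluster name -> per-customer dispatcher
    clusterName: String                                  // e.g. resolved from an env var at startup
) extends LineageDispatcher {

  // Resolve the delegate once; fail fast if this cluster has no mapping.
  private val delegate: LineageDispatcher =
    delegatesByCluster.getOrElse(
      clusterName,
      throw new IllegalArgumentException(s"No Spline instance mapped for cluster '$clusterName'"))

  override def send(plan: ExecutionPlan): Unit = delegate.send(plan)

  override def send(event: ExecutionEvent): Unit = delegate.send(event)
}
```

Registering it would then look something like the following (property pattern taken from the agent README, again to be verified):

spark.spline.lineageDispatcher myRouter
spark.spline.lineageDispatcher.myRouter.className com.example.lineage.RoutingLineageDispatcher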

zacayd commented Dec 5, 2023

Regarding:

"DBR cluster name stored as an extra parameter, or a tag, and filter the stuff on the UI based on that (the feature beta is available in the develop version of the server and the UI)."

Do you mean that the name of the cluster is on the execution plan?

zacayd commented Dec 5, 2023

Is the feature beta available as a Maven package in Databricks?

wajda commented Dec 6, 2023

No. You need to build and install it from the latest development branch.

zacayd commented Dec 6, 2023

Any chance it will be available on the Databricks cloud soon? I'm having trouble building and installing it.

wajda commented Dec 6, 2023

No ETA, unfortunately. The team has no capacity and the business priorities have changed, so the project is on hold at the moment.

zacayd commented Jan 7, 2024

Hi,
I managed to compile the project, create a JAR, and load it via DBFS.
But it seems that when I run the notebook, I get lineage data while the notebook info is missing.
I took the develop branch:
https://github.com/AbsaOSS/spline-spark-agent/tree/develop
Can you advise?

AbsaOSS deleted a comment from zacayd on Jan 10, 2024