We generate the tasks for our DAGs using an external database. This has been working for us since 2020. We have over 150 DAGs running and each one uses an "objects" table filtered by a "dag" field to determine what it needs to do:
As an example:
| DAG  | Object  | IsActive | Batch |
|------|---------|----------|-------|
| Dag1 | User    | True     | 1     |
| Dag1 | Student | False    | 2     |
| Dag1 | Courses | True     | 3     |
| Dag2 | Books   | True     | null  |
| Dag2 | Authors | True     | null  |
Here are some examples of the DAG graphs we use:
In the above DAG, the file contains a loop that generates the tasks from the database records. The "batch" field determines which batch each object runs in, so the number of objects in each column of the image can be changed dynamically as requirements change.
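To make the pattern concrete, here is a stripped-down sketch of what one of these DAG files looks like. The column names match the table above, but the query is simplified and the rows are hard-coded so the sketch runs standalone:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator


def load_object(name: str) -> None:
    # Real task logic would live here.
    print(f"processing {name}")


def fetch_rows(dag_name: str):
    # In the real setup this runs something like:
    #   SELECT object, batch FROM objects
    #   WHERE dag = %s AND is_active = true ORDER BY batch
    # against the external database. Rows are hard-coded here
    # so the sketch is self-contained.
    return [("User", 1), ("Courses", 3)]


with DAG("Dag1", start_date=datetime(2020, 1, 1), schedule=None) as dag:
    start = EmptyOperator(task_id="start")
    previous_batch: list = [start]
    current_batch: list = []
    current_batch_no = None

    # Rows arrive ordered by batch; every task in batch N depends on
    # every task in the previous batch, producing the column layout
    # shown in the image.
    for obj, batch in fetch_rows("Dag1"):
        if batch != current_batch_no and current_batch:
            previous_batch, current_batch = current_batch, []
        current_batch_no = batch
        task = PythonOperator(
            task_id=f"load_{obj}",
            python_callable=load_object,
            op_args=[obj],
        )
        for upstream in previous_batch:
            upstream >> task
        current_batch.append(task)
```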
Or this one:
In the above case, the DAG file contains the list of tasks and whatever logic is needed to run them, but each row, representing a separate entity, is pulled from the database. So any time we need to add or remove an object, or disable one for a little while, we can do it in the database.
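A similarly simplified sketch of this second layout, where the chain of tasks is fixed in the file and one chain is stamped out per active database row:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(entity: str) -> None:
    print(f"extract {entity}")


def transform(entity: str) -> None:
    print(f"transform {entity}")


def load(entity: str) -> None:
    print(f"load {entity}")


def fetch_entities(dag_name: str):
    # Stand-in for: SELECT object FROM objects
    #               WHERE dag = %s AND is_active = true
    return ["Books", "Authors"]


with DAG("Dag2", start_date=datetime(2020, 1, 1), schedule=None) as dag:
    for entity in fetch_entities("Dag2"):
        # One fixed extract -> transform -> load row per database entity;
        # flipping IsActive in the table adds or removes the whole row.
        steps = [
            PythonOperator(
                task_id=f"{step.__name__}_{entity}",
                python_callable=step,
                op_args=[entity],
            )
            for step in (extract, transform, load)
        ]
        steps[0] >> steps[1] >> steps[2]
```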
I was wondering what the community thinks about structuring DAGs from a database as shown above. The database is hit in top-level code, since it determines what needs to run as part of the DAG. Obviously this is not recommended in the documentation, because the database gets hit every time the DAG file is parsed. But given the number of DAGs and their complexity, we have not seen any performance problems or locking against the external database we use for this purpose. We could build a more complex solution that generates the .py file from the database records, regenerating it only when a record changes. But given how well our system has been working, we wonder whether that is needed.
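For context, a lighter-weight variant of that idea would be to snapshot the table to a local JSON file that a separate job refreshes when records change, and have the top-level code read the file at parse time instead of the database. A rough sketch; the path and key names are illustrative, not our current implementation:

```python
import json
from pathlib import Path

# Hypothetical path; a separate job (or a maintenance DAG) would rewrite
# this file whenever the objects table changes.
SNAPSHOT = Path("/opt/airflow/dags/config/objects.json")


def fetch_rows(dag_name: str):
    # Parse-time read of the local snapshot; the external database is
    # only touched by whatever process refreshes the file.
    rows = json.loads(SNAPSHOT.read_text())
    return [
        (r["object"], r["batch"])
        for r in rows
        if r["dag"] == dag_name and r["is_active"]
    ]
```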
Would love to get the community's thoughts on the above solution.