Avoid unintentional data loss when deleting DAGs #20758
The PR is likely OK to be merged with just a subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
Small NIT :)
Side-comment: I was almost sure that dag_id cannot contain "." (precisely because of the subdag convention), but now I see this is not the case :). We excluded '.' for task groups (because task groups are also "."-separated) but not for the task ids.
Nice catch.
dags_to_delete_query = session.query(DagModel.dag_id).filter(
    or_(
        DagModel.dag_id == dag_id,
        and_(DagModel.dag_id.like(f"{dag_id}.%"), DagModel.is_subdag),
    )
)
dags_to_delete = [dag_id for dag_id, in dags_to_delete_query]
or_(
    DagModel.dag_id == dag_id,
    DagModel.root_dag_id == dag_id,
)
might have also worked @SamWheating
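A minimal sketch of how the root_dag_id variant suggested above would select rows. It uses the standard-library sqlite3 module and a hypothetical two-column table as a stand-in for DagModel; the table layout, the sample DAG IDs, and the helper name dags_by_root are assumptions for illustration, not Airflow code.

```python
import sqlite3

# Hypothetical, simplified stand-in for the DAG metadata table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, root_dag_id TEXT)")
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)",
    [
        ("project.load", None),                      # top-level DAG
        ("project.load.bigquery", None),             # independent DAG with a dotted name
        ("project.load.section_1", "project.load"),  # SubDAG pointing at its root DAG
    ],
)


def dags_by_root(dag_id):
    """Select the DAG itself plus any rows whose root_dag_id points at it."""
    rows = conn.execute(
        "SELECT dag_id FROM dag WHERE dag_id = ? OR root_dag_id = ? ORDER BY dag_id",
        (dag_id, dag_id),
    )
    return [row[0] for row in rows]
```

Under these assumptions, matching on root_dag_id ignores the naming convention entirely, so a DAG that merely shares the dotted prefix is left alone.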
(cherry picked from commit 5980d2b)
We encountered some data loss today due to a user deleting a DAG called project.load from the UI, which then deleted all of the history from other DAGs called project.load.bigquery and project.load.trino, and also caused them to run unexpectedly because their run history was reset.

Note - we don't use SubDAGs, we're just using "." in the DAG ID as a separator for a hierarchical naming system.

As it turns out, deleting a DAG my_dag will delete all of the metadata for any DAG whose ID starts with "my_dag.", as it is assumed that the latter are subdags of the former. This isn't always the case.

Anyways, this PR changes the delete_dag function so that it only deletes the intended DAG and DAGs starting with "<dag_id>." which are also SubDAGs. I think that there may still be some other edge cases where DAGs can be unintentionally deleted, but this patches the most apparent case.

This can all be cleaned up even more once the deprecation of SubDAGs is complete (Airflow 3?)
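The difference between prefix-only matching and prefix matching restricted to SubDAGs can be sketched with the standard-library sqlite3 module. This is an illustration under assumptions, not the actual delete_dag code: the table layout, the SubDAG name project.load.section_1, and both helper functions are hypothetical, and the "before" query only approximates the behaviour described above.

```python
import sqlite3

# Hypothetical, simplified stand-in for the DAG metadata table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_subdag INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)",
    [
        ("project.load", 0),
        ("project.load.bigquery", 0),   # independent DAG, not a SubDAG
        ("project.load.trino", 0),      # independent DAG, not a SubDAG
        ("project.load.section_1", 1),  # an actual SubDAG of project.load
    ],
)


def dags_matched_before(dag_id):
    """Prefix-only match: every '<dag_id>.*' row is treated as a SubDAG."""
    rows = conn.execute(
        "SELECT dag_id FROM dag WHERE dag_id = ? OR dag_id LIKE ? ORDER BY dag_id",
        (dag_id, f"{dag_id}.%"),
    )
    return [row[0] for row in rows]


def dags_matched_after(dag_id):
    """Prefix match that additionally requires the is_subdag flag."""
    rows = conn.execute(
        "SELECT dag_id FROM dag "
        "WHERE dag_id = ? OR (dag_id LIKE ? AND is_subdag = 1) "
        "ORDER BY dag_id",
        (dag_id, f"{dag_id}.%"),
    )
    return [row[0] for row in rows]
```

With this sample data, dags_matched_before("project.load") selects all four rows, while dags_matched_after("project.load") selects only project.load and its flagged SubDAG.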