Fix DAG run state not updated while DAG is paused #16343

Merged
merged 15 commits into from Jun 17, 2021

Conversation

ephraimbuddy
Contributor

Closes: #15439

The state of a DAG run does not update while the DAG is paused.
Tasks continue to run if the DAG run was kicked off before
the DAG was paused; they eventually finish and are marked correctly.
The DAG run state, however, is not updated and stays in the Running state until the DAG is unpaused.

This change fixes it by running a check at intervals and updating the state (where possible)
of DagRuns whose tasks have finished running while the DAG is paused.
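
A minimal sketch of the periodic check described above, assuming a toy in-memory model rather than Airflow's own classes. `ToyDagRun`, `sweep_paused_dag_runs`, and the state strings are hypothetical names chosen for illustration, and the rule for deriving a run state (all tasks finished, any failure fails the run) is a simplification, not the scheduler's actual logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Hypothetical state names for this sketch only.
FINISHED = {"success", "failed", "skipped", "upstream_failed"}
FAILED = {"failed", "upstream_failed"}


@dataclass
class ToyDagRun:
    dag_id: str
    task_states: Dict[str, str]
    state: str = "running"


def sweep_paused_dag_runs(
    dag_runs: Iterable[ToyDagRun], paused_dag_ids: Set[str]
) -> List[ToyDagRun]:
    """Update running runs of paused DAGs whose tasks have all finished."""
    updated = []
    for run in dag_runs:
        # Only running runs that belong to a paused DAG are of interest.
        if run.state != "running" or run.dag_id not in paused_dag_ids:
            continue
        states = set(run.task_states.values())
        # Skip runs with no tasks or with tasks still queued or running.
        if not states or not states <= FINISHED:
            continue
        run.state = "failed" if states & FAILED else "success"
        updated.append(run)
    return updated
```

Calling `sweep_paused_dag_runs` on a timer would leave most paused runs untouched on most passes, which is the cost a periodic approach has to accept.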



@boring-cyborg boring-cyborg bot added the area:Scheduler Scheduler or dag parsing Issues label Jun 9, 2021
@ephraimbuddy ephraimbuddy marked this pull request as ready for review June 9, 2021 11:16
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jun 9, 2021
@github-actions

github-actions bot commented Jun 9, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@ephraimbuddy ephraimbuddy force-pushed the update-dagrun-state-dag-paused branch from 25af23f to cc76e09 Compare June 9, 2021 23:12
@ashb
Member

ashb commented Jun 10, 2021

I wonder what is more efficient: doing this periodically (for paused dags, where the state is likely to never change) or expanding on the "mini scheduler run" to do a simpler version of dag_run.update_state() when the task that just finished was one of the leaf tasks in the dag.
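
To make the second option concrete, here is a rough sketch of a check that fires only when the task that just finished is a leaf of the DAG. It uses plain dictionaries instead of Airflow objects; `leaf_task_ids`, `run_state_after_task`, and the state strings are hypothetical, and deriving the run state from leaf states alone is an assumption of this sketch.

```python
from typing import Dict, List, Set

# Hypothetical state names for this sketch only.
FINISHED = {"success", "failed", "skipped", "upstream_failed"}
FAILED = {"failed", "upstream_failed"}


def leaf_task_ids(downstream: Dict[str, List[str]]) -> Set[str]:
    """Return the ids of tasks that have no downstream tasks."""
    return {task_id for task_id, children in downstream.items() if not children}


def run_state_after_task(
    finished_task_id: str,
    downstream: Dict[str, List[str]],
    task_states: Dict[str, str],
    run_state: str = "running",
) -> str:
    """Return the new run state after one task finishes."""
    leaves = leaf_task_ids(downstream)
    # In this sketch only a leaf task triggers the check at all.
    if finished_task_id not in leaves:
        return run_state
    leaf_states = {task_states[task_id] for task_id in leaves}
    # Another leaf is still pending, so the run is not done yet.
    if not leaf_states <= FINISHED:
        return run_state
    return "failed" if leaf_states & FAILED else "success"
```

The check costs nothing for non-leaf tasks and runs at most once per leaf, so it avoids repeatedly polling paused DAGs whose state is unlikely to change.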

@ephraimbuddy
Contributor Author

> I wonder what is more efficient: doing this periodically (for paused dags, where the state is likely to never change) or expanding on the "mini scheduler run" to do a simpler version of dag_run.update_state() when the task that just finished was one of the leaf tasks in the dag.

Nice but I think it may not work if the user disables mini scheduling?

@ashb
Member

ashb commented Jun 10, 2021

> Nice but I think it may not work if the user disables mini scheduling?

Yes, but we'll likely remove that setting in a version or two -- it was mostly an escape hatch in case it had unforeseen bugs.

@ephraimbuddy
Contributor Author

> Nice but I think it may not work if the user disables mini scheduling?

> Yes, but we'll likely remove that setting in a version or two -- it was mostly an escape hatch in case it had unforeseen bugs.

Should I add it as a separate check outside the mini scheduling?

@ephraimbuddy ephraimbuddy force-pushed the update-dagrun-state-dag-paused branch 4 times, most recently from 1a941b1 to 5f386be Compare June 11, 2021 09:23
@ephraimbuddy ephraimbuddy reopened this Jun 11, 2021
@ephraimbuddy ephraimbuddy force-pushed the update-dagrun-state-dag-paused branch 2 times, most recently from 0fcc3f1 to 8d420f1 Compare June 14, 2021 22:08
airflow/jobs/local_task_job.py (outdated review thread, resolved)
airflow/jobs/scheduler_job.py (outdated review thread, resolved)
airflow/jobs/local_task_job.py (outdated review thread, resolved)
airflow/jobs/local_task_job.py (outdated review thread, resolved)
@ephraimbuddy ephraimbuddy force-pushed the update-dagrun-state-dag-paused branch 2 times, most recently from a06e51e to 3bf9b22 Compare June 15, 2021 17:20
@ephraimbuddy ephraimbuddy reopened this Jun 15, 2021
@ephraimbuddy ephraimbuddy force-pushed the update-dagrun-state-dag-paused branch from 3bf9b22 to 0c8d695 Compare June 15, 2021 23:32
@ephraimbuddy ephraimbuddy force-pushed the update-dagrun-state-dag-paused branch from 17b92bd to 263767a Compare June 17, 2021 12:42
@ephraimbuddy ephraimbuddy reopened this Jun 17, 2021
@ephraimbuddy ephraimbuddy reopened this Jun 17, 2021
@ephraimbuddy ephraimbuddy force-pushed the update-dagrun-state-dag-paused branch 2 times, most recently from 859f92b to 6e41893 Compare June 17, 2021 18:35
@ephraimbuddy ephraimbuddy merged commit 3834df6 into apache:main Jun 17, 2021
@ephraimbuddy ephraimbuddy deleted the update-dagrun-state-dag-paused branch June 17, 2021 23:29
@ashb ashb added this to the Airflow 2.1.1 milestone Jun 22, 2021
ashb pushed a commit that referenced this pull request Jun 22, 2021
The state of a DAG run does not update while the DAG is paused.
Tasks continue to run if the DAG run was kicked off before
the DAG was paused; they eventually finish and are marked correctly.
The DAG run state, however, is not updated and stays in the Running state until the DAG is unpaused.

This change fixes it by running a check on task exit that updates the state (where possible)
of the DagRun if the task was able to finish the DagRun while the DAG is paused.

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
(cherry picked from commit 3834df6)
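
The task-exit check described in this commit message can be sketched roughly as follows. This is an illustration on a toy model, not the merged code: `ToyDagRun`, its `update_state` method, and `on_task_exit` are hypothetical stand-ins, and the state rule is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical state names for this sketch only.
FINISHED = {"success", "failed", "skipped", "upstream_failed"}
FAILED = {"failed", "upstream_failed"}


@dataclass
class ToyDagRun:
    task_states: Dict[str, str]
    state: str = "running"

    def update_state(self) -> None:
        """Move the run out of 'running' once every task has finished."""
        states = set(self.task_states.values())
        if states and states <= FINISHED:
            self.state = "failed" if states & FAILED else "success"


def on_task_exit(dag_is_paused: bool, dag_run: ToyDagRun) -> bool:
    """Run the state check when a task exits; return True if the state changed."""
    # Assumption: for unpaused DAGs the regular scheduling loop updates the run.
    if not dag_is_paused:
        return False
    previous = dag_run.state
    dag_run.update_state()
    return dag_run.state != previous
```

Because the check is tied to a task exiting, it only does work at the moment a paused run could actually have completed.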
Labels
area:Scheduler Scheduler or dag parsing Issues full tests needed We need to run full set of tests for this PR to merge
Development

Successfully merging this pull request may close these issues.

DAG run state not updated while DAG is paused
2 participants