Fix Deadlock on refresh from DB by local task run #25266

Closed
wants to merge 1 commit into from

Commits on Jul 25, 2022

  1. Fix Deadlock on refresh from DB by local task run

    This PR attempts to fix the deadlock that occurs when a task instance
    is run in parallel with the _do_scheduling operation
    executing get_next_dagruns_to_examine.
    
    The whole scheduling is based on actually locking the DagRuns the scheduler
    operates on - and it basically means that the state of ANY task instance
    for that DagRun should not change during the scheduling.
    
    However, there are some cases where a task instance is locked
    FOR UPDATE without prior locking of the DagRun table - this
    happens, for example, when the local task job executes the task
    and runs the "check_and_change_state_before_execution" method on the
    task instance it runs. There is no earlier DagRun locking
    happening, and the "refresh_from_db" run with lock_for_update
    will get the lock on both the TaskInstance row and the
    DagRun row. The problem is that this locking happens in the reverse
    sequence in this case:
    
    1) get_next_dagruns_to_examine - locks DagRun first and THEN
       tries to lock some task instances for that DagRun
    
    2) "check_and_change_state_before_execution" effectively runs the
        query: select ... from task_instance join dag_run ... for update
        which FIRST locks TaskInstance and then the DagRun table.
    
    This reverse sequence of locking is what causes the deadlock.
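
To illustrate the reverse sequence described above, here is a minimal sketch that compares the order in which two transactions take their locks and reports any pair of resources taken in opposite order. The function name `conflicting_pairs` and the resource labels are hypothetical and used only for illustration; they are not part of Airflow.

```python
from itertools import combinations


def conflicting_pairs(order_a, order_b):
    """Return resource pairs that two transactions lock in opposite order.

    Each argument is the sequence of resources a transaction locks, in
    acquisition order. Any returned pair is a potential deadlock.
    """
    pos_a = {name: i for i, name in enumerate(order_a)}
    pos_b = {name: i for i, name in enumerate(order_b)}
    # Only resources locked by both transactions can conflict.
    shared = [name for name in order_a if name in pos_b]
    return [
        (x, y)
        for x, y in combinations(shared, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    ]


# Hypothetical labels for the two code paths described in this commit.
scheduler_order = ["dag_run", "task_instance"]
task_run_order_before_fix = ["task_instance", "dag_run"]
task_run_order_after_fix = ["dag_run", "task_instance"]

print(conflicting_pairs(scheduler_order, task_run_order_before_fix))
print(conflicting_pairs(scheduler_order, task_run_order_after_fix))
```

The first call reports the dag_run/task_instance pair as conflicting. The second reports none, because both paths now lock in the same order.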
    
    The fix is to force locking the DagRun before running the task instance
    query that joins dag_run to task_instance.
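
The idea behind the fix can be sketched with plain Python threads standing in for database transactions. This is only a simulation under stated assumptions: the two `threading.Lock` objects stand in for the row locks on dag_run and task_instance, and the function names are hypothetical, not Airflow's actual code. Because both code paths take the DagRun lock first, neither can end up holding one lock while waiting for the other in reverse.

```python
import threading

# Stand-ins for row-level locks; real row locks are taken by the database.
dag_run_lock = threading.Lock()
task_instance_lock = threading.Lock()
completed = []


def scheduler_loop():
    # The scheduler locks the DagRun first, then its task instances.
    with dag_run_lock:
        with task_instance_lock:
            completed.append("scheduler")


def local_task_run():
    # With the fix, the task run also locks the DagRun first.
    with dag_run_lock:
        with task_instance_lock:
            completed.append("task")


def run_once():
    """Run both code paths concurrently and return which ones finished."""
    completed.clear()
    threads = [
        threading.Thread(target=scheduler_loop),
        threading.Thread(target=local_task_run),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout=5)
    return sorted(completed)


print(run_once())
```

With a consistent lock order both workers always finish, so `run_once()` returns both names. If `local_task_run` took the locks in the opposite order, the two threads could block each other indefinitely.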
    
    Fixes: apache#23361
    potiuk committed Jul 25, 2022
    12df02b