Queue tasks with higher priority and earlier execution_date first. #15210

Merged
merged 7 commits into from Jun 14, 2021

ginevragaudioso
Contributor

See issue #15171.

I tested the query on our airflow instance and it correctly sorts the results.

closes: #15171
related: #15171

@boring-cyborg boring-cyborg bot added the area:Scheduler label Apr 5, 2021
@boring-cyborg

boring-cyborg bot commented Apr 5, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst).
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It’s a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@github-actions

github-actions bot commented Apr 5, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@ginevragaudioso ginevragaudioso changed the title from "AIRFLOW-15171 order query to find out tasks to queue" to "order query to find out tasks to queue" Apr 6, 2021
@kaxil
Member

kaxil commented Apr 6, 2021

Please rebase on the latest master; that should fix the failing build.

@kaxil
Member

kaxil commented Apr 6, 2021

I just pushed, should work now

@ginevragaudioso
Contributor Author

Thanks @kaxil for fixing the build. Anything else I should do here?

@kaxil
Member

kaxil commented Apr 15, 2021

cc @ashb

@ginevragaudioso
Contributor Author

@kaxil @ashb would it be possible to have this fix go in 2.0.2? Seems like an easy fix that solves an actual issue, but I don't know what's in the roadmap.

@ashb
Member

ashb commented Apr 20, 2021

@ginevragaudioso Sorry, was too late (even yesterday) as the RC was already being voted upon.

@ashb ashb changed the title from "order query to find out tasks to queue" to "Queue tasks with higher priority and earlier execution_date first." Apr 20, 2021
@ashb ashb added this to the Airflow 2.0.3 milestone Apr 20, 2021
@github-actions

The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed label Apr 20, 2021
Member

@ashb ashb left a comment

The tests need expanding -- the TIs it creates have the same priority and execution_date, so we aren't actually asserting that the TIs are sorted correctly.

@ginevragaudioso
Contributor Author

ginevragaudioso commented Apr 27, 2021

> The tests need expanding -- the TIs it creates have the same priority and execution_date, so we aren't actually asserting that the TIs are sorted correctly.

The TIs created do not have the same execution date, unless I am missing something.

    dr1 = dag_1.create_dagrun(
        run_type=DagRunType.SCHEDULED,
        execution_date=DEFAULT_DATE + timedelta(hours=1),
        state=State.RUNNING,
    )
    dr2 = dag_2.create_dagrun(
        run_type=DagRunType.SCHEDULED,
        execution_date=DEFAULT_DATE,
        state=State.RUNNING,
    )

    tis = [
        TaskInstance(dag1_task, dr1.execution_date),  # DEFAULT_DATE + timedelta(hours=1) (later)
        TaskInstance(dag2_task, dr2.execution_date),  # DEFAULT_DATE (earlier)
    ]

So the test is testing that we pick the one with the earliest execution date even if it is alphabetically later (which is exactly the bug being fixed).
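
For illustration only, here is a minimal sketch of the ordering that test exercises, using an in-memory SQLite table instead of Airflow's models. The table layout, the `priority_weight` column name, the `dag_id` values, and the `DEFAULT_DATE` value are assumptions made for this sketch; the actual scheduler query is not shown in this thread.

```python
import sqlite3
from datetime import datetime, timedelta

# Placeholder date for the sketch; not taken from the Airflow test suite.
DEFAULT_DATE = datetime(2016, 1, 1)

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for the task instance table.
conn.execute(
    "CREATE TABLE task_instance ("
    "dag_id TEXT, priority_weight INTEGER, execution_date TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [
        # dag_1 sorts first alphabetically but has the later execution date.
        ("dag_1", 1, (DEFAULT_DATE + timedelta(hours=1)).isoformat()),
        # dag_2 sorts later alphabetically but has the earlier execution date.
        ("dag_2", 1, DEFAULT_DATE.isoformat()),
    ],
)

# Higher priority first, then earlier execution date.
# ISO-8601 strings of the same format compare chronologically.
rows = conn.execute(
    "SELECT dag_id FROM task_instance "
    "ORDER BY priority_weight DESC, execution_date ASC"
).fetchall()
queue_order = [dag_id for (dag_id,) in rows]
print(queue_order)
```

With equal priorities, the explicit ORDER BY puts the `dag_2` row ahead of the `dag_1` row, which is the behaviour the test above asserts.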

@uranusjr
Member

uranusjr commented Apr 28, 2021

I think what Ash meant was that the currently available tests either have the same execution date or the same priority, and need to be extended to cover more combinations of values.

@ginevragaudioso
Contributor Author

> The tests need expanding -- the TIs it creates have the same priority and execution_date, so we aren't actually asserting that the TIs are sorted correctly.

@ashb thanks for the feedback, I added two more tests, one for priority and one for both.
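
As a rough sketch of the three cases discussed here (execution date only, priority only, and both together), the expected ordering can be expressed without Airflow as a plain sort key. `FakeTI` and `queue_order` are hypothetical names used only for this illustration; the real tests are in tests/jobs/test_scheduler_job.py and are not reproduced in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FakeTI:
    """Hypothetical stand-in for a task instance; not Airflow's TaskInstance."""
    name: str
    priority_weight: int
    execution_date: datetime


def queue_order(tis):
    """Higher priority first; ties broken by earlier execution_date."""
    return sorted(tis, key=lambda ti: (-ti.priority_weight, ti.execution_date))


T0 = datetime(2021, 1, 1)
LATER = T0 + timedelta(hours=1)

# Same priority, different dates: the earlier date comes first.
by_date = [ti.name for ti in queue_order([FakeTI("a", 1, LATER), FakeTI("b", 1, T0)])]

# Same date, different priorities: the higher priority comes first.
by_priority = [ti.name for ti in queue_order([FakeTI("a", 1, T0), FakeTI("b", 4, T0)])]

# Both differ: priority is compared first, so the later but
# higher-priority task instance still comes first.
by_both = [ti.name for ti in queue_order([FakeTI("a", 1, T0), FakeTI("b", 4, LATER)])]

print(by_date, by_priority, by_both)
```

The third case is the one that distinguishes a two-key ordering from sorting on either key alone, which is why covering both values together matters.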

@ashb ashb self-assigned this May 4, 2021
@ashb ashb removed this from the Airflow 2.0.3 milestone May 7, 2021
@ashb ashb added this to the Airflow 2.1.1 milestone May 7, 2021
@jhtimmins
Contributor

@ashb bumping you to review the updated tests you requested here #15210 (review)

Member

@ashb ashb left a comment

Small changes that I can do via suggestion.

tests/jobs/test_scheduler_job.py: five review suggestions (outdated, resolved)
@ashb ashb merged commit 943292b into apache:main Jun 14, 2021
@boring-cyborg

boring-cyborg bot commented Jun 14, 2021

Awesome work, congrats on your first merged pull request!

@kaxil
Member

kaxil commented Jun 14, 2021

Well done @ginevragaudioso 👏

ashb pushed a commit that referenced this pull request Jun 22, 2021
…15210)

Co-authored-by: Ginevra Gaudioso <ggaudioso@vectra.ai>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
(cherry picked from commit 943292b)
Labels
area:Scheduler, full tests needed
Development

Successfully merging this pull request may close these issues.

scheduler does not apply ordering when querying which task instances to queue
6 participants