Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tasks in an infinite slots pool were never scheduled #15247

Merged
merged 2 commits into from Jun 22, 2021

Conversation

bperson
Copy link
Contributor

@bperson bperson commented Apr 7, 2021

Should fix #14515.

I saw 3 ways of fixing it:

  1. do a DB migration to represent an infinite pool with maxint slots instead of -1.
  2. do the same thing but at runtime in slots_stats.
  3. get _executable_task_instances_to_queued to understand -1 pools, changing all code operating on pool_slots_free and open_slots # # # # #

As to why I chose 2:

  1. could be a decent alternative but it's a DB migration with all the complexities inherent to it.
  2. is relatively non-intrusive: it's almost free to do it at runtime and it keeps the change completely in the core Python part of Airflow ( in case someone out there directly creates pools by inserting them in his DB 馃槰 )
  3. has the benefit of truly handling infinite pools but I doubt anyone has reached a scale where it matters ( we're talking about a pool with 9223372036854775807 slots here on my system ), plus it increases the overall complexity of that function which is close to the core of the scheduler, we might as well try to keep it as lean as possible.

@boring-cyborg boring-cyborg bot added the area:Scheduler Scheduler or dag parsing Issues label Apr 7, 2021
@uranusjr
Copy link
Member

uranusjr commented Apr 7, 2021

If typing is applied well to the related code path, declaring total (and open since it鈥檚 calculated from it) to Optional[int] instead would be a better variant to option 3.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label May 23, 2021
@ashb ashb added this to the Airflow 2.1.1 milestone May 27, 2021
@@ -106,6 +107,8 @@ def slots_stats(

pool_rows: Iterable[Tuple[str, int]] = query.all()
for (pool_name, total_slots) in pool_rows:
if total_slots == -1:
total_slots = maxsize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one downside to this is that it would then be logged as 9223372036854775807 slots free etc.

It breaks/is lying about the type, but how about:

Suggested change
total_slots = maxsize
total_slots = float('inf') # type: ignore

Because this has all the properties we want:

  • Anything subtracted from it is still infinity
  • It is greater than any actual number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, and existing log end up pretty solid too:

[2021-06-04 13:37:28,963] {scheduler_job.py:993} INFO - Figuring out tasks to run in Pool(name=infinite_pool) with inf open slots and 1 task instances ready to be queued
[2021-06-04 13:37:28,963] {scheduler_job.py:1021} INFO - DAG SchedulerJobTest.test_infinite_pool has 0/16 running and queued tasks
[2021-06-04 13:37:28,963] {scheduler_job.py:1086} INFO - Setting the following tasks to queued state:
	<TaskInstance: SchedulerJobTest.test_infinite_pool.dummy 2016-01-01 00:00:00+00:00 [scheduled]>

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment

@uranusjr
Copy link
Member

Eh, what is the bot doing?

@potiuk potiuk reopened this Jun 11, 2021
@potiuk
Copy link
Member

potiuk commented Jun 11, 2021

Strange

@kaxil kaxil removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 11, 2021
@ashb ashb self-assigned this Jun 11, 2021
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jun 11, 2021
@ashb
Copy link
Member

ashb commented Jun 11, 2021

@bperson We've got a test failure in tests/models/test_pool.py::TestPool::test_infinite_slots

  E         Omitting 1 identical items, use -vv to show
  E         Differing items:
  E         {'test_pool': {'open': -1, 'queued': 1, 'running': 1, 'total': -1}} != {'test_pool': {'open': inf, 'queued': 1, 'running': 1, 'total': inf}}
  E         Use -v to get the full diff

@bperson
Copy link
Contributor Author

bperson commented Jun 12, 2021

@ashb sorry about that, it should be better now ( though Ci seems to be stuck in an endless 404-ing loop right now 馃槄 )

@ashb
Copy link
Member

ashb commented Jun 14, 2021

I have rebased to latest main, hopefully that should fix the 404

@ashb
Copy link
Member

ashb commented Jun 14, 2021

Damnit pymsql:

TypeError: unsupported operand type(s) for -: 'float' and 'decimal.Decimal'

@ashb ashb changed the title Infinite pools: Make their total_slots be maxint-like instead of -1 Fix tasks in an infinite slots pool were never scheduled Jun 22, 2021
@ashb ashb merged commit 96f7643 into apache:main Jun 22, 2021
ashb pushed a commit that referenced this pull request Jun 22, 2021
Infinite pools: Make their `total_slots` be `inf` instead of `-1`

(cherry picked from commit 96f7643)
Jorricks pushed a commit to Jorricks/airflow that referenced this pull request Jun 24, 2021
Infinite pools: Make their `total_slots` be `inf` instead of `-1`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler Scheduler or dag parsing Issues full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tasks in an infinite slots pool are never scheduled
5 participants