
fix #1992 Reimplement boundedElasticScheduler with reentrancy but less efficient work stealing #2040

Merged
merged 1 commit into master from reworkBoundedElasticScheduler on Feb 18, 2020

Conversation

simonbasle
Member

This should fix #1973 and fix #1992

The general idea is to abandon the facade Worker and instead always submit tasks to an executor-backed worker. In order of preference, when an operator requests a Worker:

  • if thread cap not reached, create and pick a new worker
  • else if idle workers, pick an idle worker
  • else pick a busy worker

This implies a behavior under contention that is closer to parallel(), with a pool that is expected to be considerably larger than the typical parallel pool. Livelocks can still happen, but they already seem far too frequent with the current implementation anyway.

The drawback is that once we get to pick a busy worker, there's no telling when its tasks (typically blocking tasks for a BoundedElasticScheduler) will finish. So even though another executor might become idle in the meantime, the operator's tasks will be pinned to the (potentially still busy) executor initially picked.

To try to counter that effect a bit, we use a priority queue for the busy executors, favoring executors that are tied to fewer Workers (and thus fewer operators). We don't yet go as far as factoring in the task queue of each executor.
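
To make the selection order concrete, here is a minimal, hypothetical sketch (illustrative class and field names, synchronized for brevity; not the actual BoundedElasticScheduler internals):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.PriorityQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical sketch of the "new > idle > least-shared busy" selection order.
final class BoundedWorkerPicker {

	static final class BoundedState implements Comparable<BoundedState> {
		final ScheduledExecutorService executor =
				Executors.newSingleThreadScheduledExecutor();
		int workerCount; // how many Workers (and thus operators) share this executor

		@Override
		public int compareTo(BoundedState other) {
			return Integer.compare(this.workerCount, other.workerCount);
		}
	}

	final int threadCap;
	int created;
	final Deque<BoundedState> idle = new ArrayDeque<>();
	// busy executors ordered by how many Workers they already serve,
	// so that under contention we favor the least shared one
	final PriorityQueue<BoundedState> busy = new PriorityQueue<>();

	BoundedWorkerPicker(int threadCap) {
		this.threadCap = threadCap;
	}

	synchronized BoundedState pick() {
		BoundedState picked;
		if (created < threadCap) {        // 1. thread cap not reached: create a new executor
			picked = new BoundedState();
			created++;
		}
		else if (!idle.isEmpty()) {       // 2. otherwise reuse an idle executor
			picked = idle.pollFirst();
		}
		else {                            // 3. otherwise pin to the least shared busy executor
			picked = busy.poll();
		}
		picked.workerCount++;
		busy.offer(picked);
		return picked;
	}

	synchronized void release(BoundedState state) {
		busy.remove(state);
		if (--state.workerCount == 0) {
			idle.offerLast(state);        // fully free again: candidate for reuse (or eviction)
		}
		else {
			busy.offer(state);            // still shared: re-insert with its updated priority
		}
	}
}
```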

Finally, one noticeable change is that the second int parameter in the API, maxPendingTask, now applies to EACH executor's queue rather than being a shared counter. It should be safe in the sense that a number chosen with the previous version in mind is bound to be over-dimensioned for the new version, but users would be advised to scale that number back.
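
For illustration, assuming the Schedulers.newBoundedElastic(int, int, String) factory, the changed semantics of the second parameter look roughly like this (values are arbitrary):

```java
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

public class BoundedElasticExample {

	public static void main(String[] args) throws InterruptedException {
		// Illustrative values only: with this change the second int caps the
		// task queue of EACH backing executor instead of acting as one shared
		// counter, so a value sized for the old shared counter is likely
		// over-dimensioned for the new implementation.
		Scheduler io = Schedulers.newBoundedElastic(
				50,    // thread cap: at most 50 single-threaded executors
				1000,  // pending task cap: now applies per executor, not globally
				"bounded-io");

		io.schedule(() -> System.out.println("runs on " + Thread.currentThread().getName()));

		Thread.sleep(100); // give the task a chance to run before disposing
		io.dispose();
	}
}
```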

@simonbasle
Member Author

Note: for now, BoundedElasticScheduler has been changed to the new implementation; it might make more sense to review the raw file rather than the diff. The old implementation is temporarily kept as BoundedFacadeElasticScheduler.

@simonbasle
Member Author

@rstoyanchev @bsideup this should now be in its final state, except for the pending removal of BoundedFacadeElasticScheduler (now dead code)

@simonbasle simonbasle marked this pull request as ready for review February 10, 2020 16:14
final long expireMillis;
@Override
public boolean isDisposed() {
	return get() == -1;
Contributor

nit: please extract the constant (to be used in dispose())

Member Author

done
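
For illustration, the extracted constant might look like the following hypothetical sketch (the constant and class names are assumptions, not necessarily what was merged):

```java
import java.util.concurrent.atomic.AtomicLong;
import reactor.core.Disposable;

// Hypothetical sketch of the suggestion: a named sentinel instead of the
// magic -1, reused by both isDisposed() and dispose().
final class ExpiringStateExample extends AtomicLong implements Disposable {

	static final long DISPOSED = -1L;

	final long expireMillis;

	ExpiringStateExample(long expireMillis) {
		this.expireMillis = expireMillis;
	}

	@Override
	public boolean isDisposed() {
		return get() == DISPOSED;
	}

	@Override
	public void dispose() {
		set(DISPOSED);
	}
}
```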

@codecov-io

codecov-io commented Feb 10, 2020

Codecov Report

Merging #2040 into master will decrease coverage by 0.03%.
The diff coverage is 78.78%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2040      +/-   ##
============================================
- Coverage     81.85%   81.81%   -0.04%     
+ Complexity     4033     4024       -9     
============================================
  Files           376      376              
  Lines         31070    30973      -97     
  Branches       5814     5765      -49     
============================================
- Hits          25431    25342      -89     
- Misses         4049     4058       +9     
+ Partials       1590     1573      -17
| Impacted Files | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| ...c/main/java/reactor/core/scheduler/Schedulers.java | 89.84% <ø> (+0.78%) | 90 <0> (+1) ⬆️ |
| .../java/reactor/core/scheduler/ElasticScheduler.java | 84.07% <100%> (ø) | 26 <0> (ø) ⬇️ |
| ...eactor/core/scheduler/BoundedElasticScheduler.java | 78.62% <78.44%> (-4.85%) | 30 <24> (-9) |
| ...r-core/src/main/java/reactor/core/Disposables.java | 88.95% <80%> (ø) | 21 <0> (ø) ⬇️ |
| .../reactor/core/scheduler/ExecutorServiceWorker.java | 85.71% <85.71%> (+14.28%) | 8 <7> (+1) ⬆️ |
| ...ain/java/reactor/core/scheduler/SchedulerTask.java | 76.78% <0%> (-5.36%) | 17% <0%> (-2%) |
| ...va/reactor/core/publisher/ParallelMergeReduce.java | 69.67% <0%> (-3.28%) | 3% <0%> (ø) |
| ...in/java/reactor/core/publisher/FluxWindowWhen.java | 81.25% <0%> (-2.89%) | 2% <0%> (ø) |
| .../java/reactor/core/publisher/BlockingIterable.java | 78.35% <0%> (-1.04%) | 7% <0%> (ø) |
| ...c/main/java/reactor/core/publisher/FluxReplay.java | 85.15% <0%> (+0.14%) | 29% <0%> (ø) ⬇️ |
| ... and 11 more | | |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ed361f...cd2beff. Read the comment docs.

@simonbasle simonbasle added the status/on-hold This was set aside or cannot be worked on yet label Feb 13, 2020
@simonbasle simonbasle changed the title Rework bounded elastic scheduler fix #1992 Reimplement boundedElasticScheduler with reentrancy but less efficient work stealing Feb 14, 2020
ttlSeconds,
ttlSeconds,
TimeUnit.SECONDS);
evictor.scheduleAtFixedRate(boundedServices::eviction,
Contributor


nit: this.boundedServices = , this.evictor = and scheduleAtFixedRate can be replaced with a single start() since they share the implementation.

Member Author

They don't really share the implementation, since BOUNDED_SERVICES would need to be initialized to SHUTDOWN first, and in the constructor the loop is superfluous.
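
Roughly, the distinction could be sketched like this (hypothetical field and class names, not the actual scheduler code):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simplification of the point above: the constructor can assign
// the initial state directly, whereas a restartable start() would first need
// the field initialized to a SHUTDOWN sentinel and a CAS to swap it out.
final class LifecycleSketch {

	static final Object SHUTDOWN = new Object();

	final AtomicReference<Object> services = new AtomicReference<>();

	LifecycleSketch() {
		// constructor path: plain assignment, no sentinel and no retry loop needed
		services.set(new Object());
	}

	void restart() {
		// restart path: only legal from SHUTDOWN, hence the extra ceremony
		services.compareAndSet(SHUTDOWN, new Object());
	}
}
```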

exec.shutdownNow();
}
}
//else optimistically retry
Contributor

@bsideup bsideup Feb 14, 2020

WDYT about using continue here (and also in the next two cases)? Otherwise, if we accidentally add some logic at the end of this loop, it will get executed when the inner if resolves to false.

Member Author

Avoiding that is the goal of the comment, but I'll augment it with "(implicit continue here)". An explicit continue would otherwise trigger the inspection "continue is unnecessary as the last statement in a loop".
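
For context, the loop shape under discussion is roughly the following (a hypothetical, simplified illustration):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of the loop shape being discussed: every branch
// either returns or falls through to the bottom of the loop, where the only
// remaining option is to retry the CAS, hence "(implicit continue here)".
final class OptimisticRetryExample {

	private static final String SHUTDOWN = "SHUTDOWN";

	private final AtomicReference<String> state = new AtomicReference<>("INIT");

	void transitionTo(String next) {
		for (;;) {
			String current = state.get();
			if (SHUTDOWN.equals(current)) {
				return; // already shut down: nothing left to do
			}
			if (state.compareAndSet(current, next)) {
				return; // transition published
			}
			//else optimistically retry (implicit continue here)
		}
	}
}
```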

Contributor

@bsideup bsideup left a comment

@simonbasle I enjoyed reviewing! Very clean implementation, easy to follow! 👍

Left a few comments but overall looks good :)

@simonbasle simonbasle removed the status/on-hold This was set aside or cannot be worked on yet label Feb 17, 2020
@simonbasle
Member Author

OK, I'm now preparing this for merging by removing the old code and squashing the commits that need squashing (if not all), cc @bsideup

The general idea is to abandon the facade Worker and instead always
submit tasks to an executor-backed worker. In order of preference, when
an operator requests a Worker:

 - if thread cap not reached, create and pick a new worker
 - else if idle workers, pick an idle worker
 - else pick a busy worker

This implies a behavior under contention that is closer to parallel(),
but with a pool that is expected to be considerably larger than the
typical parallel pool.

The drawback is that once we get to pick a busy worker, there's no
telling when its tasks (typically blocking tasks for a
BoundedElasticScheduler) will finish. So even though another executor
might become idle in the meantime, the operator's tasks will be pinned
to the (potentially still busy) executor initially picked.

To try to counter that effect a bit, we use a priority queue for the
busy executors, favoring executors that are tied to fewer Workers (and
thus fewer operators). We don't yet go as far as factoring in the task
queue of each executor.

Finally, one noticeable change is that the second int parameter in
the API, maxPendingTask, now applies to EACH executor's queue
instead of being a shared counter. It should be safe in the sense that
a number chosen with the previous version in mind is bound to be
over-dimensioned for the new version, but it is recommended that
users reconsider that number.
@simonbasle simonbasle merged commit c0a7bb8 into master Feb 18, 2020
@simonbasle simonbasle deleted the reworkBoundedElasticScheduler branch February 18, 2020 10:40