Add disposeGracefully method to Scheduler #3089

chemicL · 2022-06-23T11:12:41Z

Currently, all Schedulers forcefully shut down the underlying ExecutorServices by calling the shutdownNow() method. In some scenarios that is undesirable, as it does not allow proper cleanup.
Schedulers should allow shutting down by not accepting new work while giving the currently executing tasks a chance to finish without interruption. A Mono<Void> disposeGracefully() method has been added for that purpose. Upon subscription, it calls shutdown() instead of shutdownNow() and creates a background task that calls awaitTermination() to complete the returned Mono. It can be combined with the timeout() and retry() operators in realistic scenarios.

Some `Disposable`s should be disposed with a chance to clean up the underlying resources. At the same time, it is desirable to coordinate logic that depends on successful disposal. Specifically, instances of `Scheduler` should allow shutting down by not accepting new work while giving the currently executing tasks a chance to finish without interruption. Therefore, a `Disposable.Graceful` interface has been added that provides the means to do a timely cleanup and observe the result via a `Mono<Void> disposeGracefully(Duration)` method.
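To make the described behaviour concrete, here is a minimal JDK-only sketch. It assumes a plain `ExecutorService` in place of a `Scheduler` and a `CompletableFuture<Void>` in place of the returned `Mono<Void>`; the class and method names are hypothetical and are not part of this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical JDK-only analogue of the described behaviour, not Reactor code.
final class GracefulDisposeSketch {

	// Stops accepting new work, lets submitted tasks finish, and completes the
	// returned future from a background thread once the executor has terminated.
	static CompletableFuture<Void> disposeGracefully(ExecutorService executor, long timeout, TimeUnit unit) {
		executor.shutdown(); // unlike shutdownNow(), this does not interrupt running tasks
		CompletableFuture<Void> done = new CompletableFuture<>();
		Thread waiter = new Thread(() -> {
			try {
				if (executor.awaitTermination(timeout, unit)) {
					done.complete(null);
				}
				else {
					done.completeExceptionally(new TimeoutException("executor still running after timeout"));
				}
			}
			catch (InterruptedException e) {
				Thread.currentThread().interrupt();
				done.completeExceptionally(e);
			}
		}, "graceful-dispose-waiter");
		waiter.setDaemon(true);
		waiter.start();
		return done;
	}

	// Submits one short task, disposes gracefully, and reports whether the task
	// ran to completion without being interrupted.
	static boolean taskFinishesUninterrupted() {
		ExecutorService executor = Executors.newSingleThreadExecutor();
		AtomicBoolean finished = new AtomicBoolean();
		executor.execute(() -> {
			try {
				Thread.sleep(200);
				finished.set(true);
			}
			catch (InterruptedException e) {
				// only reachable if something calls shutdownNow()
			}
		});
		disposeGracefully(executor, 5, TimeUnit.SECONDS).join();
		return finished.get() && executor.isTerminated();
	}
}
```

With `shutdownNow()` in place of `shutdown()`, the sleeping task would be interrupted and `finished` would stay `false`, which is the cleanup problem the description refers to.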

simonbasle

this is shaping up well, but the review triggers a few thoughts on my part and surfaces additional corner cases 🤔

reactor-core/src/main/java/reactor/core/scheduler/Schedulers.java

simonbasle · 2022-06-23T14:10:10Z

reactor-core/src/main/java/reactor/core/Disposable.java

@@ -165,4 +167,11 @@ default boolean addAll(Collection<? extends Disposable> ds) {
 		 */
 		int size();
 	}
+
+	// TODO(dj): add javadoc


the semantics of disposeGracefully may vary widely, so I would make the documented contract say that explicitly (eg. "each class implementing this trait should define how subsequent calls behave during the grace period and after it")

one limitation I'm thinking of with this API is that most of the time the underlying resource(s) being disposed gracefully will be atomically swapped out. Which means that even if one re-subscribes to the Mono, including with onErrorResume() or retry(), the underlying resources won't be reachable anymore.

thus, there will be no way of composing operators to fall back to a "hard" shutdown once the graceful shutdown is initiated.

I'm thinking this is fine if documented. The recommendation for implementors should probably be to trigger a hard dispose at the end of the gracePeriod THEN propagate a TimeoutException, noting that it only serves as a warning / logging but cannot be recovered.
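The limitation above can be illustrated with a small JDK-only sketch. It assumes the resource is held in an `AtomicReference` and replaced by a terminated marker on the first dispose call; `SwappedResourceSketch` and `beginGracefulDispose` are hypothetical names, not code from this PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder used only to illustrate the atomic swap; not a Reactor class.
final class SwappedResourceSketch {

	// Marker meaning "already disposed"; it never runs any task.
	private static final ExecutorService TERMINATED;
	static {
		TERMINATED = Executors.newSingleThreadExecutor();
		TERMINATED.shutdownNow();
	}

	private final AtomicReference<ExecutorService> current =
			new AtomicReference<>(Executors.newSingleThreadExecutor());

	// Returns the executor this call took ownership of, or null when an
	// earlier call already swapped it out.
	ExecutorService beginGracefulDispose() {
		ExecutorService previous = current.getAndSet(TERMINATED);
		if (previous == TERMINATED) {
			return null;
		}
		previous.shutdown();
		return previous;
	}
}
```

A second call, such as one triggered by a retry, only sees the marker, so it has no way to escalate to `shutdownNow()` on the original executor. Under this assumption, only the code path that performed the swap can do the hard dispose.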

We discussed the possible contracts here with @OlegDokuka. The approach with a forceful shutdown before propagating the TimeoutException, and with non-recoverable errors, might be confusing to users if they don't read the specific Scheduler documentation, but it has some advantages implementation-wise.
Another approach could be to propagate a retry-able error and allow re-initiating the shutdown() + awaitTermination(...) procedure, while also allowing for an explicit final call to shutdownNow() when desired. I'll go back to the original issue and ask for opinions from the user's perspective to guide the design.

My personal opinion is that once we fix specific behavior (e.g. call shutdownNow() on a timeout or InterruptedException), we will probably end up with everyone doing scheduler.disposeGracefully(Duration.ofHours(9999999)).subscribe(), or complaining that they did not want shutdownNow to be called but would rather retry later.

Another thought on TimeoutException: any exception is useless if we cannot do anything useful after it. I'm not sure that logging such an event makes any sense. This exception merely states the fact that we forced the shutdown process, so a user just has to take it. Also, taking the impl details into account: all other active subscribers are going to get the same notification, but a late subscriber will not get it. That is going to be too confusing, so even documenting it will not resolve the confusion.

My personal recommendation is to prefer flexibility over fixed behavior. One can always write something like the following to mimic what we can hardcode

	scheduler.disposeGracefully(Duration.ofMillis(100))
	         .retryWhen(Retry.backoff(5, Duration.ofMillis(50)))
	         .onErrorResume(e -> Mono.fromRunnable(scheduler::dispose));
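For readers less familiar with the operators, the same "wait a few times, then force" idea can be sketched with the JDK alone. This is an analogue under stated assumptions, not the Reactor pipeline above: it uses a fixed wait per attempt instead of a backoff, and `RetryThenForceSketch` is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: caller-chosen escalation instead of a hardcoded one.
final class RetryThenForceSketch {

	// Waits up to `attempts` times for termination, then falls back to
	// shutdownNow(). Returns true when the executor terminated gracefully.
	static boolean disposeWithFallback(ExecutorService executor, int attempts, long waitMillis) {
		executor.shutdown();
		try {
			for (int i = 0; i < attempts; i++) {
				if (executor.awaitTermination(waitMillis, TimeUnit.MILLISECONDS)) {
					return true;
				}
			}
		}
		catch (InterruptedException e) {
			Thread.currentThread().interrupt();
		}
		executor.shutdownNow(); // the hard fallback, decided by the caller
		return false;
	}
}
```

The point of the design is that the number of attempts and the fallback live in user code, so a different caller could keep waiting instead of forcing the shutdown.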

reactor-core/src/main/java/reactor/core/scheduler/BoundedElasticScheduler.java

reactor-core/src/main/java/reactor/core/scheduler/Schedulers.java

simonbasle · 2022-06-23T15:40:04Z

@chemicL what do you think of this piece of code to shut down multiple executors at once, try to await as close to the grace period as possible while still only needing one thread, and finally shutdownNow in case the whole thing takes more than gracePeriod?

	//this supposes that we somehow can get all the executors and swap them with an empty array
	//with more advanced schedulers like BoundedElasticScheduler, we might not get an ExecutorService array
	//but an array of another resource (like BoundedState[]), hence the Function
	static <RES> void shutdownAndAwait(final RES[] resources, Function<RES, ExecutorService> serviceExtractor, Duration gracePeriod, Sinks.Empty<Void> disposeNotifier) {
		for (int i = 0; i < resources.length; i++) {
			ExecutorService service = serviceExtractor.apply(resources[i]);
			service.shutdown();
		}

		//TODO: use a configurable separate pool?
		final ExecutorService service = Executors.newSingleThreadExecutor();
		service.submit(() -> {
			long nanoStart = System.nanoTime();
			long nanoGraceRemaining = gracePeriod.toNanos();

			boolean allAwaited = true;
			//wait for one executor at a time
			int index = 0;
			while (index < resources.length) {
				ExecutorService toAwait = serviceExtractor.apply(resources[index]);
				//short case: the current executor has already terminated
				if (toAwait.isTerminated()) {
					index++;
					continue;
				}

				//we're inspecting the next executor and giving it nanoGraceRemaining ns to terminate gracefully
				try {
					if (!toAwait.awaitTermination(nanoGraceRemaining, TimeUnit.NANOSECONDS)) {
						//if it didn't terminate gracefully, the whole graceful operation can be considered a failure
						allAwaited = false;
						break;
					}
					else {
						//subtract the time just spent waiting so that the global operation stays within gracePeriod bounds
						long oldStart = nanoStart;
						nanoStart = System.nanoTime();
						nanoGraceRemaining = Math.max(0, nanoGraceRemaining - (nanoStart - oldStart));
						index++;
					}
				}
				catch (InterruptedException e) {
					allAwaited = false;
					break;
				}
			}
			if (allAwaited) {
				disposeNotifier.tryEmitEmpty();
			}
			else {
				for (int i = 0; i < resources.length; i++) {
					ExecutorService executorService = serviceExtractor.apply(resources[i]);
					executorService.shutdownNow();
				}
				disposeNotifier.tryEmitError(new TimeoutException("Scheduler didn't shutdown gracefully in time (" + gracePeriod + "), used shutdownNow"));
			}
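		});
		//no further tasks are submitted, so let the helper thread exit once the await task is done
		service.shutdown();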
	}
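An alternative way to keep the whole operation within one grace period is to compute a single deadline up front and derive each wait from it. The sketch below assumes that variation; it is not the code proposed above, it blocks the calling thread instead of using a helper executor, and `SharedGraceBudgetSketch` is a hypothetical name.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing deadline-based accounting of a shared grace period.
final class SharedGraceBudgetSketch {

	// Shuts down every executor, then awaits them one at a time against one
	// shared deadline. Returns true only if all terminated before the deadline.
	static boolean awaitAll(List<ExecutorService> executors, Duration gracePeriod) {
		for (ExecutorService executor : executors) {
			executor.shutdown();
		}
		final long deadline = System.nanoTime() + gracePeriod.toNanos();
		try {
			for (ExecutorService executor : executors) {
				// whatever is left of the budget; zero means "just check the status"
				long remaining = Math.max(0, deadline - System.nanoTime());
				if (!executor.awaitTermination(remaining, TimeUnit.NANOSECONDS)) {
					return false;
				}
			}
			return true;
		}
		catch (InterruptedException e) {
			Thread.currentThread().interrupt();
			return false;
		}
	}
}
```

Because every executor is shut down before the first wait, their tasks drain in parallel, and the sequential waits only observe the result. A caller that gets `false` can then decide whether to call `shutdownNow()` on each executor.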

chemicL · 2022-06-27T07:05:27Z

> @chemicL what do you think of this piece of code to shut down multiple executors at once, try to await as close to the grace period as possible while still only needing one thread, and finally shutdownNow in case the whole thing takes more than gracePeriod?
> (...)

Yep, that's a great optimization, thanks for the suggestion.

chemicL · 2022-07-13T16:04:35Z

reactor-core/src/test/java/reactor/core/scheduler/SchedulersTest.java

@@ -734,34 +734,6 @@ public void immediateTaskIsExecuted() throws Exception {
 		assertThat((end - start) >= 1000).as("Timeout too long").isTrue();
 	}

-	@Test
-	public void immediateTaskIsSkippedIfDisposeRightAfter() throws Exception {


This is no longer the case. It could still race, but now there are a few more instructions in dispose(), so the task gets aborted. IMO it's not a feature worth testing: the test was introduced in 6f3383d, but one should not rely on a race to dispose tasks.

OlegDokuka

Nice progress overall. Left my comments. Also, there is a set of general polishing points:

Let's use plain volatile access (e.g. this.state instead of STATE.get(this)). Plain access is the fastest approach and we use it consistently throughout the codebase. AtomicXXXFieldUpdater#get is an option for getting the value when plain access is impossible (e.g. in a shared utility function such as Operators#requested).
Let's make sure imports are not collapsed into wildcards.
SchedulerState#terminated does not make use of the old state except in bounded elastic, so let's avoid terminated(old) and use static final TERMINATED_STATE = new SchedulerState(dead_executor_service, Mono.empty()) where possible.
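The first and third points can be sketched together in a few lines. This is an illustration under assumptions: `StateHolderSketch` and its nested `State` are hypothetical stand-ins, not the `SchedulerState` class from this PR.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical state holder illustrating the access pattern only.
final class StateHolderSketch {

	static final class State {
		final String name;

		State(String name) {
			this.name = name;
		}
	}

	static final State INIT = new State("init");
	// One shared terminal instance, so a transition does not need to allocate
	// a new state derived from the old one.
	static final State TERMINATED = new State("terminated");

	volatile State state = INIT;

	static final AtomicReferenceFieldUpdater<StateHolderSketch, State> STATE =
			AtomicReferenceFieldUpdater.newUpdater(StateHolderSketch.class, State.class, "state");

	// Plain volatile read of the field, instead of STATE.get(this).
	boolean isTerminated() {
		return this.state == TERMINATED;
	}

	// The updater is still used for the atomic write; returns true only for
	// the single caller that performed the transition.
	boolean terminate() {
		return STATE.getAndSet(this, TERMINATED) != TERMINATED;
	}
}
```

Reads go through the field directly, while the updater is reserved for operations that need atomicity, such as `getAndSet` or `compareAndSet`.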

reactor-core/src/main/java/reactor/core/scheduler/SchedulerState.java

reactor-core/src/main/java/reactor/core/scheduler/SingleScheduler.java

reactor-core/src/main/java/reactor/core/scheduler/BoundedElasticScheduler.java

chemicL · 2022-08-02T17:30:48Z

The latest changes include improvements for avoiding looping in the dispose/disposeGracefully/start methods, as suggested by @OlegDokuka. This was only possible by revisiting BoundedElasticScheduler's inner workings and no longer atomically replacing the underlying BoundedStates with a tombstone. Instead, an encapsulating class was introduced for state management of BoundedServices, which in turn simplified the state transitions in BoundedElasticScheduler itself with regard to the generic SchedulerState. As convoluted as it sounds, the latest set of changes also incorporates a fix for a leak of a BoundedState, which would not be shut down if dispose happened while an ExecutorService was being picked.

OlegDokuka

looks good overall with a few minor comments

reactor-core/src/jcstress/java/reactor/core/scheduler/RacingDisposeGracefullyStressTest.java

reactor-core/src/jcstress/java/reactor/core/scheduler/SchedulersStressTest.java

reactor-core/src/main/java/reactor/core/Disposable.java

reactor-core/src/main/java/reactor/core/scheduler/SingleScheduler.java

reactor-core/src/main/java/reactor/core/scheduler/BoundedElasticScheduler.java

simonbasle · 2022-08-11T15:21:19Z

great job @chemicL ! finally approved and ready to merge 😄

now the only remaining step is to try to summarize the design of the change for the commit message 📖
feel free to ping me if you want me to also review that, or if you need help with merging / forward-merging.

reactorbot · 2022-08-16T08:06:35Z

@chemicL this PR seems to have been merged on a maintenance branch, please ensure the change is merge-forwarded to intermediate maintenance branches and up to main 🙇

chemicL self-assigned this Jun 23, 2022

simonbasle linked an issue Jun 23, 2022 that may be closed by this pull request

Use shutdown() instead of shutdownNow() when BoundedElasticScheduler call dispose. #3068

Closed

simonbasle reviewed Jun 23, 2022

Improvements WIP

6fbd57c

chemicL mentioned this pull request Jun 28, 2022

Use shutdown() instead of shutdownNow() when BoundedElasticScheduler call dispose. #3068

Closed

chemicL added 4 commits July 5, 2022 16:48

Merge branch '3.4.x' into 3068-schedulerGracefulClose

a4a5b0e

WIP

13c020f

WIP

92f18b8

WIP

9026186

chemicL force-pushed the 3068-schedulerGracefulClose branch from 22e8b1d to 9026186 Compare July 13, 2022 10:09

chemicL added 3 commits July 13, 2022 12:10

Removed tests that race with task execution

0084749

Merge branch '3.4.x' into 3068-schedulerGracefulClose

0f2684c

BoundedElasticScheduler disposeGracefully rework

3ed8ad3

chemicL commented Jul 13, 2022

Optimizing BoundedElasticScheduler disposal logic

d1af720

OlegDokuka suggested changes Jul 18, 2022

chemicL added 10 commits July 19, 2022 00:27

WIP more concurrency tests, fixes

0e801f0

Migrated BoundedElasticScheduler JCStress tests to RaceTestUtils

d10d98a

plain volatile access

52f1cf0

Using for loops instead of getAndUpdate

b0fc331

lazy set

5703eb8

Tests improvements

8b4bc63

Generic SchedulerState

7a0452f

Schedulers imports - avoid wildcards

3dc8dca

Simplify start flow

8399b5b

Merge branch '3.4.x' into 3068-schedulerGracefulClose

9f5a7eb

chemicL marked this pull request as ready for review July 28, 2022 16:15

chemicL added 3 commits August 2, 2022 18:03

Simplified, preventing leaks

298c051

Adjusting remaining schedulers

5e72d2e

Removed dependency on BoundedState counter for state management

255878f

Ensured atomicity in BoundedServices.dispose() and properly draining idleQueue

cbe6663

chemicL changed the title ~~[WIP] Add Disposable.Graceful interface~~ Add Disposable.Graceful interface and make Scheduler extend it Aug 3, 2022

chemicL requested review from OlegDokuka and simonbasle August 3, 2022 14:48

simonbasle approved these changes Aug 4, 2022

chemicL added 5 commits August 10, 2022 11:03

Moved Disposable.Graceful to Scheduler and removed gracePeriod param

6690a91

Removed exceptions from start()

fcc5079

Merge branch '3.4.x' into 3068-schedulerGracefulClose

162cf2a

JCStress tests rework to avoid time validation, just state consistency, added simple validation for sequential multiple disposeGracefully

709f385

Add exclusion for japicmp

cb8373c

chemicL changed the title ~~Add Disposable.Graceful interface and make Scheduler extend it~~ Add disposeGracefully method to Scheduler Aug 10, 2022

Moved more thorough dispose validation to unit tests

055f72c

OlegDokuka approved these changes Aug 11, 2022

unused imports and copyright

c0c1eab

simonbasle approved these changes Aug 11, 2022

Merge branch '3.4.x' into 3068-schedulerGracefulClose

c4a5a99

chemicL merged commit 4768c43 into 3.4.x Aug 16, 2022

chemicL deleted the 3068-schedulerGracefulClose branch August 16, 2022 08:06

chemicL added a commit that referenced this pull request Aug 16, 2022

Merge #3089 into 3.5.0-M6

c747f80

chemicL mentioned this pull request Aug 29, 2022

API for graceful disposal in Disposable #3173

Open

chemicL added the type/enhancement label Sep 13, 2022

OlegDokuka added this to the 3.4.23 milestone Nov 8, 2022

chibenwa mentioned this pull request Mar 13, 2023

JAMES-3891 Graceful shutdown for queue consumers [3.7.x] apache/james-project#1479

Merged

renovate bot mentioned this pull request May 29, 2023

Update dependency io.projectreactor:reactor-test to v3.6.6 vicboma1/Microservices-SpringBoot-ReactorCore#9

Open
