Reverting custom thread pool from #53 in watch mode #81

Merged
car-roll merged 4 commits into jenkinsci:master from standard-thread-pool on Nov 23, 2021

Conversation

jglick
Member

@jglick jglick commented Oct 19, 2018

Observed to cause anomalous behavior during a RestartableJenkinsRule test in the ant plugin after updating dependencies in jenkinsci/ant-plugin#32. I think this is because the thread pool is not getting shut down, so two copies of the build get loaded after restart and mayhem ensues. Most likely this would not happen in production systems, which should be throwing out the plugin class loaders across restarts. Anyway as of #63 we make far fewer check calls, so the rationale for avoiding the shared Timer in #53 is obsolete.

…TaskStep.Execution.check, since it has often been observed to block for a while."

This reverts commit c9c1364.
dwnusbaum
dwnusbaum previously approved these changes Oct 22, 2018
Member

@svanoort svanoort left a comment

I have really mixed feelings about this change, because:

  1. We have already hit issues in the past with overloading the Timer thread pool from Pipeline and blocking other activities as a result
  2. This thread pool has timeouts for threads so it can shrink if not used -- if we find we need to increase the Timer thread pool in Core to deal with extra load, it is unfortunately not so configured (perhaps a mistake on our part). I actually suspect that the Timer pool probably would benefit from a keep-alive time and allowing the pool to shrink and grow, since it contributes a fairly large static memory footprint (assuming 1 MB stack size and 10 threads, 10 MB even if only 1-2 tasks at a time are usually running). This may sound small, but we're increasingly trying to tune Jenkins for one-shot or instance-per-team use and it adds up. The cost is sometimes creating new threads if the pool size has shrunk too small (maybe use a system property for minimum size).
  3. It seems like this could be trivially (and more correctly) handled via a Terminator or explicit shutdown hook.
  4. It's not clear that the original issue is confirmed to come from this cause in the first place.
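Point 2 can be illustrated with plain `java.util.concurrent`. The following is a minimal sketch, not the plugin's or Jenkins core's actual code: the class name `ShrinkablePoolSketch`, the helper names, and the sizes are all hypothetical. It shows a scheduled pool whose idle threads are allowed to exit after a keep-alive time, plus the back-of-envelope stack arithmetic (10 threads at an assumed 1 MB each). Note that the `ScheduledThreadPoolExecutor` Javadoc advises caution with `allowCoreThreadTimeOut`, so treat this as an illustration of the idea rather than a recommendation.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ShrinkablePoolSketch {
    /** Builds a scheduled pool whose idle threads may exit after the keep-alive time. */
    static ScheduledThreadPoolExecutor newShrinkablePool(int maxThreads, long keepAliveSeconds) {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(maxThreads);
        // The keep-alive time must be positive before core-thread timeouts can be enabled.
        pool.setKeepAliveTime(keepAliveSeconds, TimeUnit.SECONDS);
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    /** Rough stack reservation if every thread stays alive (assumed per-thread stack size). */
    static long idleStackBytes(int threads, long stackBytesPerThread) {
        return threads * stackBytesPerThread;
    }

    public static void main(String[] args) throws Exception {
        ScheduledThreadPoolExecutor pool = newShrinkablePool(10, 1);
        // Run one short task so that at least one worker thread gets created.
        pool.schedule(() -> {}, 10, TimeUnit.MILLISECONDS).get();
        System.out.println("threads right after a task: " + pool.getPoolSize());
        Thread.sleep(2000); // idle for longer than the keep-alive time
        System.out.println("threads after idling: " + pool.getPoolSize());
        System.out.println("10 threads x 1 MB stack = " + idleStackBytes(10, 1L << 20) + " bytes");
        pool.shutdown();
    }
}
```

The trade-off mentioned above still applies: once the pool has shrunk, the next burst of tasks pays the cost of creating threads again.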

@jglick
Member Author

jglick commented Oct 24, 2018

We have already hit issues in the past with overloading the Timer thread pool from Pipeline

Because it was being used for purposes which now it is not.

It's not clear that the original issue is confirmed to come from this cause in the first place.

It is pretty clear to me. The test failed until I introduced this fix, then it passed.

@jglick
Member Author

jglick commented Oct 24, 2018

it is unfortunately not so configured (perhaps a mistake on our part)

Possibly. I seem to recall issues enabling that. Could be revisited.

assuming 1 MB stack size and 10 threads, 10 MB even if only 1-2 tasks at a time are usually running

Exactly why it is undesirable to introduce additional thread pools for plugin use when they are not strictly necessary: reusing an existing pool minimizes overhead.

@jglick
Member Author

jglick commented Oct 24, 2018

this could be trivially (and more correctly) handled via a Terminator or explicit shutdown hook

I am willing to do that if it helps get jenkinsci/ant-plugin#32 out the door, though I still think this PR is preferable, at least once #84 is reverted.
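For reference, the explicit-shutdown alternative could look roughly like the sketch below. It is self-contained plain Java under stated assumptions: `PoolLifecycleSketch` and `shutDownThreadPool` are hypothetical names, the pool size is arbitrary, and in a real Jenkins plugin the cleanup method would presumably be wired to a Terminator-style callback rather than called by hand.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolLifecycleSketch {
    private static ScheduledThreadPoolExecutor threadPool;

    /** Lazily creates the dedicated pool (size 2 is arbitrary for this sketch). */
    static synchronized ScheduledThreadPoolExecutor threadPool() {
        if (threadPool == null) {
            threadPool = new ScheduledThreadPoolExecutor(2);
        }
        return threadPool;
    }

    /**
     * Hypothetical cleanup hook. In Jenkins this is where a Terminator-style
     * callback would go; here it is a plain static method.
     */
    static synchronized void shutDownThreadPool() {
        if (threadPool != null) {
            threadPool.shutdownNow(); // drop queued tasks and interrupt running ones
            threadPool = null;        // a later threadPool() call builds a fresh pool
        }
    }

    public static void main(String[] args) {
        ScheduledThreadPoolExecutor first = threadPool();
        // A far-future task that would otherwise linger across a simulated restart.
        first.schedule(() -> System.out.println("check"), 1, TimeUnit.HOURS);
        shutDownThreadPool();
        System.out.println("old pool shut down: " + first.isShutdown());
        System.out.println("fresh pool after restart: " + (threadPool() != first));
        shutDownThreadPool();
    }
}
```

Clearing the static field matters in the test scenario described above: without it, a restarted instance in the same JVM would keep using a pool that still holds tasks from before the restart.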

@jglick jglick added the on-hold label Oct 24, 2018
@jglick
Member Author

jglick commented Oct 24, 2018

On hold because of #84.

@svanoort
Member

It is pretty clear to me. The test failed until I introduced this fix, then it passed.

Ah, okay, well that would have been good to know in the PR description.

Exactly why it is undesirable to introduce additional thread pools for plugin use when they are not strictly necessary: reusing an existing pool minimizes overhead.

I don't disagree with generally consolidating pools; I just want to make sure we don't have thread-pool exhaustion risks. Revising that pool implementation would sort that out just fine.

@jglick
Member Author

jglick commented Oct 26, 2018

that would have been good to know in the PR description

Sorry, thought that was clear from the PR description mentioning the upstream PR, and that PR depending on this one. Should have been more explicit.

want to make [sure] we don't have thread-pool exhaustion risks

Of course. The history: originally we did use Timer for this purpose; we did encounter thread-pool exhaustion. But that was because we were flooding the pool with lots of requests to check (each of which involves a few remote calls)—for every running sh step, there would be a task scheduled at least every 15s, and when the process was actively emitting output, this could be much more frequent. So they were all piling up on top of each other as soon as you ran any significant load. Besides introducing per-check timeouts, one fix was to introduce a separate thread pool, which in a loaded system could still become saturated, but at least unrelated parts of Jenkins needing Timer were unaffected.

When watch mode is active, this all changes. We do still run (bounded) tasks for every running sh step, but only once every 5m, as a way to make sure the build eventually aborts in case the watcher goes south. So even with lots of activity, we are unlikely to be hogging the system thread pool. Under this scenario, launching an extra couple dozen threads that are almost always idle is wasteful.
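The reasoning above can be sketched as follows. This is an illustration under assumptions, not the merged change: `PoolSelectionSketch`, `SHARED`, and `checksPerHour` are hypothetical names, the shared executor merely stands in for a system-wide pool such as Jenkins' Timer, and the dedicated pool size of 25 is chosen only to match "a couple dozen threads". The helper compares the check load at one task per step every 15 seconds (a lower bound for the old mode) with one every 5 minutes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class PoolSelectionSketch {
    /** Stand-in for the feature flag named in the discussion. */
    static boolean USE_WATCHING = false;

    /** Daemon threads so that this sketch never keeps the JVM alive. */
    private static ThreadFactory daemonFactory() {
        return r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        };
    }

    /** Stand-in for a shared system pool (assumption; not the real Jenkins Timer). */
    private static final ScheduledExecutorService SHARED =
            Executors.newScheduledThreadPool(2, daemonFactory());

    private static ScheduledThreadPoolExecutor dedicated;

    /** Infrequent watchdog checks go to the shared pool; frequent polling gets its own. */
    static synchronized ScheduledExecutorService threadPool() {
        if (USE_WATCHING) {
            return SHARED;
        }
        if (dedicated == null) {
            dedicated = new ScheduledThreadPoolExecutor(25, daemonFactory());
            dedicated.setKeepAliveTime(1, TimeUnit.MINUTES);
            dedicated.allowCoreThreadTimeOut(true);
        }
        return dedicated;
    }

    /** Scheduled checks per hour for a number of running steps at a fixed period. */
    static long checksPerHour(int runningSteps, long periodSeconds) {
        return runningSteps * (3600 / periodSeconds);
    }

    public static void main(String[] args) {
        System.out.println("100 steps, every 15s: " + checksPerHour(100, 15) + " checks/hour");
        System.out.println("100 steps, every 5m:  " + checksPerHour(100, 300) + " checks/hour");
        USE_WATCHING = true;
        System.out.println("watch mode uses shared pool: " + (threadPool() == SHARED));
    }
}
```

Returning the `ScheduledExecutorService` interface rather than the concrete executor class is what lets both pools sit behind the same accessor.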

Now active only in USE_WATCHING mode.
@jglick jglick dismissed stale reviews from svanoort and dwnusbaum April 12, 2019 16:22

stale

@jglick jglick changed the title from "Reverting custom thread pool from #53" to "Reverting custom thread pool from #53 in watch mode" Apr 12, 2019
@jglick jglick removed the on-hold label Apr 12, 2019
@jglick jglick requested a review from dwnusbaum April 12, 2019 16:23
@jglick jglick requested a review from car-roll November 22, 2021 20:55
@jglick jglick added the bug label Nov 22, 2021
@@ -210,7 +211,10 @@ public FormValidation doCheckLabel(@QueryParameter String label) {
 public static long REMOTE_TIMEOUT = Integer.parseInt(System.getProperty(DurableTaskStep.class.getName() + ".REMOTE_TIMEOUT", "20"));

 private static ScheduledThreadPoolExecutor threadPool;
-private static synchronized ScheduledThreadPoolExecutor threadPool() {
+private static synchronized ScheduledExecutorService threadPool() {
+    if (USE_WATCHING) {
Member Author

Note that this feature flag remains off by default.

@car-roll car-roll merged commit f832bc1 into jenkinsci:master Nov 23, 2021
@jglick jglick deleted the standard-thread-pool branch November 23, 2021 14:01