Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #3569: removing all api references to customResource #4148

Merged
merged 8 commits into from Jun 2, 2022

Conversation

shawkins
Copy link
Contributor

@shawkins shawkins commented May 15, 2022

Description

Addresses #3569 by removing all api references to CustomResource

It also further minimizing SharedInformerFactoryImpl concerns and removes the use of the executor - before I add the migration guide for this, we should have consensus on the approach:

Options include:

  1. Leave it up to the user to user to handle their own threading concerns wrt Watchers and ResourceEventHandlers - the main pain point with this change is for users of the SharedInformerFactory who were relying on the async nature of the associated executor, but that is not consistent with the other apis (creating a watch or Informable informer). There has also been at least one issue where it seems that there is confusion over the role of the factory executor.
  2. Provide wrappers or convenience methods to create async versions of the Watchers and ResourceEventHandlers - e.g. Watcher.async(Watcher watcher, Executor executor). This means we don't have to introduce additional dsl methods and it's very flexible as to what executor is used. We do have the common executor via Utils.getCommonExecutor - but we really shouldn't be promoting static usage. Another goal of the static refactoring to scope the common executor per client, if that change were worked, you could have have client methods available as well - client.configMaps().watch(client.async(watcher)).
  3. Make it annotation driven - if the watcher or handler is annotated with Async (spring has a similar annotation) then use the common pool (which as mentioned above will eventually be configurable per client), otherwise call it directly. This tries to encapsulate things more than 2.

Any other thoughts?

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I Added CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@shawkins shawkins force-pushed the customresource branch 2 times, most recently from 40cc132 to a7ab523 Compare May 15, 2022 18:55
doc/FAQ.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@metacosm metacosm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from minor wording issues.

@metacosm
Copy link
Collaborator

Regarding dealing with async, I haven't thought too deep about it but right now, I'd favor approach 2 which seems to provide the most flexibility.

@shawkins
Copy link
Contributor Author

I apologize that there are really two things being merged together here - trying to clarify the Executor usage turned out to be more involved than I would have hoped.

@metacosm @manusa the next set of changes remove the usage of the static Utils.getCommonExecutor and allow for an executor or executorservice to be scoped to the client via the KubernetesClientBuilder. This can be a different executor than the one used for IO operations because this is for potentially blocking calls (although the default logic for creating the IO thread pools for OkHttp or the JDK client default to a cached / unbounded thread pool, which could be used for both).

This client scoped executor will be used rather than allowing for an ExecutorService per SharedInformerFactory - which we didn't document well nor provide any examples of. It is also no longer needed to start the informers, since we have a non-blocking way of doing that - it's real purpose is to decouple the ResourceEventHandler method calls from the IO threads. For consistency we'll use the same executor under informers created by the Informable dsl. When using the OkHttp client other than a few corner cases it was ok to use the IO thread to process events (that's why the DSL was using Runnable::run) because the IO threads are managed per websocket. However with the JDK client that can be a problem - there is just a small dedicated IO thread pool, which you can quickly exhaust and stall / deadlock if you block. The informers used under the covers for Waitable or informOnCondition I left as just using the IO thread as we already had it documented and/or it's unlikely that the user would use blocking logic there - we can of course change that to use the task executor if needed.

In general I wanted to consolidate / hide / document the threading concerns as much as possible - anything that we do api-wise like Watcher.async(Watcher watcher, Executor executor) is just a workaround. The real solution is for the call back methods to return a CompletionStage or CompletableFuture - CompletionStage eventReceived(Action action, T resource) - or pass in a parameter that allows for triggering when to receive more events.

I'll refine the above into the migration doc if this all seems reasonable.

@shawkins shawkins force-pushed the customresource branch 3 times, most recently from 202253a to dabc2b9 Compare May 17, 2022 19:54
@manusa
Copy link
Member

manusa commented May 18, 2022

I think it might be better if we have a sync review on this one, especially to think about the executor removal options.

doc/FAQ.md Outdated Show resolved Hide resolved
doc/FAQ.md Show resolved Hide resolved

@Override
public Executor get() {
return Executors.newCachedThreadPool(Utils.daemonThreadFactory(this));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about this one.
When is the DEFAULT_EXECUTOR_SUPPLIER.get() method going to be invoked during the lifecycle of the application?
Should we just keep a single instance? Is there a use-case where we want to maintain more than one Executor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is called once per client. With the way things are currently every call to build() creates a fresh client. If the builder is only allowed to create a single client, then it could just be a single instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we switch this to create lazily loaded singleton instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KubernetesClientBuilder is new, so whatever we do there won't affect existing users. However that is functionally different than many of the other Builders, including the HttpClient.Builder. Because you can still call the with methods after build it would probably be clearest if subsequent calls to build through an illegalstateexception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would probably be clearest if subsequent calls to build through an illegalstateexception.

Yup, I think that's even better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's take that up as a follow-on task.

Co-authored-by: Marc Nuri <marc@marcnuri.com>
@shawkins
Copy link
Contributor Author

@manusa assuming #4159 is going in first, I'll wait to rebase this until after that.

@manusa
Copy link
Member

manusa commented May 19, 2022

@manusa assuming #4159 is going in first, I'll wait to rebase this until after that.

OK, I'll merge #4159 because it's ready to merge and CI takes forever (If that wasn't the case, this one should go in first)

customresource

# Conflicts:
#	kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/OperationSupport.java
#	kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/PortForwarderWebsocket.java
#	kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/apps/v1/RollingUpdater.java
customresource

# Conflicts:
#	kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/AbstractWatchManager.java
#	kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatcherWebSocketListener.java
#	kubernetes-client/src/test/java/io/fabric8/kubernetes/client/dsl/internal/AbstractWatchManagerTest.java
@manusa
Copy link
Member

manusa commented Jun 1, 2022

The ExecWebSocketListenerTest never ends in the Java8 tests. Please take a look.

@shawkins
Copy link
Contributor Author

shawkins commented Jun 1, 2022

@manusa this seems to be happening on the BaseOperationTest. The only thing that really changed there is the usage of the ForkJoinPool.commonPool rather than the Utils common pool. In several places I have the tests using the ForkJoinPool.commonPool rather than inline execution to preserve the multi-threaded nature. Using the common pool helps the overall test execution by not creating / destroying so many thread pools. It could be possible that something is holding the common threadpool thread and the default on github for parallelism is small enough that we have a problem. We wouldn't have seen this before unless as the Utils common thread pool was unlimited in size.

@manusa
Copy link
Member

manusa commented Jun 1, 2022

I think this pool might be backed by the number of available cores, and I guess that in a CI env this must be probably just one. We should probably replace this by a pool with a fixed number of threads.

@shawkins
Copy link
Contributor Author

shawkins commented Jun 1, 2022

I think this pool might be backed by the number of available cores, and I guess that in a CI env this must be probably just one.

It will default to Max(1, cores - 1).

We should probably replace this by a pool with a fixed number of threads.

There are several things. The first is to figure out what task is hanging around - it's not expected that something should indefinitely block.

The next step is that we can:

  • explicitly set java.util.concurrent.ForkJoinPool.common.parallelism (possibly to 1 in the maven properties) - I can reproduce the hang if I run locally with java.util.concurrent.ForkJoinPool.common.parallelism=1
  • or stop using the ForkJoinPool common pool because it's not really intended for arbitrary tasks. Fully managing a threadpool per test is possible and would help localize lingering tasks.

@manusa
Copy link
Member

manusa commented Jun 1, 2022

OK, I get your point now.

My assumption is that the commonPool might be used elsewhere (OkHttp) and we might be running into some lock. ---> If we replace the ForkJoinPool with a singlethreaded pool, does it still hang?

@shawkins
Copy link
Contributor Author

shawkins commented Jun 1, 2022

My assumption is that the commonPool might be used elsewhere (OkHttp) and we might be running into some lock.

Turns out this is only occasionally reproducible on java 8 when the parallelism is 1. No task is hanging. The common pool simply won't execute the task.

If we replace the ForkJoinPool with a singlethreaded pool, does it still hang?

It appears fine. I've added another commit that removes the usage of the fork join common pool.

@sonarcloud
Copy link

sonarcloud bot commented Jun 1, 2022

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

53.5% 53.5% Coverage
0.0% 0.0% Duplication

@manusa manusa merged commit c4a3291 into fabric8io:master Jun 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants