Some new findings on the "deadlock" in previous issue #768
Comments
Unfortunately this is a known issue that cannot be mitigated directly. If eviction is blocked on a computation when removing an entry, because the hash bin is locked by another computation, then the eviction must wait. If that is a very long-running operation, then writes will accumulate, create backpressure, and further cache writes will be blocked to avoid runaway growth. A log warning was added in v3.0.6 but has not been released in a backport yet. Both Caffeine and ConcurrentHashMap hint at this problem by saying:

> Some attempted update operations on this cache by other threads may be blocked while the computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this cache.

The cache cannot detect the offending computation itself; when v3 detects the resulting stall (excessive wait times for acquiring the eviction lock), it logs a warning suggesting AsyncCache to decouple the computation from the map operation.
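To illustrate that suggestion with a sketch (the class name, key, and timings here are mine, not from this thread): with `AsyncCache`, the mapping is established as an in-flight future almost immediately and the computation runs on an executor, so a slow load no longer holds the hash bin.

```java
import com.github.benmanes.caffeine.cache.AsyncCache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

public class AsyncDecouplingSketch {
  public static void main(String[] args) {
    AsyncCache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofMinutes(5))
        .buildAsync();

    // The mapping function runs on the cache's executor (ForkJoinPool.commonPool()
    // by default) rather than inside the hash bin's lock, so eviction and other
    // writers are not blocked on this bin while the load is in progress.
    CompletableFuture<String> value = cache.get("key", k -> slowLoad(k));
    value.thenAccept(v -> System.out.println("loaded: " + v));
    value.join(); // block here only so the demo does not exit early
  }

  // Hypothetical stand-in for a long-running computation
  private static String slowLoad(String key) {
    try {
      Thread.sleep(2_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return "value-for-" + key;
  }
}
```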
Noted, with many thanks. We'll consider that.
I have something different to share: I think this is not a deadlock, but a problem related to the eviction mechanism. We also observed almost all threads waiting, as seen in the thread dump. We are using Caffeine version 2.9.3 with JDK 17, as you can see in the thread stack. Our cache configuration is:
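Since the configuration snippet is not shown above, here is only a minimal sketch consistent with this report: `expireAfterWrite` is stated below, while the size bound and duration are illustrative assumptions.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;

// Sketch only: expireAfterWrite matches the report; maximumSize and the
// duration are illustrative assumptions.
Cache<String, Object> cache = Caffeine.newBuilder()
    .maximumSize(100_000)
    .expireAfterWrite(Duration.ofMinutes(10))
    .build();
```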
We enabled JFR and observed something very interesting:

(1) Almost all of my threads are waiting in `afterWrite()`.

(2) The eviction task is executed by default on the common ForkJoinPool. Its thread stack shows that the eviction lock is held by this thread, yet the thread itself is blocked on a `ConcurrentHashMap$Node`. Taking a closer look through JFR: this thread had been blocked for 1.371 s, and that lock was previously held by XNIO2-task-7, which was loading a value into this cache.
Therefore, the problem is caused by the following (see the sketch after this list):

1. Many threads call `cache.get(key, k -> {....})`, and the cache has an `expireAfterWrite` configuration.
2. Some of them (we name them group 2) are computing the key.
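To make the interleaving concrete, here is a minimal repro sketch (not from the original report; the key, durations, and loop bound are illustrative assumptions): a group 2 thread computes a value under a hash bin's lock, the eviction task on the common ForkJoinPool takes the eviction lock and then blocks trying to remove the expired victim from that bin, and the remaining writers (group 1) pile up in `afterWrite()` waiting for the eviction lock.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;

public class EvictionStallSketch {
  public static void main(String[] args) throws InterruptedException {
    Cache<Integer, Integer> cache = Caffeine.newBuilder()
        .expireAfterWrite(Duration.ofMillis(1)) // entries expire almost immediately
        .build();

    cache.put(0, 0); // this entry expires and becomes an eviction victim

    // Group 2: recompute key 0; the mapping function runs while holding the
    // hash bin's lock, so the bin stays locked for the full five seconds.
    Thread group2 = new Thread(() -> cache.get(0, k -> {
      sleep(5_000);
      return k;
    }));
    group2.start();
    Thread.sleep(100); // give the computation time to acquire the bin lock

    // Maintenance runs on ForkJoinPool.commonPool(), acquires the eviction lock,
    // and then blocks trying to remove expired key 0 from the locked bin. As the
    // write buffer fills, these puts (group 1) stall in afterWrite() waiting on
    // the eviction lock, matching the thread dumps above.
    for (int i = 1; i <= 100_000; i++) {
      cache.put(i, i);
    }
    group2.join();
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```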