CacheLongKeyLIRS concurrency improvements #3069
base: master
Conversation
Thank you for your contribution!
First of all, please provide a standalone test case that illustrates the described scalability problem. CacheLongKeyLIRS isn't perfect, but even if you see a problem with it in the profiler, that doesn't mean everything else is OK; maybe there are other problems that make the situation with the cache this bad.
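A standalone test for the reported contention could look something like the following sketch. It is only an illustration of the scenario described later in this thread (many threads hitting a tiny hot key set behind a synchronized segment); the class and method names here are made up, and a real reproduction would exercise `org.h2.mvstore.cache.CacheLongKeyLIRS` itself rather than this stand-in:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

class CacheContentionSketch {
    // Stand-in for a cache segment whose get() is synchronized; a real
    // test would hammer CacheLongKeyLIRS directly instead of this class.
    static final class SyncSegment {
        private final Map<Long, Object> map = new HashMap<>();
        synchronized Object get(long key) { return map.get(key); } // contended hot path
        synchronized void put(long key, Object v) { map.put(key, v); }
    }

    // Runs `threads` readers over a tiny hot key set (models a freshly
    // emptied database) and returns the total number of get() calls.
    static long run(int threads, int hotKeys, long durationMs) throws InterruptedException {
        SyncSegment segment = new SyncSegment();
        for (long k = 0; k < hotKeys; k++) {
            segment.put(k, "page-" + k);
        }
        LongAdder ops = new LongAdder();
        long deadline = System.nanoTime() + durationMs * 1_000_000L;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long i = 0;
                while (System.nanoTime() < deadline) {
                    segment.get(i++ % hotKeys);
                    ops.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return ops.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        // With many threads (the report mentions 60), per-thread throughput
        // collapses because every reader serializes on the same monitor.
        System.out.println("1 thread : " + run(1, 4, 200) + " gets");
        System.out.println("8 threads: " + run(8, 4, 200) + " gets");
    }
}
```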
/**
 * Database setting <code>CACHE_CONCURRENCY</code>
 * (default: 16).<br />
 * Set the read cache concurrency.
 */
public final int cacheConcurrency = get("CACHE_CONCURRENCY", 16);
/**
 * Database setting <code>AUTO_COMMIT_BUFFER_SIZE_KB</code>
 * (default: depends on max heap).<br />
 * Set the size of the write buffer, in KB disk space (for file-based
 * stores). Unless auto-commit is disabled, changes are automatically
 * saved if there are more than this amount of changes.
 *
 * When the value is set to 0 or lower, data is not automatically
 * stored.
 */
public final int autoCommitBufferSize = get("AUTO_COMMIT_BUFFER_SIZE_KB",
        Math.max(1, Math.min(19, Utils.scaleForAvailableMemory(64))) * 1024);
Please explain why you need these settings.
We use H2 as a cache for binary chunks in a high-concurrency environment (currently 60 threads). After we increased cacheConcurrency to 1024, H2 performs really well most of the time (on par with or even better than PostgreSQL). The only problem arises after we empty the database content: we observe a significant degradation in throughput and CPU usage.
Here's what we run into:
- CPU contention in Segment.get (the optimization in this PR). It looks like when the primary index is small and all 60 threads want to use it, they compete for exactly the same segments. The only way to fix this is to improve LIRS concurrency.
- After fix #1, the write rate increased, and the next bottleneck is the 'back pressure' feature:
// if unsaved memory creation rate is to high,
// some back pressure need to be applied
// to slow things down and avoid OOME
It looks like 2 MB is too small for 150 MB per second. This is why we need to introduce AUTO_COMMIT_BUFFER_SIZE_KB.
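For illustration, both settings would be passed the way H2 database settings usually are, appended to the JDBC URL. The path and values below are hypothetical (1024 matches the concurrency figure quoted above, and AUTO_COMMIT_BUFFER_SIZE_KB is the new setting proposed in this PR):

```java
class H2UrlExample {
    // Hypothetical example URL; database settings follow the file path.
    static String url() {
        return "jdbc:h2:./data/test"
                + ";CACHE_CONCURRENCY=1024"
                + ";AUTO_COMMIT_BUFFER_SIZE_KB=32768";
    }

    public static void main(String[] args) {
        System.out.println(url());
    }
}
```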
/*
 * Used as null value for ConcurrentSkipListSet
 */
private final Entry<V> ENTRY_NULL = new Entry<>();
I think it would be better to convert misses to AtomicLong and avoid a lock for a null value completely. In that case this dummy value and the related tricks will not be required.
Makes sense. Will do.
and avoid a lock for a null value completely
Actually, there is no separate lock for this null value; most of the time the processing is done inside a lock that is already held. But I agree that it looks a bit 'artificial'.
public int compareTo(Entry<V> tgt) {
    return key == tgt.key ? 0 : key < tgt.key ? -1 : 1;
Add @Override, and use Long.compare() instead of the manual three-way comparison.
Agreed.
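Applying both suggestions, the snippet would become something like the following (only the key field is shown; the rest of the Entry class is elided):

```java
// Corrected form of the comparison above: @Override added and the
// manual three-way comparison replaced by Long.compare().
class Entry<V> implements Comparable<Entry<V>> {
    final long key;

    Entry(long key) {
        this.key = key;
    }

    @Override
    public int compareTo(Entry<V> tgt) {
        return Long.compare(key, tgt.key);
    }
}
```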
I think we still need a standalone test case for your problem for further investigation.
The whole implementation of this class, with its segments, looks like something from the Java 1.4–5 era, when atomic operations were only taking their first steps in Java.
Most likely this class can be reimplemented without segments and exclusive locks with modern APIs.
if (!l.tryLock()) {
    if (value == null) {
        misses.incrementAndGet();
    } else {
        concAccess.add(e);
    }
    return value;
}
When value == null there is no need to call tryLock().
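A get() reordered along those lines could look like this sketch (names and structure are hypothetical; the real segment maintains LIRS hot/cold queues, which are elided here): on a miss there is no LIRS state to adjust, so the miss counter is bumped and null returned without ever touching the lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: handle the miss path before any tryLock() attempt.
class LirsSegmentSketch<V> {
    private final ConcurrentHashMap<Long, V> map = new ConcurrentHashMap<>();
    private final ReentrantLock l = new ReentrantLock();
    final AtomicLong misses = new AtomicLong();

    V get(long key) {
        V value = map.get(key);
        if (value == null) {           // miss: no tryLock() needed at all
            misses.incrementAndGet();
            return null;
        }
        if (l.tryLock()) {             // hit: update LIRS state if uncontended
            try {
                touch(key);
            } finally {
                l.unlock();
            }
        }
        return value;                  // lock busy: skip bookkeeping, don't block
    }

    private void touch(long key) {
        // hot/cold queue maintenance elided in this sketch
    }

    void put(long key, V value) {
        map.put(key, value);
    }
}
```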
<T> T withLock(Supplier<T> c) {
    l.lock();
    try {
        return c.get();
    }
    finally {
        l.unlock();
    }
}
From my point of view this method doesn't make the code more readable, and the introduction of lambda functions can make it slower. Usually that isn't critical, but it looks strange when you're trying to improve performance.
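The plain form this comment argues for would look like the sketch below (a made-up example, not code from this PR): an explicit lock()/try/finally at each call site instead of routing through withLock(Supplier), which avoids allocating a capturing lambda on the hot path.

```java
import java.util.concurrent.locks.ReentrantLock;

// Direct locking instead of a withLock(Supplier) helper.
class DirectLocking {
    private final ReentrantLock l = new ReentrantLock();
    private long counter;

    long incrementAndGet() {
        l.lock();
        try {
            return ++counter;   // critical section written inline
        } finally {
            l.unlock();
        }
    }
}
```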
Forgive my ignorance, but I fail to understand how ReentrantLock.lock()/unlock() will improve concurrency compared to plain old synchronized. Also, what about the unused field Segment.concAccess of type ConcurrentSkipListSet<Entry>? Why is eager creation of a WeakReference better than a lazy one? Even if every entry gets one at some point in its life cycle (just a speculation), the total number of those WeakReference objects alive at any given time will surely increase. It seems like the most significant optimization (the eliminated synchronized) is on the code path of a missed get(), which will be followed by a much more expensive page read anyway.
Guys, I've put this pull request into draft mode until we are done with its tests.
We've run into a significant concurrency limitation under a large number of threads (60) combined with an empty database.
It looks like this happens because all threads read the same index, which is small, and most of the time threads were blocked in the CacheLongKeyLIRS.Segment.get() method (since it is synchronized).
This degradation persists while the index is small and slowly improves as the database grows (several hours in our case).