
CacheLongKeyLIRS concurrency improvements #3069

Draft
wants to merge 22 commits into master

Conversation

@amolokoedov commented Mar 14, 2021

We've run into a significant concurrency limitation under a large number of threads (60) combined with an empty database.

It looks like this happens because all threads read the same index, which is small, and most of the time the threads were blocked in the CacheLongKeyLIRS.Segment.get() method (since it is synchronized).

This degradation persists while the index is small and slowly improves as the database grows (several hours in our case).
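
A minimal sketch of the kind of standalone stress test that exposes this, assuming the public CacheLongKeyLIRS API (the key count, payload size, and iteration count are illustrative, not our actual workload):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;
    import org.h2.mvstore.cache.CacheLongKeyLIRS;

    public class SmallIndexContention {
        public static void main(String[] args) throws Exception {
            CacheLongKeyLIRS.Config cfg = new CacheLongKeyLIRS.Config();
            cfg.maxMemory = 16 * 1024 * 1024;       // small cache, like a fresh db
            CacheLongKeyLIRS<byte[]> cache = new CacheLongKeyLIRS<>(cfg);
            for (long k = 0; k < 64; k++) {
                cache.put(k, new byte[1024], 1024); // tiny "index": only a few hot keys
            }
            ExecutorService pool = Executors.newFixedThreadPool(60);
            long start = System.nanoTime();
            for (int t = 0; t < 60; t++) {
                pool.submit(() -> {
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    for (int i = 0; i < 1_000_000; i++) {
                        cache.get(rnd.nextLong(64)); // all 60 threads hit the same few segments
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            System.out.printf("elapsed: %.1f s%n", (System.nanoTime() - start) / 1e9);
        }
    }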

@amolokoedov marked this pull request as ready for review March 14, 2021 18:08
@katzyn (Contributor) left a comment

Thank you for your contribution!

First of all, please provide a standalone test case that illustrates the described scalability problem. CacheLongKeyLIRS isn't perfect, but even if you see a problem with it in the profiler, that doesn't mean everything else is OK; there may be other problems that make the situation with the cache look so bad.

Comment on lines +310 to +328
/**
* Database setting <code>CACHE_CONCURRENCY</code>
* (default: 16).<br />
* Set the read cache concurrency.
*/
public final int cacheConcurrency = get("CACHE_CONCURRENCY", 16);

/**
* Database setting <code>AUTO_COMMIT_BUFFER_SIZE_KB</code>
* (default: depends on max heap).<br />
* Set the size of the write buffer, in KB disk space (for file-based
* stores). Unless auto-commit is disabled, changes are automatically
* saved if there are more than this amount of changes.
*
* When the value is set to 0 or lower, data is not automatically
* stored.
*/
public final int autoCommitBufferSize = get("AUTO_COMMIT_BUFFER_SIZE_KB",
Math.max(1, Math.min(19, Utils.scaleForAvailableMemory(64))) * 1024);
Contributor

Please explain why you need these settings.

Author

We use H2 as a cache for binary chunks in a high-concurrency environment (currently 60 threads). After we increased cacheConcurrency to 1024, H2 looks really good most of the time (the same as or even better than PostgreSQL). The only problem arises after we empty the db content: we observed a significant degradation in throughput and CPU usage.
Here's what we ran into:

  1. CPU contention in Segment.get (the optimization in this PR). It looks like when the primary index is small and all 60 threads want to use it, they compete for exactly the same segments. The only way to fix it is to improve LIRS concurrency.
  2. After fix 1 the write rate increased, and the next limit is the 'back pressure' feature:
    // if unsaved memory creation rate is to high,
    // some back pressure need to be applied
    // to slow things down and avoid OOME
    It looks like 2 MB is too small for 150 MB per second. This is why we need to introduce AUTO_COMMIT_BUFFER_SIZE_KB (see the usage sketch below).
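
A hypothetical usage sketch, assuming both settings from this PR are exposed as ordinary H2 database settings on the JDBC URL (the URL and the values are illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class CacheSettingsExample {
        public static void main(String[] args) throws SQLException {
            // CACHE_CONCURRENCY and AUTO_COMMIT_BUFFER_SIZE_KB are introduced
            // by this PR; they are not available in released H2 versions.
            String url = "jdbc:h2:./chunk-cache"
                    + ";CACHE_CONCURRENCY=1024"
                    + ";AUTO_COMMIT_BUFFER_SIZE_KB=16384";
            try (Connection conn = DriverManager.getConnection(url)) {
                // work with the connection ...
            }
        }
    }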

Comment on lines 617 to 620
/*
* Used as null value for ConcurrentSkipListSet
*/
private final Entry<V> ENTRY_NULL = new Entry<>();
Contributor

I think it would be better to convert misses to an AtomicLong and avoid a lock for a null value completely. In that case this dummy value and the related tricks will not be required.
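
A minimal sketch of that suggestion (the class and method names are illustrative, not the PR's actual code):

    import java.util.concurrent.atomic.AtomicLong;

    // Counting misses with an AtomicLong keeps the miss path lock-free,
    // so no ENTRY_NULL dummy value is needed in the skip-list set.
    class MissCounter {
        private final AtomicLong misses = new AtomicLong();

        <V> V recordIfMiss(V value) {
            if (value == null) {
                misses.incrementAndGet(); // atomic update, no segment lock taken
            }
            return value;
        }

        long missCount() {
            return misses.get();
        }
    }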

Author

Makes sense. Will do.

Author

and avoid a lock for a null value completely

Actually, there is no separate lock for this null value; most of the time the processing is done inside a lock that is already held. But I agree that it looks a bit 'artificial'.

Comment on lines 1265 to 1266
public int compareTo(Entry<V> tgt) {
return key == tgt.key ? 0 : key < tgt.key ? -1 : 1;
Contributor

@Override

Long.compare()
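
Applied to the excerpt above, the two suggestions amount to something like:

    @Override
    public int compareTo(Entry<V> tgt) {
        // Long.compare() gives the same three-way result without the nested ternaries
        return Long.compare(key, tgt.key);
    }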

Author

Agreed.

@katzyn (Contributor) left a comment

I think we still need a standalone test case for your problem for further investigation.

The whole implementation of this class with segments looks like something from the Java 1.4–5 era, when atomic operations were only taking their first steps in Java.

Most likely this class can be reimplemented without segments and exclusive locks using modern APIs.
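
One possible direction, purely as an illustration rather than a LIRS implementation: keep the hot read path lock-free and move the eviction bookkeeping off the per-get critical section.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    // Illustrative only: reads take no lock at all; the LIRS stack/queue
    // maintenance would have to be batched or sampled instead of being
    // done under a segment-wide lock on every get().
    class LockFreeReadCache<V> {
        private final ConcurrentHashMap<Long, V> map = new ConcurrentHashMap<>();
        private final LongAdder hits = new LongAdder();
        private final LongAdder misses = new LongAdder();

        V get(long key) {
            V v = map.get(key); // lock-free read
            (v != null ? hits : misses).increment();
            return v;
        }

        void put(long key, V value) {
            map.put(key, value);
            // eviction policy bookkeeping would go here
        }
    }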

Comment on lines 732 to 739
if (!l.tryLock()) {
if (value == null) {
misses.incrementAndGet();
} else {
concAccess.add(e);
}
return value;
}
Contributor

When value == null there is no need to call tryLock().
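
A sketch of the suggested reordering, reusing the names from the excerpt (value, misses, concAccess, e, and the lock l):

    // handle the miss first: a null value never interacts with the lock
    if (value == null) {
        misses.incrementAndGet();
        return null;
    }
    if (!l.tryLock()) {
        concAccess.add(e);
        return value;
    }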

Comment on lines 1142 to 1150
<T> T withLock(Supplier<T> c) {
l.lock();
try {
return c.get();
}
finally {
l.unlock();
}
}
Contributor

From my point of view this method doesn't make the code more readable, and the introduction of lambda functions can make it slower. Usually that isn't critical, but it looks strange when you're trying to improve performance.
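
For comparison, the plain form this comment argues for:

    l.lock();
    try {
        // ... critical section, no lambda allocation or extra indirection ...
    } finally {
        l.unlock();
    }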

@andreitokar (Contributor)

Forgive my ignorance, but I fail to understand how ReentrantLock.lock()/unlock() will improve concurrency compared to plain old synchronized.

Also, what about the unused field Segment.concAccess of type ConcurrentSkipListSet<Entry>?

Why is eager creation of a WeakReference better than a lazy one? Even if every entry gets one at some point in its life cycle (just a speculation), the total number of those WeakReference objects alive at any given time will surely increase.

It seems like the most significant optimization (the eliminated synchronized) is on the code path of a missed get(), which will be followed by a much more expensive page read anyway.

Side notes:

  • unused imports and commented-out chunks of code do not make it more readable
  • please try to adhere to the overall project's code style, i.e. do not use "final" for local variables unless it's required by the compiler

@amolokoedov marked this pull request as draft March 19, 2021 06:05
@amolokoedov (Author)

Guys, I've put this pull request into draft mode until we are done with its tests.
