Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EZP-29659: As a v2 Cluster Admin & Developer I want support for InMemory SPI cache #2553

Merged
merged 17 commits into from Mar 18, 2019

Conversation

andrerom
Copy link
Contributor

@andrerom andrerom commented Mar 4, 2019

Question Answer
JIRA issue EZP-29659
New feature yes
Target version 2.5 master
BC breaks no
Tests pass yes
Doc needed yes, for global YML params

Introduces time limited (3sec) and object limited (100 items) In-Memory caching to meta data elements; for now Language, ContentType (inc groups) & UserHandler (user meta data, roles, role assignments).

This can be built upon further later (see inline example), but this should already help solve the performance issues we have with Cluster setups.

For comparison here is admin/dashboard on 2.4 ezplatform-ee-demo:

Cache lookups Response time locally Estimated Response time Cluster*
Base ~6k 1.2 sec  12-32 sec
+this PR ~1.2k 7-800ms 2.4-7 sec
+optimized adapter ~600 ~600ms 1.2-3 sec

* Numbers estimates a latency of 2-5ms per call to Redis (in cluster setup Redis is often on another server as per nature of a cluster). If RedisCluster is used then it will need Symfony 3.4.23, otherwise base setup will have 17k lookups to Redis nodes and about 34-85 sec response time.

Done:

  • Added methods to be able to add more stats on SPI cache calls and improved eZ tab
  • Added a simple in-memory cache pool + tests
  • Exposed a new abstract for that takes advantage of both things
  • Refactored Legacy SE cache to reuse cache pool to avoid cache issues and make sure cache efficiency is good, even for logic inside legacy storage engine
  • Refactored ContentLanguage-, ContentType-, and User-hander for the new Abstract to simplify code

Improvements to SPI Persistence/Cache logging, here on admin/dashboard:
Symfony Profiler eZ toolbar on admin/dashboard

Symfony Profiler eZ tab for admin/dashboard

TODO:

  • Impl
  • Implement tests for in-memory pool
  • Fix new code according to Coding Standards ($ composer fix-cs).
  • Ask for Code Review.

Review notes:

  • Tip: Look at changes in tests in order to see when validation logic has been attempted to be improved (a few methods where from what I could see clearing too much)
  • Open questions:
    • Should the design be different? aka use for instance PSR-6, or use PSR-6 / Symfony Cache interface for handlers where we want in-memory cache only, to not change handlers at all? (then loose out of the simplifications to them + the improved logging), or ... ?
    • Should we find a different way (more abstracted) way for integration tests and others, to clear cache? Also be able to disable cache as well for scripts that want that (for the running process).

Followup:

  • Own cache clearing service for clearing cache across inMemory, symfony pool, and http
    Also requested by Community to expose a command to clear this also (with option to specify which once)
  • https://jira.ez.no/browse/EZP-30298
  • LRU, unless limit is increased, adding cache on more handlers will increase chances of cache being aggressively wiped. So stating to count reads and sort on that in vacuum() to make sure to prefer to 1. evict non used or just once used cache objects before 2. starting to evict the oldest items if still needed.

@ezrobot

This comment has been minimized.

@andrerom andrerom force-pushed the in-memory-cache2 branch 2 times, most recently from 6910254 to 0f22413 Compare March 5, 2019 21:13
@ezrobot

This comment has been minimized.

@andrerom andrerom changed the title [Feature] In memory cache for SPI calls EZP-29659: As a v2 Cluster Admin & Developer I want support for InMemory SPI cache Mar 6, 2019
@andrerom andrerom marked this pull request as ready for review March 6, 2019 13:22
@andrerom
Copy link
Contributor Author

andrerom commented Mar 6, 2019

On request by @lserwatka added all of you to review ;)

@gggeek
Copy link
Contributor

gggeek commented Mar 6, 2019

The mandatory OT comment from orbit:

it seems to me that this is based on what our german developer friends used to call an "identity-map" cache, i.e. we are not serializing objects for storing them to memory, but instead just putting a ref to them into an array, which is as fast as anyone can get, which I strongly agree with - at least for value-objects, which hold no refs to other objects/resources.

Without having looked at the code, my main question is: what about expiration? Apart from the TTL thing, which seems more or less to be a measure introduced for cli scripts which are supposed to live longer than 1-2 seconds, is the in-memory cache subject to transparent expiration when the script itself does an update of the relevant entity? Ie. if a web page updates, say, a ContentType, will the php code running within the context of that same page transparently see the new version coming from the in-memory cache after the (single?) expiration call done by the kernel?

Copy link
Contributor

@gggeek gggeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another silly question: I imagine that you have explored usage of ymfony\Component\Cache\Adapter\ArrayAdapter and Cache\Adapter\PHPArray for the base in-memory caching logic?

@andrerom
Copy link
Contributor Author

andrerom commented Mar 6, 2019

Symfony\Component\Cache\Adapter\ArrayAdapter and Cache\Adapter\PHPArray for the base in-memory caching logic?

Does not have limit last time I checked. Also serializes the object (but configurable like global ttl) to avoid issues if being written to which is not relevant here. It also is global unless we go for different adapters (proxy, or w/o) per handler, and if allowed to be global for cache pool means it will be used for cases where it really shouldn't, like having same ttl and limits enforced for meta data as well as for Content for instance. See closed PR that was first attempt at POC for this: #2453

Without having looked at the code, my main question is:

.... ;)

is the in-memory cache subject to transparent expiration when the script itself does an update of the relevant entity? Ie. if a web page updates, say, a ContentType, will the php code running within the context of that same page transparently see the new version coming from the in-memory cache after the (single?) expiration call done by the kernel?

Same process yes. So within single script execution or single request. If your process update it, you are guaranteed to get fresh objects of it straight after (assuming you re-load it after change, as was the case before).

For other processes (in progress) the update will propagate to Symfony cache and Database immediately like before, and the script will get it after the ttl expires, hence why the TTL should never be set very high. It's only a Request /burst cache, where it's down to probability of what can be somewhat safely cached for short bursts.

All script / request processes started after the change will get fresh version of it immediately.

Copy link
Member

@adamwojs adamwojs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Results looks impressive!

@andrerom
Copy link
Contributor Author

@alongosz @adamwojs @ViniTou Fixed and rebased, expected failure is on remaining Twig 2.7.1 break.

@ezrobot

This comment has been minimized.

@ezrobot

This comment has been minimized.

@ezrobot

This comment has been minimized.

public function setMulti(array $objects, callable $objectIndexes, string $listIndex = null): void
{
// If objects accounts for more then 20% of our limit, assume it's bulk content load and skip saving in-memory
if ($this->enabled === false || \count($objects) >= $this->limit / 5) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is worth it but $this->limit / 5 could be calculated only once and stored in property e.g. bulkLoadLimit. As a bonus it will improve code readibility ;-)


$time = microtime(true);
// if set add objects to cache on list index (typically a "all" key)
if ($listIndex) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "0" as $listIndex value, is it possible?

Copy link
Contributor Author

@andrerom andrerom Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not make much sense, as key/indexes needs to be same as used in symfony cache given this is only meant to be used inside abstract in-memory handler.

private function vacuum(): void
{
// Vacuuming cache in bulk, clearing the 33% oldest cache values
$this->cache = \array_slice($this->cache, (int) ($this->limit / 3));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$this->limit / 3 also could be calculated only once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and the / 5 comment:

  1. I haven't seen this being called very often.
  2. I don't expect the match here to take much ops compared to array_slice.

// simplest/fastests hash possible to identify if we have already collected this before to save on memory use
$callHash = \hash('adler32', $method . \serialize($arguments));
if (empty($this->calls[$callHash])) {
$this->calls[$callHash] = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an improvement would be great have VO instead of arrays with fixed keys (also in terms of memory usage)

final class PersistenceLoggerCall 
{
        /** @var string */
	private $method;
       /** @var array */
	private $arguments;
        /** @var \eZ\Publish\Core\Persistence\Cache\PersistenceLoggerStats */
	private $stats;
        /** @var \eZ\Publish\Core\Persistence\Cache\PersistenceLoggerCallTrace[] */
	private $traces;

	// ...
} 

final class PersistenceLoggerStats 
{
    /** @var int */
    private $uncached;
    /** @var int */
    private $miss;
    /** @var int */
    private $hit;
    /** @var int */
    private $memory;

    // ... 	
}  

final class PersistenceLoggerCallTrace 
{
    /** @var string[] */
    private $trace;
    /** @var int */   
    private $count;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a followup if you want to work on it before final ;)

Copy link
Member

@alongosz alongosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming late to the party, +1. Thanks for addressing my callable type-hints remarks :)
Found one typo:

eZ/Publish/Core/settings/storage_engines/cache.yml Outdated Show resolved Hide resolved
Co-Authored-By: andrerom <andre.romcke@gmail.com>
@andrerom
Copy link
Contributor Author

andrerom commented Mar 18, 2019

QA passes will as agreed with Lukasz be done on rc2, thanks for all the reviews and let’s make 2.5 the fastest release yet 😉

@andrerom andrerom merged commit f00653d into master Mar 18, 2019
@andrerom andrerom deleted the in-memory-cache2 branch March 18, 2019 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
7 participants