Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for memory-safe cache #50

Open
pkruithof opened this issue Nov 23, 2018 · 11 comments
Open

Support for memory-safe cache #50

pkruithof opened this issue Nov 23, 2018 · 11 comments

Comments

@pkruithof
Copy link

Q A
Bug? no
New Feature? yes
Version all

The cache PRS's all write a string to the cache. This means that response bodies have to be placed in memory and serialized to do so. This can become a problem when you're caching large responses.

The Http Response classes already work with streams. It would be nice if this plugin just pipes the stream to a cache. When fetching you can then open the stream and use that as response body.

@pkruithof
Copy link
Author

I would be willing to work on something for this by the way, just wanted to see if there is any support for it first.

@dbu
Copy link
Contributor

dbu commented Nov 26, 2018

hi, i think this is a valid concern and would be nice to have indeed. does any of the PSR caches support streams? or would that mean to use a different cache implementation?

@pkruithof
Copy link
Author

The PSR (and common implementations) only explicitly mentions that the value needs to be serialized:

Implementing libraries MUST support all serializable PHP data types, including:

Strings - Character strings of arbitrary size in any PHP-compatible encoding.
Integers - All integers of any size supported by PHP, up to 64-bit signed.
Floats - All signed floating point values.
Boolean - True and False.
Null - The actual null value.
Arrays - Indexed, associative and multidimensional arrays of arbitrary depth.
Object - Any object that supports lossless serialization and deserialization such that $o == unserialize(serialize($o)). Objects MAY leverage PHP’s

So I think support for this would mean not using the cache interface. Unless we store a file path. But this would limit the plugin to only file-based caches.

@dbu
Copy link
Contributor

dbu commented Nov 26, 2018

at that point, would it be cleaner to make a separate stream cache plugin in a separate repository? should that build on top of this psr cache plugin? we still need to store metadata, and still need to restore response headers when using the cache, its only the handling of the actual body that is different.

is there any kind of generic stream caching or does the plugin have to define its own interface for that? maybe we could add that interface here and have the plugin optionally accept a stream cache and store the cache key in the psr cache, otherwise store the string in the psr cache. the file based stream cache would generate filenames and return those...

what i do not like about this idea is that there would be two parts of the cache, the psr and the stream, and if either is lost the other becomes pointless. it makes it harder to clean out old cache entries with processes used by the caching engine itself.

@pkruithof
Copy link
Author

Agreed. We created a workaround for this by implementing a special CacheItemPoolInterface implementation, that expects the values to come from the CachePlugin (meaning an array with a response, body, etc.). It saves 2 files per key: the body, written as is (not serialized), and the rest as a $key.meta file. If either of the two is missing, or out of date, the cache item is considered stale.

It works for us, but couples the PSR-implementation to this plugin. If either changes, it will stop working (which is why we wrote an integration test for it).

We tried to use 1 file, like so:

L1: the expires value
L2: the serialized `Response` object
L3: the response body

but we wound up using 2 files because we cannot be sure that the response body stream will not be rewound, thus giving a different response body.

I have yet to think of a better way to solve this, while using the PSR cache.

@dbu
Copy link
Contributor

dbu commented Nov 26, 2018

psr cache is not a must for me, if there are good reasons to do something else.

if your plugin could be made available somewhere, along with documentation, one option could also be that the documentation of the cache plugins mentions your plugin as an alternative for large responses. rather than trying to integrate it into this cache plugin.

@pkruithof
Copy link
Author

What would be a possibility: if we inject something like a BodySerializer in the plugin, with the following api:

interface BodySerializer
{
    public function serialize(StreamInterface $stream): string;
    public function deserialize(string $serialized): StreamInterface;
}

That way you'd have the regular implementation, which just does a passthrough. But a different implementation could persist the stream somewhere (to disk, but perhaps also to a database or whatever), and return the identifier for that persisted data, which in case of a filesystem would be the path.

There are 2 gotcha's here:

  1. When checking for a cache item existence/validity, it should check if the persisted data actually exists. Or otherwise fail when fetching the item (which would invalidate the item anyway). You could argue that the latter is preferable, since it's the responsibility of the cache itself to maintain the data.
  2. When removing an item from the cache, the serializer should receive a mention of this so it can clean up as well (which kind of makes it something different). This is not really a clean solution, so I'm open for suggestions here.

@dbu
Copy link
Contributor

dbu commented Nov 26, 2018

i like where this is heading. we can provide a StringSerializer or something and use that as default implementation when no BodySerializer is specified (to stay BC, and to keep the "default" use case simple) and provide a naive FilesystemSerializer that is configured with a cache directory and creates random filenames and returns those. ("naive" because one could start to namespace folders in case there is lots of files, and other kind of advanced stuff, but then i'd prefer somebody offering a separate repo with a BodySerializer adapter to some existing caching library that does all that)

i think the BodySerializer should be allowed to throw some sort of NotFoundException if $serialized is not found and if that happens, the rest of the cache entry would be removed too, to "sync". and for invalidation, we could have a PurgeableInterface::purge(string $serialized) or something that the BodySerializer can implement if it cares about invalidation. if the psr cache throws away entries, this still leaves leftovers in the BodySerializer - as with any filesystem cache, you'd have to setup some sort of cleanup cronjob for older cache entries, or use some smart storage with stream support that manages cache size for you.

@pkruithof
Copy link
Author

Where would this purge method be called exactly? I don't see any invalidation logic in the plugin (besides setting the expiration date in the cache items).

@dbu
Copy link
Contributor

dbu commented Nov 26, 2018

ups, i was mixing up things. you are right, there is no explicit purging. sorry for that confusion.

then the only thing we could do is pass the expires after information to the serialize method, in case the stream cache can do something with it. that would be the same value as

$cacheItem->set($data)->expiresAfter($this->calculateCacheItemExpiresAfter($maxAge));

@pkruithof
Copy link
Author

I see now that the StreamInterface has a isSeekable method. We could simply use 1 file as described here:
#50 (comment)

If the stream provided by the Serializer is non-seekable, the stream pointer could be set to the place where the actual response starts, and just use that. That way you don't have any cleanup issues either.

I'll see if I can whip something up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants