Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#7837 paginate with custom identifier types even with enabled DQL query cache #7865

Conversation

Ocramius
Copy link
Member

@Ocramius Ocramius commented Oct 10, 2019

This patch enforces re-parsing of DQL queries used in the paginator internals, so that the WhereInWalker can correctly process query parameters after evaluating the AST structure (root expression, specifically).

While this isn't nice for performance, it is also true that the alternative approach would be to run the parsing manually on the DQL query (to extract the root of the selection) and then operating on that (this was already attempted in #7821 in 39d2113).

Fixes #7837

Since `WhereInWalker` does not run, query parameters are not translated
from their in-memory type to the expected SQL type when the paginator
is run again with the same DQL string. This is an architectural
issue, since (for the sake of simplicity) we moved parameter
translation into the SQL walker, we didn't consider that SQL
walkers only act when no cache is in place. The translatio
needs to be moved into the paginator logic again.
…ng used

In order to figure out the paginated query identifier type, we would
have to parse the DQL query into an AST+SQL anyway, so we'd have
to re-parse it manually: instead of doing that, we can force the
`WhereInWalker` to be reached at all times by forcing the
`$whereInQuery` to use no query cache.

While it is a sad performance regression, it is also not a
noticeable one, since we'll be performing an `O(1)` operation
around an I/O one (query execution, in this case).
@Ocramius Ocramius added the Bug label Oct 10, 2019
@Ocramius Ocramius added this to the 2.6.5 milestone Oct 10, 2019
@Ocramius Ocramius self-assigned this Oct 10, 2019
array_map(static function (GH7820Line $line) : string {
return $line->toString();
}, iterator_to_array(new Paginator($query))),
'Paginator runs once, query cache is populated with DQL -> SQL translation'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me this should have been a comment. It doesn't tell me meaning of assertion failure.

instances: 2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had these as comments initially: moved to assertion message on purpose, since it makes failures much more descriptive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this says what happens, not what should have happened

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Messages like "null is given, 1 was expected" are the usual messages that are shown on failing assertions.

array_map(static function (GH7820Line $line) : string {
return $line->toString();
}, iterator_to_array(new Paginator($query))),
'Paginator runs once, query cache is populated with DQL -> SQL translation'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Messages like "null is given, 1 was expected" are the usual messages that are shown on failing assertions.

array_map(static function (GH7820Line $line) : string {
return $line->toString();
}, iterator_to_array(new Paginator($query))),
'Paginator runs again, SQL parameters are translated again, even with cached DQL -> SQL translation'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as mentioned for the assert-message above.

@ostrolucky ostrolucky force-pushed the fix/#7837-paginate-with-custom-identifier-types-even-with-cached-dql-parsing branch from baa1230 to 98e557b Compare November 14, 2019 22:37
@ostrolucky
Copy link
Member

I have taken liberty to improve these messages. Let me know if it's ok now

@lcobucci lcobucci dismissed SenseException’s stale review November 15, 2019 10:08

Assertion messages have been changed by @ostrolucky and are much better. Thanks everyone 👍

@lcobucci lcobucci merged commit 855244f into doctrine:2.6 Nov 15, 2019
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jan 22, 2023
This PR prevents the Paginator from causing OpCache "wasted memory" to increase _on every request_ when used with Symfony's `PhpFilesAdapter` as the cache implementation for the query cache.

Depending on configured thresholds, wasted memory this will either cause periodic opcache restarts or running out of memory and not being able to cache additional scripts ([Details](https://tideways.com/profiler/blog/fine-tune-your-opcache-configuration-to-avoid-caching-suprises)).

Fixes doctrine#9917, closes doctrine#10095.

There is a long story (doctrine#7820, doctrine#7821, doctrine#7837, doctrine#7865) behind how the Paginator can take care of DBAL type conversions when creating the pagination query. This conversion has to transform identifier values before they will be used as a query parameter, so it has to happen every time the Paginator is used.

For reasons, this conversion happens inside `WhereInWalker`. Tree walkers like this are used only during the DQL parsing/AST processing steps. Having a DQL query in the query cache short-cuts this step by fetching the parsing/processing result from the cache.

So, to make sure the conversion happens also with the query cache being enabled, this line

https://github.com/doctrine/orm/blob/1753d035005c1125c9fb4855c3fa629341e5734d/lib/Doctrine/ORM/Tools/Pagination/Paginator.php#L165

was added in doctrine#7837. It causes `\Doctrine\ORM\Query::parse()` to re-parse the query every time, but will also put the result into the query cache afterwards.

At this point, the setup described in doctrine#9917 – which, to my knowledge, is the default in Symfony + DoctrineBundle projects – will ultimately bring us to this code:

https://github.com/symfony/symfony/blob/4b3391725f2fc4a072e776974f00a992cbc70515/src/Symfony/Component/Cache/Adapter/PhpFilesAdapter.php#L248-L249

When writing a cache item with an already existing key, the driver has to make sure the opcache will honor the changed PHP file. This is what causes _wasted memory_ to increase.

Instead of using `\Doctrine\ORM\Query::expireQueryCache()`, which will force `\Doctrine\ORM\Query::parse()` to parse the query again before putting it into the cache, use `\Doctrine\ORM\Query::useQueryCache(false)`. The subtle difference is the latter will not place the processed query in the cache in the first place.

A test case is added to check that repeated use of the paginator does not call the cache to update existing keys. That should suffice to make sure we're not running into the issue, while at the same time not complicating tests by using the `PhpFilesAdapter` directly.

Note that in order to observe the described issue in tests, you will need to use the `PhpFilesDriver` and also make sure that OpCache is enabled and also activated for `php-cli` (which is running the unit tests).

This particular subquery generated/used by the Paginator is not put into the query cache. The DQL parsing/to-SQL conversion has to happen _every time_ the Paginator is used.

This, however, was already the case before this PR. In other words, this PR only changes that we do not store/update the cached result every time, but instead completely omit caching the query.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jan 23, 2023
Make the `Paginator`-internal query (`... WHERE ... IN (id, id2, id3...)`) cacheable in the query cache again.

When the Paginator creates the internal subquery that does the actual result limiting, it has to take DBAL type conversions for the identifier column of the paginated root entity into account (doctrine#7820, fixed in doctrine#7821).

In order to perform this type conversion, we need to know the DBAL type class for the root entity's identifier, and we have to derive it from a given (arbitrary) DQL query. This requires DQL parsing and inspecting the AST, so doctrine#7821 placed the conversion code in the `WhereInWalker` where all the necessary information is available.

The problem is that type conversion has to happen every time the paginator is run, but the query that results from running `WhereInWalker` would be kept in the query cache. This was reported in doctrine#7837 and fixed by doctrine#7865, by making this particular query expire every time. The query must not be cached, since the necessary ID type conversion happens as a side-effect of running the `WhereInWalker`.

The Paginator internal query that uses `WhereInWalker` has its DQL re-parsed and transformed in every request. In addition to the processing overhead, this may also waste opcache memory (doctrine#9917).

This PR moves the code that determines the DBAL type out of `WhereInWalker` into a dedicated SQL walker class, `RootTypeWalker`.

`RootTypeWalker` uses a ~hack~  clever trick to report the type back: It sets the type as the resulting "SQL" string. The benefit is that `RootTypeWalker` results can be cached in the query cache themselves. Only the first time a given DQL query has to be paginated, we need to run this walker to find out the root entity's ID type. After that, the type will be returned from the query cache.

With the type information being provided, `Paginator` can take care of the necessary conversions by itself. This happens every time the Paginator is used.

The internal query that uses `WhereInWalker` can be cached again since it no longer has side effects.
derrabus pushed a commit that referenced this pull request Jan 23, 2023
This PR prevents the Paginator from causing OpCache "wasted memory" to increase _on every request_ when used with Symfony's `PhpFilesAdapter` as the cache implementation for the query cache.

Depending on configured thresholds, wasted memory this will either cause periodic opcache restarts or running out of memory and not being able to cache additional scripts ([Details](https://tideways.com/profiler/blog/fine-tune-your-opcache-configuration-to-avoid-caching-suprises)).

Fixes #9917, closes #10095.

There is a long story (#7820, #7821, #7837, #7865) behind how the Paginator can take care of DBAL type conversions when creating the pagination query. This conversion has to transform identifier values before they will be used as a query parameter, so it has to happen every time the Paginator is used.

For reasons, this conversion happens inside `WhereInWalker`. Tree walkers like this are used only during the DQL parsing/AST processing steps. Having a DQL query in the query cache short-cuts this step by fetching the parsing/processing result from the cache.

So, to make sure the conversion happens also with the query cache being enabled, this line

https://github.com/doctrine/orm/blob/1753d035005c1125c9fb4855c3fa629341e5734d/lib/Doctrine/ORM/Tools/Pagination/Paginator.php#L165

was added in #7837. It causes `\Doctrine\ORM\Query::parse()` to re-parse the query every time, but will also put the result into the query cache afterwards.

At this point, the setup described in #9917 – which, to my knowledge, is the default in Symfony + DoctrineBundle projects – will ultimately bring us to this code:

https://github.com/symfony/symfony/blob/4b3391725f2fc4a072e776974f00a992cbc70515/src/Symfony/Component/Cache/Adapter/PhpFilesAdapter.php#L248-L249

When writing a cache item with an already existing key, the driver has to make sure the opcache will honor the changed PHP file. This is what causes _wasted memory_ to increase.

Instead of using `\Doctrine\ORM\Query::expireQueryCache()`, which will force `\Doctrine\ORM\Query::parse()` to parse the query again before putting it into the cache, use `\Doctrine\ORM\Query::useQueryCache(false)`. The subtle difference is the latter will not place the processed query in the cache in the first place.

A test case is added to check that repeated use of the paginator does not call the cache to update existing keys. That should suffice to make sure we're not running into the issue, while at the same time not complicating tests by using the `PhpFilesAdapter` directly.

Note that in order to observe the described issue in tests, you will need to use the `PhpFilesDriver` and also make sure that OpCache is enabled and also activated for `php-cli` (which is running the unit tests).

This particular subquery generated/used by the Paginator is not put into the query cache. The DQL parsing/to-SQL conversion has to happen _every time_ the Paginator is used.

This, however, was already the case before this PR. In other words, this PR only changes that we do not store/update the cached result every time, but instead completely omit caching the query.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jan 23, 2023
Make the `Paginator`-internal query (`... WHERE ... IN (id, id2, id3...)`) cacheable in the query cache again.

When the Paginator creates the internal subquery that does the actual result limiting, it has to take DBAL type conversions for the identifier column of the paginated root entity into account (doctrine#7820, fixed in doctrine#7821).

In order to perform this type conversion, we need to know the DBAL type class for the root entity's identifier, and we have to derive it from a given (arbitrary) DQL query. This requires DQL parsing and inspecting the AST, so doctrine#7821 placed the conversion code in the `WhereInWalker` where all the necessary information is available.

The problem is that type conversion has to happen every time the paginator is run, but the query that results from running `WhereInWalker` would be kept in the query cache. This was reported in doctrine#7837 and fixed by doctrine#7865, by making this particular query expire every time. The query must not be cached, since the necessary ID type conversion happens as a side-effect of running the `WhereInWalker`.

The Paginator internal query that uses `WhereInWalker` has its DQL re-parsed and transformed in every request. In addition to the processing overhead, this may also waste opcache memory (doctrine#9917).

This PR moves the code that determines the DBAL type out of `WhereInWalker` into a dedicated SQL walker class, `RootTypeWalker`.

`RootTypeWalker` uses a ~hack~  clever trick to report the type back: It sets the type as the resulting "SQL" string. The benefit is that `RootTypeWalker` results can be cached in the query cache themselves. Only the first time a given DQL query has to be paginated, we need to run this walker to find out the root entity's ID type. After that, the type will be returned from the query cache.

With the type information being provided, `Paginator` can take care of the necessary conversions by itself. This happens every time the Paginator is used.

The internal query that uses `WhereInWalker` can be cached again since it no longer has side effects.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jan 23, 2023
Make the `Paginator`-internal query (`... WHERE ... IN (id, id2, id3...)`) cacheable in the query cache again.

When the Paginator creates the internal subquery that does the actual result limiting, it has to take DBAL type conversions for the identifier column of the paginated root entity into account (doctrine#7820, fixed in doctrine#7821).

In order to perform this type conversion, we need to know the DBAL type class for the root entity's identifier, and we have to derive it from a given (arbitrary) DQL query. This requires DQL parsing and inspecting the AST, so doctrine#7821 placed the conversion code in the `WhereInWalker` where all the necessary information is available.

The problem is that type conversion has to happen every time the paginator is run, but the query that results from running `WhereInWalker` would be kept in the query cache. This was reported in doctrine#7837 and fixed by doctrine#7865, by making this particular query expire every time. The query must not be cached, since the necessary ID type conversion happens as a side-effect of running the `WhereInWalker`.

The Paginator internal query that uses `WhereInWalker` has its DQL re-parsed and transformed in every request. In addition to the processing overhead, this may also waste opcache memory (doctrine#9917).

This PR moves the code that determines the DBAL type out of `WhereInWalker` into a dedicated SQL walker class, `RootTypeWalker`.

`RootTypeWalker` uses a ~hack~  clever trick to report the type back: It sets the type as the resulting "SQL" string. The benefit is that `RootTypeWalker` results can be cached in the query cache themselves. Only the first time a given DQL query has to be paginated, we need to run this walker to find out the root entity's ID type. After that, the type will be returned from the query cache.

With the type information being provided, `Paginator` can take care of the necessary conversions by itself. This happens every time the Paginator is used.

The internal query that uses `WhereInWalker` can be cached again since it no longer has side effects.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jan 23, 2023
Make the `Paginator`-internal query (`... WHERE ... IN (id, id2, id3...)`) cacheable in the query cache again.

When the Paginator creates the internal subquery that does the actual result limiting, it has to take DBAL type conversions for the identifier column of the paginated root entity into account (doctrine#7820, fixed in doctrine#7821).

In order to perform this type conversion, we need to know the DBAL type class for the root entity's identifier, and we have to derive it from a given (arbitrary) DQL query. This requires DQL parsing and inspecting the AST, so doctrine#7821 placed the conversion code in the `WhereInWalker` where all the necessary information is available.

The problem is that type conversion has to happen every time the paginator is run, but the query that results from running `WhereInWalker` would be kept in the query cache. This was reported in doctrine#7837 and fixed by doctrine#7865, by making this particular query expire every time. The query must not be cached, since the necessary ID type conversion happens as a side-effect of running the `WhereInWalker`.

The Paginator internal query that uses `WhereInWalker` has its DQL re-parsed and transformed in every request. In addition to the processing overhead, this may also waste opcache memory (doctrine#9917).

This PR moves the code that determines the DBAL type out of `WhereInWalker` into a dedicated SQL walker class, `RootTypeWalker`.

`RootTypeWalker` uses a ~hack~  clever trick to report the type back: It sets the type as the resulting "SQL" string. The benefit is that `RootTypeWalker` results can be cached in the query cache themselves. Only the first time a given DQL query has to be paginated, we need to run this walker to find out the root entity's ID type. After that, the type will be returned from the query cache.

With the type information being provided, `Paginator` can take care of the necessary conversions by itself. This happens every time the Paginator is used.

The internal query that uses `WhereInWalker` can be cached again since it no longer has side effects.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jan 23, 2023
Make the `Paginator`-internal query (`... WHERE ... IN (id, id2, id3...)`) cacheable in the query cache again.

When the Paginator creates the internal subquery that does the actual result limiting, it has to take DBAL type conversions for the identifier column of the paginated root entity into account (doctrine#7820, fixed in doctrine#7821).

In order to perform this type conversion, we need to know the DBAL type class for the root entity's identifier, and we have to derive it from a given (arbitrary) DQL query. This requires DQL parsing and inspecting the AST, so doctrine#7821 placed the conversion code in the `WhereInWalker` where all the necessary information is available.

The problem is that type conversion has to happen every time the paginator is run, but the query that results from running `WhereInWalker` would be kept in the query cache. This was reported in doctrine#7837 and fixed by doctrine#7865, by making this particular query expire every time. The query must not be cached, since the necessary ID type conversion happens as a side-effect of running the `WhereInWalker`.

The Paginator internal query that uses `WhereInWalker` has its DQL re-parsed and transformed in every request. In addition to the processing overhead, this may also waste opcache memory (doctrine#9917).

This PR moves the code that determines the DBAL type out of `WhereInWalker` into a dedicated SQL walker class, `RootTypeWalker`.

`RootTypeWalker` uses a ~hack~  clever trick to report the type back: It sets the type as the resulting "SQL" string. The benefit is that `RootTypeWalker` results can be cached in the query cache themselves. Only the first time a given DQL query has to be paginated, we need to run this walker to find out the root entity's ID type. After that, the type will be returned from the query cache.

With the type information being provided, `Paginator` can take care of the necessary conversions by itself. This happens every time the Paginator is used.

The internal query that uses `WhereInWalker` can be cached again since it no longer has side effects.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Jan 23, 2023
Make the `Paginator`-internal query (`... WHERE ... IN (id, id2, id3...)`) cacheable in the query cache again.

When the Paginator creates the internal subquery that does the actual result limiting, it has to take DBAL type conversions for the identifier column of the paginated root entity into account (doctrine#7820, fixed in doctrine#7821).

In order to perform this type conversion, we need to know the DBAL type class for the root entity's `#[Id]`, and we have to figure it out based on a given (arbitrary) DQL query. This requires DQL parsing and inspecting the AST, so doctrine#7821 placed the conversion code in the `WhereInWalker` where all the necessary information is available.

The problem is that type conversion has to happen every time the paginator is run, but the query that results from running `WhereInWalker` would be kept in the query cache. This was reported in doctrine#7837 and fixed by doctrine#7865, by making this particular query expire every time. The query must not be cached, since the necessary ID type conversion happens as a side-effect of running the `WhereInWalker`.

The Paginator internal query that uses `WhereInWalker` has its DQL re-parsed and transformed in every request.

This PR moves the code that determines the DBAL type out of `WhereInWalker` into a dedicated SQL walker class, `RootTypeWalker`.

`RootTypeWalker` uses a ~hack~  clever trick to report the type back: It sets the type as the resulting "SQL" string. The benefit is that `RootTypeWalker` results can be cached in the query cache themselves. Only the first time a given DQL query has to be paginated, we need to run this walker to find out the root entity's ID type. After that, the type will be returned from the query cache.

With the type information being provided, `Paginator` can take care of the necessary conversions by itself. This happens every time the Paginator is used.

The internal query that uses `WhereInWalker` can be cached again since it no longer has side effects.
greg0ire pushed a commit to mpdude/doctrine2 that referenced this pull request Feb 5, 2023
Make the `Paginator`-internal query (`... WHERE ... IN (id, id2,
id3...)`) cacheable in the query cache again.

When the Paginator creates the internal subquery that does the actual
result limiting, it has to take DBAL type conversions for the identifier
column of the paginated root entity into account (doctrine#7820, fixed in
 doctrine#7821).

In order to perform this type conversion, we need to know the DBAL type
class for the root entity's `#[Id]`, and we have to figure it out based
on a given (arbitrary) DQL query. This requires DQL parsing and
inspecting the AST, so doctrine#7821 placed the conversion code in the
`WhereInWalker` where all the necessary information is available.

The problem is that type conversion has to happen every time the
paginator is run, but the query that results from running
`WhereInWalker` would be kept in the query cache. This was reported in
 doctrine#7837 and fixed by doctrine#7865, by making this particular query expire every
time. The query must not be cached, since the necessary ID type
conversion happens as a side-effect of running the `WhereInWalker`.

The Paginator internal query that uses `WhereInWalker` has its DQL
re-parsed and transformed in every request.

This PR moves the code that determines the DBAL type out of
`WhereInWalker` into a dedicated SQL walker class, `RootTypeWalker`.

`RootTypeWalker` uses a ~hack~  clever trick to report the type back: It
sets the type as the resulting "SQL" string. The benefit is that
`RootTypeWalker` results can be cached in the query cache themselves.
Only the first time a given DQL query has to be paginated, we need to
run this walker to find out the root entity's ID type. After that, the
type will be returned from the query cache.

With the type information being provided, `Paginator` can take care of
the necessary conversions by itself. This happens every time the
Paginator is used.

The internal query that uses `WhereInWalker` can be cached again since
it no longer has side effects.
Gwemox pushed a commit to Gwemox/orm that referenced this pull request Feb 10, 2023
Make the `Paginator`-internal query (`... WHERE ... IN (id, id2,
id3...)`) cacheable in the query cache again.

When the Paginator creates the internal subquery that does the actual
result limiting, it has to take DBAL type conversions for the identifier
column of the paginated root entity into account (doctrine#7820, fixed in
 doctrine#7821).

In order to perform this type conversion, we need to know the DBAL type
class for the root entity's `#[Id]`, and we have to figure it out based
on a given (arbitrary) DQL query. This requires DQL parsing and
inspecting the AST, so doctrine#7821 placed the conversion code in the
`WhereInWalker` where all the necessary information is available.

The problem is that type conversion has to happen every time the
paginator is run, but the query that results from running
`WhereInWalker` would be kept in the query cache. This was reported in
 doctrine#7837 and fixed by doctrine#7865, by making this particular query expire every
time. The query must not be cached, since the necessary ID type
conversion happens as a side-effect of running the `WhereInWalker`.

The Paginator internal query that uses `WhereInWalker` has its DQL
re-parsed and transformed in every request.

This PR moves the code that determines the DBAL type out of
`WhereInWalker` into a dedicated SQL walker class, `RootTypeWalker`.

`RootTypeWalker` uses a ~hack~  clever trick to report the type back: It
sets the type as the resulting "SQL" string. The benefit is that
`RootTypeWalker` results can be cached in the query cache themselves.
Only the first time a given DQL query has to be paginated, we need to
run this walker to find out the root entity's ID type. After that, the
type will be returned from the query cache.

With the type information being provided, `Paginator` can take care of
the necessary conversions by itself. This happens every time the
Paginator is used.

The internal query that uses `WhereInWalker` can be cached again since
it no longer has side effects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants