Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5.1 unbuffered queries #17649

Merged
merged 13 commits into from May 11, 2024
Merged

5.1 unbuffered queries #17649

merged 13 commits into from May 11, 2024

Conversation

ADmad
Copy link
Member

@ADmad ADmad commented Apr 1, 2024

Refs #17517

I have been playing around to add back support for unbuffered queries and have come up with a POC. The implementation is a bit different than earlier since we got rid of statement decoration in 5.0.

Digging into the topic I learned that at PDO level postgres is always buffered, sqlite is always unbuffered, only mysql has the option to toggle buffered/unbuffered at connection level.

Here are a few noteworthy points about the implementation:

  • At the ORM level for non joined association a separate query is made to fetch related records of each primary record. So for Posts hasMany Comments if a query returns 2 posts then there will be one query each to fetch related Comments unlike for buffered results where all related comments are fetched using a single query.
  • In unbuffered mode the Statement instance is iterated which returns a generator. Since generators can't be rewind, trying to iterate the resultset again or using any collection method which internally iterates will throw an exception. Personally I think that's a reasonable limitation. An alternative would be to try and transparently re-execute the query.
  • Trying to make Statement::getIterator() directly return the PDOStatment instance failed as the results are later wrapped in collection for decoration and Collection::unwrap() has return type Iterator instead of Traversable.
  • For simplicity I have made ResultSet simply extend Collection.

TODO:

  • Fix the one failing test
  • Lot of additional tests required

Unbufferred results are only available when directly iterating
the SelectQuery instance. Calling SelectQuery::all() with always
returned a buffered result set.
@ADmad ADmad added this to the 5.1.0 milestone Apr 1, 2024
}

$this->closeCursor();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see how this and fetchAll() should co-exist. Won't the statement be empty if we call one or the other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One needs to pick one of the options, either call fetchAll() or iterate the instance.

@ADmad
Copy link
Member Author

ADmad commented Apr 2, 2024

Hmm.. an ORM test which uses unbuffered queries and contains a hasMany assoc. (which cause a new query) fails on mysql with

SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active.  Consider using PDOStatement::fetchAll().  Alternatively, if your code is only ever going to run against mysql, you may enable query buffering by setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute.

I tried the same on 4.x and there's no error. Trying to walk through the 4.x code I haven't been able to figure out why the same error doesn't occur on 4.x.

src/Database/Query/SelectQuery.php Show resolved Hide resolved
src/ORM/EagerLoader.php Outdated Show resolved Hide resolved
Comment on lines 717 to 718
$results = (new Collection($results))
->map(function ($row) use ($query, $external, $assocData) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting solution.

src/ORM/EagerLoader.php Outdated Show resolved Hide resolved
src/ORM/ResultSetFactory.php Outdated Show resolved Hide resolved
@ADmad
Copy link
Member Author

ADmad commented Apr 7, 2024

I tried the same on 4.x and there's no error. Trying to walk through the 4.x code I haven't been able to figure out why the same error doesn't occur on 4.x.

I figured out why the error doesn't occur on 4.x. Even if you have bufferring turned off, if you contain hasMany/belongsToMany associations the eagerloader will implicitly start buffering by using the BufferedStatement.

if (!($statement instanceof BufferedStatement)) {

Since new queries can't be executed for the same connection for unbuffered MySQL
queries, containing externally loaded associations will implicitly buffer results.
@othercorey
Copy link
Member

I figured out why the error doesn't occur on 4.x. Even if you have bufferring turned off, if you contain hasMany/belongsToMany associations the eagerloader will implicitly start buffering by using the BufferedStatement.

That sounds right.

Fix errors reported by static analysis
@ADmad
Copy link
Member Author

ADmad commented Apr 8, 2024

This is ready for another round of review.

If associations which require additional queries are contained then even for unbuffered queries all results will be pre-fetched since MySQL doesn't allow running new queries when an unbuffered query is already active.

I'll work on adding more tests after this.

@ADmad
Copy link
Member Author

ADmad commented Apr 9, 2024

@nicosp Can you please try out this branch and provide feedback whether it restores support for your use case?

@nicosp
Copy link
Contributor

nicosp commented Apr 9, 2024

@nicosp Can you please try out this branch and provide feedback whether it restores support for your use case?

Sure. I will try it in a couple of days. Thank you for attempting to fix this


if ($query instanceof SelectQuery) {
try {
$this->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, $query->isBufferedResultsEnabled());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we did the the same in 4.x, so this should be ok. I'm wondering if setting the attribute per query is necessary or affects performance at all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's the same as 4.x, no idea about the performance implications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not overcomplicate things. If this worked in 4.x it will be good enough here too. If it creates a performance regression we can patch around it.

@@ -30,184 +28,8 @@
* @template T of \Cake\Datasource\EntityInterface|array
* @implements \Cake\Datasource\ResultSetInterface<T>
*/
class ResultSet implements ResultSetInterface
class ResultSet extends Collection implements ResultSetInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve of extending Collection directly here. I'm interested in making Collection extensible and more customizable with collection iterators instead of CollectionInterface.

Not sure what ResultSetInterface means in the future, but I think this is ok for 5.1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what ResultSetInterface means in the future, but I think this is ok for 5.1.

ResultSetInterface doesn't really hold much meaning. Maybe in the future we can just drop it and make ORM\Query\SelectQuery:all()'s return type as Collection/CollectionInterface.

$results = (new Collection($results))
->map(function ($row) use ($data) {
return $this->groupResult($row, $data);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully users don't iterator over the result set multiple times. Should we unwrap this before passing to ResultSet()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the non-array path the unbuffered statement so we can't unwrap this? Should we check it's a statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes a non array path means it's an buffered statement. But since the Collection can handle any iterable, I don't want to narrow this down unnecessarily using a Statement check.

@@ -111,13 +119,6 @@ public function all(): iterable
{
if ($this->_results === null || $this->_dirty) {
$this->_results = $this->execute()->fetchAll(StatementInterface::FETCH_TYPE_ASSOC);
if ($this->_resultDecorators) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to the statement seems good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Statement was already decorating the records using the typemap so moving user provided decorators there too makes sense, so all decorations are now in one place.

@nicosp
Copy link
Contributor

nicosp commented Apr 15, 2024

@nicosp Can you please try out this branch and provide feedback whether it restores support for your use case?

With light testing on a preminary 5.x branch it's working for me. Let me know if there is anything more I can do to help

Copy link
Member

@markstory markstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 👏


if ($query instanceof SelectQuery) {
try {
$this->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, $query->isBufferedResultsEnabled());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not overcomplicate things. If this worked in 4.x it will be good enough here too. If it creates a performance regression we can patch around it.

src/Database/Query/SelectQuery.php Outdated Show resolved Hide resolved
Co-authored-by: Mark Story <mark@mark-story.com>
@ADmad ADmad marked this pull request as ready for review April 26, 2024 11:53
@ADmad
Copy link
Member Author

ADmad commented Apr 26, 2024

I was hoping to add some more tests but have been quite busy and won't be able to get to it anytime soon.

So if we are happy with the changes this can be merged.

@othercorey
Copy link
Member

I was hoping to add some more tests but have been quite busy and won't be able to get to it anytime soon.

So if we are happy with the changes this can be merged.

Do you have an idea what kind of tests you wanted?

@ADmad
Copy link
Member Author

ADmad commented Apr 26, 2024

Do you have an idea what kind of tests you wanted?

Haven't got around to even that yet :). It's just that there's currently only 1 ORM test with results buffering turned off. Would be nice a few tests at DBAL and ORM levels to check that actually only 1 record is fetched at a time with buffering disabled.

Also have ORM tests which fetch hasmany/belongstomany associations with buffering turned off, to show we get the expected results (albeit with buffering implicitly turned off).

@othercorey
Copy link
Member

Let's leave the PR open for a bit to see if one of us finds time to confirm no issues.

@markstory markstory merged commit 6ea9621 into 5.next May 11, 2024
9 of 10 checks passed
@markstory markstory deleted the 5.1-unbuffered-queries branch May 11, 2024 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants