
Consider allowing unbuffered queries for ORM SelectQuery #17517

Open
nicosp opened this issue Jan 2, 2024 · 7 comments


nicosp commented Jan 2, 2024

Description

The main blocker preventing us from moving to 5.x is our use of unbuffered queries. Our reports are created as follows:

  1. Run the "main" report query on the "reports" connection, which always has buffering disabled. This lets us fetch any number of rows because they are not loaded into memory.
  2. For each record we use the main connection which has buffering enabled to fetch any associations and format the entities.
  3. Write the formatted entity in the report.
  4. Move to the next result in the "reports" connection and repeat.

We could not find a replacement in 5.x. Note: we cannot simply use PDO for step 1 because we rely on the ORM for access control and other facilities.
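For concreteness, the workflow above could be sketched like this against the CakePHP 4.x API, where `disableBufferedResults()` still existed on the query (it was removed in 5.0, which is the blocker being reported). Table names, field names, and helpers are hypothetical:

```php
// Sketch of the 4.x-era pattern (disableBufferedResults() was removed in 5.0).
// "Orders" is assumed to be bound to the "reports" connection.
$orders = $this->fetchTable('Orders');

// 1. One unbuffered query: rows stream from the server instead of being
//    held in memory by the driver.
$query = $orders->find()->disableBufferedResults();

foreach ($query->all() as $order) {
    // 2. Buffered queries on the default connection to fetch associations.
    $lines = $this->fetchTable('OrderLines')
        ->find()
        ->where(['order_id' => $order->id])
        ->toArray();

    // 3. Write the formatted entity to the report (hypothetical helper).
    fwrite($report, $this->formatEntity($order, $lines));

    // 4. The foreach advances the unbuffered cursor to the next row.
}
```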

CakePHP Version

5.0

@othercorey othercorey added the ORM label Jan 2, 2024
@othercorey othercorey added this to the 5.1.0 milestone Jan 2, 2024

ADmad commented Jan 9, 2024

You could use Connection::run() to run the query and get a StatementInterface instance which you can iterate over, or just use pagination to fetch reports in chunks.
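One reading of this suggestion, sketched against the CakePHP 5 API ("Articles" and the condition are hypothetical). Note that `run()` returns raw rows via a `StatementInterface`, not hydrated entities:

```php
// Build the query through the ORM, but execute it via the connection to get
// a statement that can be iterated row by row.
$articles = $this->fetchTable('Articles');
$query = $articles->find()->where(['published' => true]);

$statement = $articles->getConnection()->run($query);
while ($row = $statement->fetch('assoc')) {
    // $row is a plain associative array, not an entity, so ORM facilities
    // (behaviors, entity methods, etc.) do not apply here.
}
```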


nicosp commented Jan 9, 2024

@ADmad The problem is that we rely on the CakePHP ORM for several facilities that we would need to re-implement on top of PDO. We cannot use pagination because records created or deleted while the report is running cause gaps or missed records between pages.

othercorey commented

Why can't you buffer the results from the first query?

@nicosp
Copy link
Contributor Author

nicosp commented Jan 11, 2024

> Why can't you buffer the results from the first query?

Memory usage is too high to run the first query with buffering enabled (before we even get to the ORM). We create huge XML files from thousands of records. All workers are limited to 128 MB, which is not enough for our bigger customers. So we have to use an unbuffered ORM query to fetch records one by one consistently, and a "normal" buffered query to "decorate" each record with associations and related data.
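As an aside on the output side of this workflow: PHP's XMLWriter can stream each record straight to disk, so the writer itself stays within a small memory budget and the remaining pressure is the buffered result set, which is the point being made above. A minimal sketch (path and element names are hypothetical):

```php
// Streaming XML generation: each record is flushed to disk as it is written,
// so memory does not grow with the number of records.
$xml = new XMLWriter();
$xml->openUri('/tmp/report.xml');
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('records');

foreach ($records as $record) {      // $records: the unbuffered result iterator
    $xml->startElement('record');
    $xml->writeElement('id', (string)$record->id);
    $xml->endElement();
    $xml->flush();                   // push buffered output to disk per record
}

$xml->endElement();
$xml->endDocument();
$xml->flush();
```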

othercorey commented

So your worker is batching jobs by pulling N records from the first query, building the rest of the data just for that set, and then releasing the data before the next batch?

Is it not possible to just pull the records you need for each batch?

@nicosp
Copy link
Contributor Author

nicosp commented Jan 26, 2024

@othercorey Let me explain it better:

  1. The batch worker fetches ALL records in a single unbuffered ORM query on the "report" connection. We cannot use any variant of "fetch N records" (paging etc.) because new rows are inserted while this long-running query executes, and it is critical that we do not skip any records as a result. This is not just theoretical; it has happened. We also cannot use a buffered query here because the number of records is very high and our rows are quite large.

  2. For each record, we use buffered ORM queries on the default connection to fetch associations etc.

  3. We write the "full" record data to our XML feed and move to the next record in the "report" connection's query.

Hope this makes it clear.
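To spell out what "unbuffered" means at the driver level, this is the MySQL PDO attribute involved (plain PDO shown purely for illustration; the feature request here is to expose the equivalent through the ORM's SelectQuery). Connection credentials and the table are hypothetical:

```php
// With buffered queries disabled, rows are pulled from the server one at a
// time as the cursor advances, so memory stays flat regardless of result size.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->query('SELECT * FROM orders');
foreach ($stmt as $row) {
    // No other query may run on THIS connection until the statement is fully
    // consumed or closed -- which is why the workflow above needs a second,
    // buffered connection for the per-record association lookups.
}
```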

othercorey commented

FYI - we are considering this, but don't have an immediate solution.

ADmad mentioned this issue Apr 1, 2024