Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forced order by ID in sql.js causing 40+ second queries for single row queries. #169

Open
3 tasks
SippieCup opened this issue Feb 21, 2020 · 11 comments
Open
3 tasks

Comments

@SippieCup
Copy link

SippieCup commented Feb 21, 2020

On tables with 100 million+ rows, when querying for a single row using loopback it'll insert an "order by id" forcing the entire table read.

Steps to reproduce

I am using a postgres server, but it is for pretty much all sql queries.

Example:
We have a model "foo" attached to a SQL table with the columns id and hash, where id is unique, and hash is not unique. This table has 100,000,000 Rows. and some hashes have up to 100,000 rows. (note this is simplified, in my database there are far more columns and every row is unique)

We use this curl command to query the loopback API (I know that I can use findOne instead, but it'll still go toprototype.all so it really doesn't make a difference besides returning a single object rather than an array):

curl --location --request GET 'localhost:3000/api/foos' --header 'Content-Type: application/x-www-form-urlencoded' --form 'filter={"where": { "hash": "d41d8cd98f00b204e9800998ecf8427e" }, "limit": 1 }'

Current Behavior

Loopback generates the following SQL statement

SELECT * FROM "public"."foo" WHERE "hash"='d41d8cd98f00b204e9800998ecf8427e' ORDER BY "id" LIMIT 1

Because hash is non-unique, and can have thousands of rows, (in this case, 115,000) this takes 40.632s on my server to complete.

Expected Behavior

Loopback should instead generate the following SQL statement:

SELECT hash FROM "public"."foo" WHERE "hash"='d41d8cd98f00b204e9800998ecf8427e' LIMIT 1

resulting in a query time of on my server of 12ms.

Link to reproduction sandbox

If you really need me to setup a reproduction sandbox I can, but the problem should be pretty apparent.

Additional information

link to offending code: https://github.com/strongloop/loopback-connector/blob/master/lib/sql.js#L1388

You could say that this is "working as intended" and that its the database structure is bad/wrong/whatever, but there is no reason why the connector should be dictating an order on all queries.

This functionality should be defined in the default scope of the datasource and model, not built into the base loopback connector.

Acceptance Criteria

@SippieCup SippieCup added the bug label Feb 21, 2020
@SippieCup SippieCup changed the title Forced Order by ID in sql.js causing 40+ second queries for single row queries. Forced order by ID in sql.js causing 40+ second queries for single row queries. Feb 21, 2020
@agnes512
Copy link
Contributor

@SippieCup thanks for bringing up the issue. I agree that order is not needed for many use cases.

We currently have a community PR ongoing: loopbackio/loopback-connector-postgresql#417. I think it's trying to fix the issue you mentioned. It's proposing to have flag defaultIdSort to disable sorting on id. Feel free to join the discussion!

@SippieCup
Copy link
Author

SippieCup commented Feb 26, 2020

Thanks, didn't notice that PR before.

While its good for the postgresql connector, and acceptable for my use case, I still think it might be worth backporting it to the parent connector for other sql-based datasources.

@agnes512
Copy link
Contributor

@SippieCup Agreed. I will edit the issue description/acceptance criteria a bit and let the team to estimate it. Thanks!

@jannyHou
Copy link
Contributor

Problem: If order field is not specified, pk will be the default order field in the generated query
Solution:

  • try some quick workaround, e.g. order by 1 or order by '', see if it fixes the problem
  • then we can think of introducing a flag to avoid automatically use the pk when order field is not specified

@SippieCup
Copy link
Author

order by 1 (or whichever column you would like), would only give an improvement if you are selecting the column used in the where filter, at which point, it is just redundant versus removing the order by clause entirely. Furthermore, If the user is not selecting fields to return, they would have in know the order of the columns as they appear in the database and select the right column (while also handling hidden properties / columns potentially not defined in the loopback model) to figure out the correct column number to select.

It would would make code unreadable, breaking on model/database changes, and would be adding another magnitude of complexity. But I guess it does "work." If thats the solution strongloop chooses, I'll just continue using my forked connector where I just removed the offending code entirely, but I'd feel sorry for other users.

order by '' I'm fairly certain will give you an error as its would be a non-integer constant.

I see two functional solutions: Adding a flag as you have stated and in the PR, or removing the order by id default, and have users explicitly declare if they want order by id within the model scope.

@stale
Copy link

stale bot commented Apr 28, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 28, 2020
@agnes512 agnes512 removed the stale label Apr 28, 2020
@stale
Copy link

stale bot commented Jun 28, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 28, 2020
@stale
Copy link

stale bot commented Jul 12, 2020

This issue has been closed due to continued inactivity. Thank you for your understanding. If you believe this to be in error, please contact one of the code owners, listed in the CODEOWNERS file at the top-level of this repository.

@stale stale bot closed this as completed Jul 12, 2020
@bajtos
Copy link
Member

bajtos commented Sep 11, 2020

I see two functional solutions: Adding a flag as you have stated and in the PR, or removing the order by id default, and have users explicitly declare if they want order by id within the model scope.

I like the second solution more, it seems more elegant and easier to implement to me. However, I am concerned about backwards compatibility - changing the default sort order is likely to affect all existing LoopBack applications out there.

I think we need to implement a feature flag to allow users to keep the old behavior. This feature flag should be initially enabled by default. After some time, we can deprecate it (start printing deprecating warnings for users still using it) and after even more time we can eventually remove it.

I think defaultIdSort is exactly the kind of a feature flag we want.

@SippieCup would you like to take a look at how we can port loopbackio/loopback-connector-postgresql#417 to loopback-connector and contribute the changes to enable this feature for all SQL connectors?

@SippieCup
Copy link
Author

Sorry for the late response, Didn't notice this until now due to the extreme workload I have due to covid. I can take a look at it next week and see about upstreaming the changes in a more usable way for connectors. We ended up just removing the code from the base connector.

Now that You guys have decided how to implement it, I can see about building this solution into it and create a pull request.

@bajtos
Copy link
Member

bajtos commented Sep 29, 2020

@SippieCup awesome, I am looking forward to review your pull request(s). I have assigned the issue to you, so that others know somebody is already working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants