Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(panic) Preloading does not work with many values due to using IN #283

Closed
aligator opened this issue May 23, 2022 · 6 comments · Fixed by #285
Closed

(panic) Preloading does not work with many values due to using IN #283

aligator opened this issue May 23, 2022 · 6 comments · Fixed by #285
Labels
bug Something isn't working

Comments

@aligator
Copy link
Contributor

aligator commented May 23, 2022

hi,

I noticed a problem with preloading due to utilizing IN-statements.
(maybe related: #104)

The problem

If you preload an entity and it results in more than 999 (on mariadb) ids in the preload-select, it will fail with a rather not explaining panic:

panic: rel: primary key row does not exists

goroutine 1 [running]:
github.com/go-rel/rel.scanMulti({0x701830?, 0xc000010720}, {0xc0001da060, 0x7}, {0x7025a0, 0x64be80}, 0x0?)
        /home/johannes/go/pkg/mod/github.com/go-rel/rel@v0.34.0/cursor.go:84 +0x48d
github.com/go-rel/rel.repository.preload({{0x701f80?, 0xc0001a5720?}, 0x6b6d90?}, {{0x7012d0?, 0xc000022120?}, {0x701f80?, 0xc0001a5720?}}, {0x7017b0, 0xc0001d6100}, {0xc000147f88, ...}, ...)
        /home/johannes/go/pkg/mod/github.com/go-rel/rel@v0.34.0/repository.go:1051 +0x6ba
github.com/go-rel/rel.repository.findAll({{_, _}, _}, {{_, _}, {_, _}}, _, {0x0, {0xc0001da085, ...}, ...})
        /home/johannes/go/pkg/mod/github.com/go-rel/rel@v0.34.0/repository.go:298 +0x3bf
github.com/go-rel/rel.repository.FindAll({{0x701f80?, 0xc0001a5720?}, 0x6b6d90?}, {0x7012d0, 0xc000022120}, {0x642540, 0xc00000eba0}, {0xc000199040, 0x1, 0x1})
        /home/johannes/go/pkg/mod/github.com/go-rel/rel@v0.34.0/repository.go:276 +0x298
rel-in/repository.userRepository.FindAll({{0x7026b0?, 0xc00000ea08?}}, 0xc000022128?)
        /home/johannes/projects/rel-in/repository/user.go:32 +0x21d
main.main()
        /home/johannes/projects/rel-in/main.go:48 +0x256

To help you to understand and debug the problem, I created a demo-repo which triggers exactly this problem:
https://github.com/aligator/rel-in

background knowledge

Many (if not most?) databases do not allow IN-Statements with many values.
e.g. https://stackoverflow.com/questions/8650324/how-many-values-in-an-in-clause-is-too-many-in-a-sql-query

There are some different ways to use IN which do not have these limitations. (may be different in some dbms)

  • you can use a subquery in IN, which does not have a limit on the result-rows of that subselect
  • in some dbms you can use tuples, (however that's a dirty workaround)
  • you can split the values and just use several IN statements (also a somewhat dirty workaround)

After all an IN-statement is never the best solution if you have many values. It is also not always more performant than just doing a join. (dbms may differ, as they may optimize the queries in different ways)

--> The way preloading is implemented currently is neither (most of the time) efficient nor does it work in all cases.

Possible solutions

fast and easy, dirty fix

just split the ids and concatenate multiple IN-statements with an OR (based on the amount of max-values the adapters should provide as this is dbms specific)

clean solution

There are some problems with the dirty-fix:

  • As I said, the IN statement wont be very performant at all if you have many ids (eg. 10000, 1000000, ...).
  • Databases do very much optimizations and cachings. But such an IN-statement is not very good cacheable (AFAIK)
  • In contrast to this it should be very good for the database to just do the same select as shortly before again.

I am not an expert with dbms performance, so I cannot say if a normal JOIN or an IN with a subselect is better. (That may also differe for the dbms.)
I think an IN statement with the same select as the main-select is more easy to implement, however I also think that the databases are much more optimized for normal JOINs as they are built with them in mind.

Actually for the auto-preloading JOINS should be much better as you save an extra db-call. (if not stacked too deep)

Maybe it is a good Idea to observe how well-known ORMs do the joins and preloading before implementing a clean fix.

I hope I could explain the issue good enough.
What are your thoughts on this?


some links about IN statements (note: some of them are quite old and some dbms may optimize it, but the basics should still apply)
https://stackoverflow.com/questions/6219501/is-a-long-in-clause-a-code-smell
https://www.postgresql.org/message-id/1178821226.6034.63.camel@goldbach
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:778625947169

@aligator aligator changed the title Preloading does not work with many values due to using IN (panic) Preloading does not work with many values due to using IN May 23, 2022
@Fs02
Copy link
Member

Fs02 commented May 24, 2022

hi, thank you for submitting this issue

Actually for the auto-preloading JOINS should be much better as you save an extra db-call. (if not stacked too deep)

I'm not an expert as well, but I think JOINS query is not a simple operation in database compared to WHERE IN. even though using JOIN saves extra db-call, internally database need to do operation to merge the two or more tables. that's why I think using WHERE IN which is basically moves the JOIN operation to application layer is better, because it reduce the load of database. because database is hard to scale, I think making db load as light as possible should be preferred in most cases

Maybe it is a good Idea to observe how well-known ORMs do the joins and preloading before implementing a clean fix.

as for other ORM, the Preload function in REL is inspired from ActiveRecord (which also produce two queries), and other ORM such as GORM also does the same.
https://www.bigbinary.com/blog/preload-vs-eager-load-vs-joins-vs-includes

I think loading using JOIN should have different function, maybe called Include like in active record

just split the ids and concatenate multiple IN-statements with an OR (based on the amount of max-values the adapters should provide as this is dbms specific)

another solution I think is just to split it into multiple queries where the ids is large 🤔

@aligator
Copy link
Contributor Author

I think it is very different between the dbms based on how much they cache, if the select is anyway transformed into ORs and so on...
So its hard to have a general rule for that.

Also if multiple selects are better than a single IN() OR IN(), I don't know.
Both ways should be easy to implement as a first fix. (I can do a PR)

Another thing that bothers me, is that it does a hard panic and not just return an error explaining that the preload did not work. It was a bit hard to find the cause.

However I think I will investigate other ORMS a bit in this regard. Maybe I find some interesting techniques.

@Fs02
Copy link
Member

Fs02 commented May 24, 2022

Also if multiple selects are better than a single IN() OR IN(), I don't know.

maybe need to check other ORM first about whether they split it to multiple SELECT IN or SELECT IN OR IN

Both ways should be easy to implement as a first fix. (I can do a PR)

Thank you, let me know if you need any help

@Fs02 Fs02 added the bug Something isn't working label May 24, 2022
@aligator
Copy link
Contributor Author

I found this:
rails/rails#585
rsim/oracle-enhanced#120 (comment)

The oracle-enhanced adapter for rails uses separate selects:
https://github.com/rsim/oracle-enhanced/blob/0f503e4d090875b87e7d1c492a84f99b85acb9e6/lib/arel/visitors/oracle.rb#L96

I think for now I just implement the multiple SELECT way

@aligator
Copy link
Contributor Author

aligator commented May 24, 2022

Note: In Gorm I could not find any place where that case is handled. I think I will just have to build a test-application for gorm as well to understand what it does...

Edit: now thats interesting:
I couldn't find something, because there is nothing^^

It seems that mariadb just silently returns no rows.

  • In rel this leads to the panic.
  • In gorm this just leads to silently not preloading anything...

What do you think about this?

@Fs02
Copy link
Member

Fs02 commented May 24, 2022

agree, lets go with multiple select 👍
(we can use lower chunks number supported by popular database)

In rel this leads to the panic.

somehow making unexpected edge case panic paid of LOL

aligator added a commit to aligator/rel that referenced this issue May 27, 2022
aligator added a commit to aligator/rel that referenced this issue May 27, 2022
aligator added a commit to aligator/rel that referenced this issue May 27, 2022
@Fs02 Fs02 closed this as completed in #285 May 27, 2022
Fs02 pushed a commit that referenced this issue May 27, 2022
* Split preloading-IN query into multiple queries (#283)

* Cleanup code, pass err to scanFinish (#283)

* Add test for calling scanMulti multiple times with the same cols (#283)

* Add test for Preload (#283)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants