Client side in-memory joins #320
Fixes #280
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #320 +/- ##
==========================================
- Coverage 84.93% 84.41% -0.53%
==========================================
Files 37 38 +1
Lines 3406 3631 +225
==========================================
+ Hits 2893 3065 +172
- Misses 365 401 +36
- Partials 148 165 +17
API:

First, a join object needs to be created, which specifies the collections and the join condition:

```
join := tigris.GetJoin[LeftCollModel, RightCollModel](db, "{left field}", "{right field}", [options])
```

This creates a join between LeftCollModel and RightCollModel on equality of `{left field}` to `{right field}`.

The created object can then be used to issue one or more read requests:

```
it, err := join.Read(ctx, filter.Eq("Field1", 1))
```

The filter condition of the Read API is applied to the left table. The iterator then returns the rows matching the condition along with the corresponding rows from the right table that satisfy the join condition.

    var l LeftCollModel
    var r []*RightCollModel

    for it.Next(&l, &r) {
        fmt.Printf("l=%v r=%v\n", l, r)
    }

By default, documents that have no matching documents in the right table are still returned in the results. These results can be skipped by providing the `&JoinOptions{Type: tigris.InnerJoin}` option to the GetJoin API.

The left field values and the right field values are not required to be unique.

Array field values are matched as is by default. With the `&JoinOptions{ArrayUnroll: true}` option, individual array items can be matched in the right table.

Implementation details:

First, a request is issued to the left table with the filter provided to the Read API. The result is read into memory and a request is prepared for the right table, with the following filter: `filter.Or(filter.Eq("{right field}", {left field value fetched by left query}), ...)`. The result of the first query is put in a map with the {left field} value as the key; while reading the result of the second query, each row is appended to the corresponding map bucket.

Because the merge is done in memory, joins should be used for relatively small result sets only.
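The merge step described in the implementation details can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: it does not call the Tigris client at all, and the names `hashJoin`, `Joined`, `JoinType`, `User` and `Order` are hypothetical. It assumes both query results are already in memory as slices, where the real flow would read them from two requests.

```go
package main

import "fmt"

// JoinType selects whether left rows without a match are kept.
// (Hypothetical type for this sketch, not the library's own.)
type JoinType int

const (
	LeftJoin  JoinType = iota // keep unmatched left rows (the default behaviour described above)
	InnerJoin                 // skip left rows with no matching right rows
)

// Joined pairs one left row with all right rows sharing its join key.
type Joined[L, R any] struct {
	Left  L
	Right []R
}

// hashJoin merges two in-memory result sets on key equality.
func hashJoin[L, R any, K comparable](
	left []L, right []R,
	leftKey func(L) K, rightKey func(R) K,
	typ JoinType,
) []Joined[L, R] {
	// Step 1: create one bucket per left key. Left keys need not be
	// unique; duplicate keys simply share a bucket.
	buckets := make(map[K][]R, len(left))
	for _, l := range left {
		buckets[leftKey(l)] = nil
	}

	// Step 2: append each right row to the bucket of its key. Rows whose
	// key never appeared on the left are ignored.
	for _, r := range right {
		k := rightKey(r)
		if _, ok := buckets[k]; ok {
			buckets[k] = append(buckets[k], r)
		}
	}

	// Step 3: emit left rows in their original order with their matches.
	out := make([]Joined[L, R], 0, len(left))
	for _, l := range left {
		rs := buckets[leftKey(l)]
		if typ == InnerJoin && len(rs) == 0 {
			continue
		}
		out = append(out, Joined[L, R]{Left: l, Right: rs})
	}
	return out
}

// Example models, made up for illustration.
type User struct {
	ID   int
	Name string
}

type Order struct {
	UserID int
	Item   string
}

func main() {
	users := []User{{1, "alice"}, {2, "bob"}}
	orders := []Order{{1, "book"}, {1, "pen"}, {3, "lamp"}}

	rows := hashJoin(users, orders,
		func(u User) int { return u.ID },
		func(o Order) int { return o.UserID },
		LeftJoin)
	for _, j := range rows {
		fmt.Printf("l=%v r=%v\n", j.Left, j.Right)
	}
}
```

Both result sets and the map live in memory at the same time, which is why the description limits this approach to relatively small result sets.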
The API and code look good. However, I'm not confident this should be in the client library. If we merge this in, we have to do the same for the TS and Python libraries, and for any other client library we support. The other concern, as you mentioned in the description, is that if the results exceed the user's application memory, the application will OOM. This puts a lot of responsibility on users to understand their data. For example, they could have a small data set that works great for months; then the data set suddenly grows significantly and their whole application crashes unexpectedly in production. A smaller nit is that because we are reading a lot from both collections, it will also increase the user's network costs. I would prefer that we implement something like this on the server side. We could look at zig-zag joins to reduce the memory footprint and make sure that this scales smoothly. This puts the compute and memory risk on our servers, but that is why we are the database, and we should handle it as best we can.
The idea is to provide a simple API that solves common use cases. Server-side joins have multiple issues:
Client-side joins, in contrast:
I agree with @garrensmith that this is better suited to the server side. However, either way, we don't need this feature now.