Added support for basic incremental index sorting #3656
base: master
Thank you for your contribution!
Please send a license statement as described here
https://h2database.com/html/build.html#providing_patches
to our mailing list:
https://groups.google.com/g/h2-database
(It is partially pre-moderated, some new messages will be visible only after approval.)
Did you send a license statement to the mailing list? I don't see it (but maybe it wasn't approved yet).
The licensing perspective of my changes here is yet to be discussed. Sorry for the delay.
Performance bottleneck
I am using a lot of simple multi-table queries with `ORDER BY` (and sometimes `OFFSET/LIMIT`), which aren't optimized in any way. All records are fetched, a final sort is performed, and only then is `OFFSET/LIMIT` applied. There is an existing optimization in H2 based on the `Select.sortUsingIndex` flag, which indicates that the records are already fetched in sorted order (because an index is used), so a final sort is not required. The `LIMIT` clause also benefits greatly from this, as only some records are fetched. However, this works only for single-table queries.
Example
The most basic example is something like `SELECT * FROM TEST1 CROSS JOIN TEST2 ORDER BY TEST1.F1, TEST2.F2 LIMIT 5`. Right now, H2 sequentially scans both tables, sorts the result set according to `ORDER BY TEST1.F1, TEST2.F2` and returns the first 5 rows. If both tables have ~1000 unique records, this operation takes ~450 ms (embedded anonymous in-memory database, v2.1.214).
Research
`LIMIT` is also optimized due to this technique.
`LIMIT` in the context of multi-table `ORDER BY` (`LIMIT 1` is considerably faster than `LIMIT 100000` even for multi-table queries). Of course, these are C-based back-ends, but the theoretical approach is relevant here.
`LIMIT` values. This may be misleading, as the difference in final-sort performance may distort the performance check.
Solution
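A minimal Java sketch of the two ideas in this section, under stated assumptions: it is not H2's implementation, and `IncrementalSortSketch`, `matchedPrefix` and `sortWithLimit` are hypothetical names. `matchedPrefix` stands in for computing `Select.alreadySorted` (columns are compared by name only; sort direction and NULL ordering are ignored). `sortWithLimit` assumes rows already arrive ordered on that prefix, fetches `OFFSET + LIMIT` rows (adding the offset is an assumption of this sketch), keeps reading rows that tie with the last one on the prefix, stops at the first "bigger" row, and only then sorts the small buffer on the full `ORDER BY`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of incremental sorting with OFFSET/LIMIT (not H2 code). */
public class IncrementalSortSketch {

    /**
     * Length of the longest ORDER BY prefix covered by the index's leading columns,
     * i.e. the role Select.alreadySorted plays in the description.
     * Simplification: sort direction and NULL ordering are ignored.
     */
    public static int matchedPrefix(List<String> indexColumns, List<String> orderByColumns) {
        int n = Math.min(indexColumns.size(), orderByColumns.size());
        int i = 0;
        while (i < n && indexColumns.get(i).equals(orderByColumns.get(i))) {
            i++;
        }
        return i;
    }

    /**
     * Returns rows [offset, offset + limit) of the input under fullOrder.
     * Assumes rows arrive sorted by prefixOrder (the first alreadySorted ORDER BY keys)
     * and that fullOrder extends prefixOrder (it only breaks prefix ties).
     */
    public static <R> List<R> sortWithLimit(Iterator<R> rows, Comparator<R> prefixOrder,
                                            Comparator<R> fullOrder, int offset, int limit) {
        int needed = offset + limit;
        List<R> buffer = new ArrayList<>();
        // Fetch only as many rows as OFFSET + LIMIT require.
        while (buffer.size() < needed && rows.hasNext()) {
            buffer.add(rows.next());
        }
        // Keep fetching rows that tie with the last buffered row on the sorted prefix;
        // the first strictly "bigger" row (and everything after it) cannot make the cut.
        if (needed > 0 && buffer.size() == needed) {
            R last = buffer.get(needed - 1);
            while (rows.hasNext()) {
                R next = rows.next();
                if (prefixOrder.compare(next, last) > 0) {
                    break;
                }
                buffer.add(next);
            }
        }
        // Final sort runs only over the small buffer, then OFFSET/LIMIT is applied.
        buffer.sort(fullOrder);
        int from = Math.min(offset, buffer.size());
        int to = Math.min(needed, buffer.size());
        return new ArrayList<>(buffer.subList(from, to));
    }

    public static void main(String[] args) {
        // Hypothetical index on TEST1(F1) versus ORDER BY TEST1.F1, TEST2.F2.
        int alreadySorted = matchedPrefix(List.of("TEST1.F1"), List.of("TEST1.F1", "TEST2.F2"));
        System.out.println("alreadySorted = " + alreadySorted);

        // Nested-loop cross join: outer rows come in F1 order, inner rows do not.
        List<int[]> joined = new ArrayList<>();
        for (int f1 : new int[] {1, 2, 3}) {
            for (int f2 : new int[] {30, 10, 20}) {
                joined.add(new int[] {f1, f2});
            }
        }
        Comparator<int[]> prefix = Comparator.comparingInt(r -> r[0]);
        Comparator<int[]> full = prefix.thenComparingInt(r -> r[1]);
        for (int[] row : sortWithLimit(joined.iterator(), prefix, full, 0, 2)) {
            System.out.println(row[0] + ", " + row[1]);
        }
    }
}
```

With `LIMIT 2`, the sketch consumes four of the nine joined rows (the two requested rows, the tie `(1, 20)`, and the first row with `F1 = 2`, which ends the scan) and prints `alreadySorted = 1`, then `1, 10` and `1, 20`. The early stop is safe because every row not yet read has a prefix strictly greater than the last buffered row, so under the full order it sorts after everything in the buffer.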
I suggest using the best B-tree index for the `topFilterTable`: the one that matches the longest possible prefix of the `ORDER BY` columns. Using such an index ensures that the result set is already sorted on the first `Select.alreadySorted` elements of the `ORDER BY`.
When `OFFSET/LIMIT` is specified, only `LIMIT` elements should be retrieved initially. To ensure completeness, extra records are then fetched until a "bigger" record is encountered, compared on the first `Select.alreadySorted` fields. Sorting these records and applying `LIMIT` is a complete and sound solution.
Using something like `LIMIT X` makes the `SELECT` retrieve only `X` records, plus possibly some extra records to ensure completeness.
`ORDER BY` has many fields matched by an index or the records have a high variance. Very few extra records are fetched (maybe none). In this case, the number of extra comparisons is low, so the overhead is clearly acceptable compared to fetching all records.
`ORDER BY` and these have a very low variance. Many extra records are fetched (maybe the whole result set). In this case, the number of extra comparisons is high, but each comparison involves very few fields (1 or 2). Therefore, the overhead of applying this solution to result sets with low variance is negligible.
Result
The same SQL I provided earlier now runs in ~6 ms. The change never compromises anything, as it does not interfere with join order or index selection. `Select.alreadySorted` is used only when it is safe to use or upgrade an index.
`CROSS JOIN` queries with `ORDER BY` and `LIMIT` run considerably faster.
`ORDER BY` are not affected by the change.
`ORDER BY` and `LIMIT`
`ORDER BY` and `LIMIT`, but always preserves the invariant provided by the index in the first place.