Added support for basic incremental index sorting #3656

llalexandru00 · 2022-10-14T12:58:44Z

Performance bottleneck

I am using a lot of multi-table simple queries with ORDER BY (and eventually OFFSET/LIMIT), which aren't optimized in any sense. All records are fetched, a final sorting is done and eventually an OFFSET/LIMIT is applied. There is an existing optimization in H2 based on the Select.sortUsingIndex flag, which hints that the records are already fetched in a sorted way (because an index is used), so a final sorting is not required. Also, LIMIT clause has a lot to benefit from this, as only some records are fetched. However, this works only for single-table queries.

Example

The most basic example is something like SELECT * FROM TEST1 CROSS JOIN TEST2 ORDER BY TEST1.F1 TEST2.F2 LIMIT 5. Right now, H2 sequentially scans both tables, sorts the result set according to ORDER BY TEST1.F1 TEST2.F2 and retrieves the first 5 rows. If both tables have ~1000 unique records, this operation takes ~450ms (embedded anonymous in-memory database v2.1.214).

Research

PostgreSQL (at least 14) has "INCREMENTAL SORTING" part of the planning phase. This improves the sorting at the end as the result set can be sorted in batches. Intuitively, the LIMIT is also optimized due to this technique.
DuckDB (at least 0.5.1) and SQLite (at least 3.39.3) also have differential performance for LIMIT in the context of multi-table ORDER BY (LIMIT 1 is considerably faster than LIMIT 100000 even for multi-table queries). Of course, these are C based back-ends, but the theoretic approach is relevant here.
Apache Derby (at least 10.4.2) has a a poor performance on cross joins, but still has a consistent improvement between LIMIT values. This may be misleading, as the final sorting performance difference may spoil the performance check.

Solution

I suggest using the best b-tree index for the topFilterTable - the one that matches the longest prefix of columns from the ORDER BY clause as possible. Using such index will ensure that the result set is partially sorted on the first Select.alreadySorted elements from ORDER BY.

When OFFSET/LIMIT is specified, only LIMIT elements shall be retrieved initially. To ensure completeness, extra records should be fetched until a "bigger" record is encountered based on the first Select.alreadySorted fields. Sorting these and applying LIMIT is a complete and sound solution.

Using something like LIMIT X will make SELECT to retrieve only X records and eventually fetches some extra records to ensure completeness.

Best case scenario: ORDER BY has many fields matched by an index or the records have a high variance. Very few extra records are fetched (maybe none). In this case, the number of extra comparisons is low, so the overhead is clearly acceptable comparing to the fetching of all records.
Worst case scenario: the best index matched only the first 1/2 fields from ORDER BY and these have a very low variance. Many extra records are fetched (maybe the whole result-set). In this case, the number of extra comparisons is high, but the number of fields for which one comparison is done is very low (1/2). Therefore, the overhead of applying this solution on result-sets with low variance is negligible.

Result

The same SQL I provided earlier now runs in ~6 ms. The change never compromises as this does not interfere with join order or index selection. Only when it is safe to use/upgrade an index, Select.alreadySorted is used.

In general, all CROSS JOIN queries with ORDER BY and LIMIT are working considerably faster.
Queries that don't use ORDER BY are not affected by the change.
Inner joins without index may use a sort index to aid ORDER BY and LIMIT
Inner joins with index may upgrade to a more complex index to also cover ORDER BY and LIMIT, but always preserves the invariant proposed by the index in the first place.

h2/src/main/org/h2/command/query/Select.java

katzyn

Thank you for your contribution!

Please, send a license statement as described here
https://h2database.com/html/build.html#providing_patches
to our mailing list:
https://groups.google.com/g/h2-database
(It is partially pre-moderated, some new messages will be visible only after approval.)

h2/src/main/org/h2/command/query/Select.java

…rt, fixed javadoc

h2/src/main/org/h2/command/query/Select.java

katzyn · 2022-10-18T13:08:00Z

Did you send a license statement to the mailing list? I don't see it (but maybe it wasn't approved yet).

llalexandru00 · 2022-10-19T08:14:44Z

The licensing perspective of my changes here is yet to be discussed. Sorry for the delay.

Added support for basic incremental index sorting

12a3dd3

sonatype-lift bot reviewed Oct 14, 2022

View reviewed changes

h2/src/main/org/h2/command/query/Select.java Show resolved Hide resolved

sonatype-lift bot reviewed Oct 14, 2022

View reviewed changes

h2/src/main/org/h2/command/query/Select.java Show resolved Hide resolved

sonatype-lift bot reviewed Oct 14, 2022

View reviewed changes

h2/src/main/org/h2/command/query/Select.java Outdated Show resolved Hide resolved

h2database deleted a comment from sonatype-lift bot Oct 15, 2022

katzyn reviewed Oct 15, 2022

View reviewed changes

h2/src/main/org/h2/command/query/Select.java Outdated Show resolved Hide resolved

h2/src/main/org/h2/command/query/Select.java Outdated Show resolved Hide resolved

h2/src/main/org/h2/command/query/Select.java Show resolved Hide resolved

Renamed alreadySorted to numIndexSortedColumns, removed wildcard impo…

1ae5a07

…rt, fixed javadoc

sonatype-lift bot reviewed Oct 16, 2022

View reviewed changes

h2/src/main/org/h2/command/query/Select.java Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added support for basic incremental index sorting #3656

Added support for basic incremental index sorting #3656

llalexandru00 commented Oct 14, 2022

katzyn left a comment

katzyn commented Oct 18, 2022

llalexandru00 commented Oct 19, 2022

Added support for basic incremental index sorting #3656

Are you sure you want to change the base?

Added support for basic incremental index sorting #3656

Conversation

llalexandru00 commented Oct 14, 2022

Performance bottleneck

Example

Research

Solution

Result

katzyn left a comment

Choose a reason for hiding this comment

katzyn commented Oct 18, 2022

llalexandru00 commented Oct 19, 2022