Improving search by compound indexes. #3915

Merged
merged 12 commits into from Nov 27, 2023

Contributor

@kiss034 kiss034 commented Oct 18, 2023

Hi,

As far as I can see, searching by compound indexes does not work properly.

I created a table with some test data to check the query execution. See: https://github.com/kiss034/h2database/blob/test/sql/company.sql

I tested with the following SQL:

EXPLAIN ANALYZE SELECT mod_flag, mod_user FROM company WHERE (mod_flag, mod_user) IN (('I', 'user3'), ('U', 'user9'));

Before my changes, the engine was able to find the matching compound index while preparing the query, but the index was later removed from the index conditions, so a full table scan was executed.

SELECT
    "MOD_FLAG",
    "MOD_USER"
FROM "PUBLIC"."COMPANY"
    /* PUBLIC.IDX_MOD_FLAG_MOD_USER_MULTI: MOD_FLAG IN('I', 'U')
        AND MOD_USER IN('user3', 'user9')
     */
    /* scanCount: 178 */
WHERE ROW ("MOD_FLAG", "MOD_USER") IN(ROW ('I', 'user3'), ROW ('U', 'user9'))

I prepared the IndexCondition and IndexCursor classes to deal with multiple columns. After these changes the indexed search was executed properly.

SELECT
    "MOD_FLAG",
    "MOD_USER"
FROM "PUBLIC"."COMPANY"
    /* PUBLIC.IDX_MULTI_COLUMN_IN_TEST:  IN(ROW ('I', 'user3'), ROW ('U', 'user9')) */
    /* scanCount: 3 */
WHERE ROW ("MOD_FLAG", "MOD_USER") IN(ROW ('I', 'user3'), ROW ('U', 'user9'))

As far as I can see, the previous implementation used only the first indexed component of the compound IN condition. I do not know whether this was intentional. My implementation does not support single indexes in such queries.

Furthermore, the following tests currently fail.
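To illustrate the difference between the per-column form of the condition and the compound form, here is a minimal Java sketch that does not use H2 at all. The class name RowInDemo, its methods, and the sample rows are made up for illustration. It shows that MOD_FLAG IN(...) AND MOD_USER IN(...) accepts every combination of the listed values, while the row-value IN accepts only the listed pairs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not H2 code.
class RowInDemo {
    // The pairs listed in the row-value IN condition.
    static final List<List<String>> PAIRS = List.of(
            List.of("I", "user3"),
            List.of("U", "user9"));

    // Decomposed form: MOD_FLAG IN('I', 'U') AND MOD_USER IN('user3', 'user9').
    static boolean matchesPerColumn(String flag, String user) {
        Set<String> flags = Set.of("I", "U");
        Set<String> users = Set.of("user3", "user9");
        return flags.contains(flag) && users.contains(user);
    }

    // Compound form: (MOD_FLAG, MOD_USER) IN (('I', 'user3'), ('U', 'user9')).
    static boolean matchesCompound(String flag, String user) {
        return PAIRS.contains(List.of(flag, user));
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"I", "user3"}, {"I", "user9"}, {"U", "user3"}, {"U", "user9"}
        };
        for (String[] r : rows) {
            System.out.println(Arrays.toString(r)
                    + " perColumn=" + matchesPerColumn(r[0], r[1])
                    + " compound=" + matchesCompound(r[0], r[1]));
        }
    }
}
```

In this sketch the rows ('I', 'user9') and ('U', 'user3') pass the per-column check but not the compound one, so a per-column condition can only narrow the candidates and the WHERE clause still has to filter them.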

ERROR: org/h2/test/scripts/queries/query-optimisations.sql
line: 275
exp: >> SELECT "PUBLIC"."T1"."A", "PUBLIC"."T1"."B", "T2"."A", "T2"."B" FROM "PUBLIC"."T1"      /* PUBLIC.T1_A_IDX: A IN(1, 2) */ INNER JOIN "PUBLIC"."T1" "T2" /* PUBLIC.T1.tableScan */ ON 1=1 WHERE ROW ("T1"."A", "T1"."B") IN(ROW (1, "T2"."A"), ROW (2, "T2"."B"))
got: >> SELECT "PUBLIC"."T1"."A", "PUBLIC"."T1"."B", "T2"."A", "T2"."B" FROM "PUBLIC"."T1" "T2" /* PUBLIC.T1.tableScan */ INNER JOIN "PUBLIC"."T1" /* PUBLIC.T1_A_IDX */ ON 1=1 WHERE ROW ("T1"."A", "T1"."B") IN(ROW (1, "T2"."A"), ROW (2, "T2"."B"))
------------------------------
ERROR: org/h2/test/scripts/queries/query-optimisations.sql
line: 279
exp: >> SELECT "PUBLIC"."T1"."A", "PUBLIC"."T1"."B" FROM "PUBLIC"."T1" /* PUBLIC.T1_A_IDX: A IN(1, 3) */ WHERE ROW ("A", "B") IN(ROW (1, 2), ROW (3, 4))
got: >> SELECT "PUBLIC"."T1"."A", "PUBLIC"."T1"."B" FROM "PUBLIC"."T1" /* PUBLIC.T1_A_IDX */ WHERE ROW ("A", "B") IN(ROW (1, 2), ROW (3, 4))
------------------------------

Could you please let me know if my implementation or the test should be fixed? Thanks.

Best regards
János Áron Kiss

Contributor

@katzyn katzyn left a comment

Thank you for your contribution!

For the following table:

CREATE TABLE TEST(A INTEGER, B INTEGER, C INTEGER) AS
SELECT A, B, C FROM (VALUES 1, 2) T1(A) JOIN (VALUES 3, 4) T2(B) JOIN (VALUES 5, 6) T3(C);

Index conditions aren't listed in the execution plan any more, so something is wrong with them:

CREATE INDEX TEST_A_IDX ON TEST(A);

EXPLAIN SELECT * FROM TEST WHERE (A, B) IN ((1, 3), (2, 4));

was: SELECT "PUBLIC"."TEST"."A", "PUBLIC"."TEST"."B", "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST" /* PUBLIC.TEST_A_IDX: A IN(1, 2) */
WHERE ROW ("A", "B") IN(ROW (1, 3), ROW (2, 4))

now: SELECT "PUBLIC"."TEST"."A", "PUBLIC"."TEST"."B", "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST" /* PUBLIC.TEST_A_IDX */
WHERE ROW ("A", "B") IN(ROW (1, 3), ROW (2, 4))

The same problem here:

DROP INDEX TEST_A_IDX;
CREATE INDEX TEST_B_IDX ON TEST(B);

EXPLAIN SELECT * FROM TEST WHERE (A, B) IN ((1, 3), (2, 4));

was: SELECT "PUBLIC"."TEST"."A", "PUBLIC"."TEST"."B", "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST" /* PUBLIC.TEST_B_IDX: B IN(3, 4) */
WHERE ROW ("A", "B") IN(ROW (1, 3), ROW (2, 4))

now: SELECT "PUBLIC"."TEST"."A", "PUBLIC"."TEST"."B", "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST" /* PUBLIC.TEST_B_IDX */
WHERE ROW ("A", "B") IN(ROW (1, 3), ROW (2, 4))

And here:

CREATE INDEX TEST_C_A_IDX ON TEST(C, A);

EXPLAIN SELECT * FROM TEST WHERE (A, B) IN ((1, 3), (2, 4));

was: SELECT "PUBLIC"."TEST"."A", "PUBLIC"."TEST"."B", "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST" /* PUBLIC.TEST_B_IDX: B IN(3, 4) */
WHERE ROW ("A", "B") IN(ROW (1, 3), ROW (2, 4))

now: SELECT "PUBLIC"."TEST"."A", "PUBLIC"."TEST"."B", "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST" /* PUBLIC.TEST_B_IDX */
WHERE ROW ("A", "B") IN(ROW (1, 3), ROW (2, 4))

Here they are listed with the wrong order of values; it must be (5, 1), (6, 2):

EXPLAIN SELECT * FROM TEST WHERE (A, C) IN ((1, 5), (2, 6));

was: SELECT "PUBLIC"."TEST"."A", "PUBLIC"."TEST"."B", "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST" /* PUBLIC.TEST_C_A_IDX: A IN(1, 2) AND C IN(5, 6) */
WHERE ROW ("A", "C") IN(ROW (1, 5), ROW (2, 6))

now: SELECT "PUBLIC"."TEST"."A", "PUBLIC"."TEST"."B", "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST" /* PUBLIC.TEST_C_A_IDX: IN(ROW (1, 5), ROW (2, 6)) */
WHERE ROW ("A", "C") IN(ROW (1, 5), ROW (2, 6))

These problems need to be resolved, and I think we need more tests for these conditions.

* Contains a {@link Column} or {@code Column[]} depending on the condition type.
* @see #isCompoundColumns()
*/
private final Object column;
Contributor

It would be better to use separate fields here.

Contributor Author

Done
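As a rough illustration of the "separate fields" suggestion, here is a small self-contained Java sketch. It is not the actual IndexCondition class: the name ConditionSketch, the factory methods, and the use of String in place of Column are assumptions made to keep the example runnable without H2.

```java
// Hypothetical stand-in for the reviewed class; not H2 code.
final class ConditionSketch {
    private final String column;    // set only for single-column conditions
    private final String[] columns; // set only for compound conditions

    private ConditionSketch(String column, String[] columns) {
        this.column = column;
        this.columns = columns;
    }

    static ConditionSketch single(String column) {
        return new ConditionSketch(column, null);
    }

    static ConditionSketch compound(String... columns) {
        return new ConditionSketch(null, columns.clone());
    }

    boolean isCompoundColumns() {
        return columns != null;
    }

    String getColumn() {
        if (isCompoundColumns()) {
            // IllegalStateException is used here only to keep the sketch standalone.
            throw new IllegalStateException("getColumn() called on a compound condition");
        }
        return column;
    }

    String[] getColumns() {
        if (!isCompoundColumns()) {
            throw new IllegalStateException("getColumns() called on a single-column condition");
        }
        return columns.clone();
    }
}
```

With two typed fields the compiler checks each accessor's return type, and no instanceof test or cast on an Object field is needed.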

*/
private IndexCondition(int compareType, ExpressionList columns, Expression expression) {
this.compareType = compareType;
if (columns == null)
Contributor

Please, use this code style:

if (…) {
    …
} else {
    …
}

Contributor Author

Fixed

*
* @see Column#convert(CastDataProvider, Value)
*/
public ValueRow convert(CastDataProvider provider, Column[] columns) {
Contributor

You need to move this code to some other place. Value classes may not depend on Column and other database objects.

Contributor Author

I created a static method in the Column class instead.

return column;
if (column instanceof Column)
return (Column) column;
throw new IllegalStateException("The getColumn() method cannot be with multiple columns.");
Contributor

If something goes horribly wrong, use throw DbException.getInternalError().

Contributor Author

Fixed

@@ -189,6 +209,26 @@ private boolean canUseIndexFor(Column column) {
return idxCol == null || idxCol.column == column;
}

private boolean canUseIndexForIn(Column[] columns) {
if ( inColumn != null ) {
Contributor

(inColumn != null) (without spaces inside parentheses).

Contributor Author

Fixed

Contributor

katzyn commented Oct 18, 2023

My implementation does not support single indexes in such queries.

It must support them in the absence of a better index, because we can't accept a PR with this regression.

Actually your implementation still somehow chooses a proper index, but TableFilter.indexConditions is empty.

Contributor Author

kiss034 commented Oct 30, 2023

My implementation does not support single indexes in such queries.

It must support them in the absence of a better index, because we can't accept a PR with this regression.

Actually your implementation still somehow chooses a proper index, but TableFilter.indexConditions is empty.

Hi @katzyn,

Thank you for your feedback.

I restored the previous createIndexConditions(TableFilter, ExpressionList) method call in the ConditionIn and ConditionInConstantSet classes. Now we create a compound index condition (containing every item in the expression list) as well as individual index conditions (one for every item in the expression list). Later, TableFilter#prepare() drops the conditions that cannot be used for the search.

I adjusted some tests so that they pass with the new expectations. I need to fix the regular compound comparisons too (e.g. (A, B) = (1, 2)), so I put some TODOs in the test cases.
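The idea of generating both kinds of candidates and then discarding the ones an index cannot serve can be modelled with a short standalone Java sketch. This is a simplified model, not the logic of TableFilter#prepare(). The class CandidateConditionsDemo and the usability rule (a candidate is usable when its columns are exactly the leading columns of the index, in any order) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical model only; this is not H2 code.
class CandidateConditionsDemo {
    // Builds one compound candidate plus one single-column candidate per column.
    static List<List<String>> candidates(List<String> conditionColumns) {
        List<List<String>> result = new ArrayList<>();
        result.add(conditionColumns);
        for (String column : conditionColumns) {
            result.add(List.of(column));
        }
        return result;
    }

    // Assumed rule: usable if the candidate's columns are exactly
    // the leading columns of the index, in any order.
    static boolean usable(List<String> candidate, List<String> indexColumns) {
        int n = candidate.size();
        if (n > indexColumns.size()) {
            return false;
        }
        return new HashSet<>(indexColumns.subList(0, n)).equals(new HashSet<>(candidate));
    }

    // Returns the first usable candidate; the compound one is tried first.
    // An empty list means that no candidate can use the index.
    static List<String> choose(List<String> conditionColumns, List<String> indexColumns) {
        for (List<String> candidate : candidates(conditionColumns)) {
            if (usable(candidate, indexColumns)) {
                return candidate;
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("A", "B"), List.of("A")));      // [A]
        System.out.println(choose(List.of("A", "C"), List.of("C", "A"))); // [A, C]
        System.out.println(choose(List.of("A", "B"), List.of("C", "A"))); // []
    }
}
```

Under this model a single-column index on A still gets the condition on A, which is the behaviour the review asked to preserve, while an index on (C, A) gets the compound condition for (A, C).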

Please let me know if you have any suggestions. Thanks.

@kiss034 kiss034 requested a review from katzyn October 30, 2023 13:12
Contributor Author

kiss034 commented Nov 15, 2023

Hi @katzyn,

I hope I was able to solve every issue. I have run the tests, and it looks OK to me. Could you please review my pull request again? Thanks.

ExpressionVisitor visitor = ExpressionVisitor.getNotFromResolverVisitor(filter);
for (Expression e : valueList) {
if (!e.isEverything(visitor))
return;
Contributor

if (!e.isEverything(visitor)) {
    return;
}

Contributor Author

Done

ExpressionVisitor visitor = ExpressionVisitor.getNotFromResolverVisitor(filter);
for (Expression e : valueList) {
if (!e.isEverything(visitor))
return;
Contributor

{}

Contributor Author

Done

builder.append(" IN(");
for (int i = 0, s = expressionList.size(); i < s; i++) {
if (i > 0)
builder.append(", ");
Contributor

{}

Contributor Author

Done

Column[] columns = getColumns();
for (int i = columns.length; --i >= 0; ) {
if (TableType.TABLE != columns[i].getTable().getTableType())
return 0;
Contributor

{}

Contributor Author

Done

if (!isCompoundColumns()) {
builder.append("column=").append(column);
} else {
builder.append("columns=").append(columns);
Contributor

Column[].toString() doesn't return anything useful here, use Column.writeColumns(builder, columns, TRACE_SQL_FLAGS) instead.

Contributor Author

Changed
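The remark about Column[].toString() applies to every Java array. The following standalone sketch uses a String[] and a hand-written join instead of Column[] and Column.writeColumns. The class ArrayToStringDemo and its methods are hypothetical and only demonstrate the language behaviour.

```java
// Hypothetical demonstration of the Java behaviour; this is not H2 code.
class ArrayToStringDemo {
    // Mirrors the reviewed line: append(Object) falls back to Object.toString(),
    // which for arrays yields a type tag and an identity hash code.
    static String naive(String[] names) {
        return new StringBuilder("columns=").append((Object) names).toString();
    }

    // Writes the names one by one, separated by commas.
    static String joined(String[] names) {
        StringBuilder builder = new StringBuilder("columns=");
        for (int i = 0; i < names.length; i++) {
            if (i > 0) {
                builder.append(", ");
            }
            builder.append(names[i]);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String[] names = {"A", "B"};
        System.out.println(naive(names));  // e.g. columns=[Ljava.lang.String;@1b6d3586
        System.out.println(joined(names)); // columns=A, B
    }
}
```

The hash part of the first line varies between runs, which is why such output is of little use in a trace message.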

for (int i = 0; i < cols.length; i++) {
IndexColumn idxCol = cols[i];
if (idxCol != null && idxCol.column != columns[i])
return false;
Contributor

{}

Contributor Author

Done

copy[i] = nv;
}
}
return copy == null ? valueRow : ValueRow.get(valueRow.getType(), copy);
Contributor

I guess valueRow.getType() can return wrong data type information here, so TypeInfo.getTypeInfo(Value.ROW, 0, 0, new ExtTypeInfoRow(column)) should be used instead.

Contributor Author

Fixed

Column col = columns[j];
indexed = col.getColumnId() >= 0 && index.getColumnIndex(col) >= 0;
if (!indexed)
break;
Contributor

{}

Contributor Author

Done

if (col.getColumnId() >= 0) {
int columnIndex = index.getColumnIndex(col);
if (columnIndex == 0) // The first column of the index always matches.
continue;
Contributor

{}

Contributor Author

Done

@@ -4424,6 +4424,9 @@ SELECT * FROM TEST2COL WHERE A=0 AND B=0;
EXPLAIN SELECT * FROM TEST2COL WHERE A=0 AND B=0;
>> SELECT "PUBLIC"."TEST2COL"."A", "PUBLIC"."TEST2COL"."B", "PUBLIC"."TEST2COL"."C" FROM "PUBLIC"."TEST2COL" /* PUBLIC.PRIMARY_KEY_E: A = 0 AND B = 0 */ WHERE ("A" = 0) AND ("B" = 0)

EXPLAIN SELECT * FROM TEST2COL WHERE (A, B)=(0, 0);
>> SELECT "PUBLIC"."TEST2COL"."A", "PUBLIC"."TEST2COL"."B", "PUBLIC"."TEST2COL"."C" FROM "PUBLIC"."TEST2COL" /* PUBLIC.PRIMARY_KEY_E: A = 0 AND B = 0 */ WHERE ROW ("A", "B") = ROW (0, 0)

Contributor

Please don't add any new tests to this legacy script. You can add them to indexes.sql instead.

I think we need more tests, including multi-column joins with other tables, with both EXPLAIN and actual tests of query results.

Contributor Author

Ok, I deleted the new lines.

I created a new test class: TestCompoundIndexSearch

Contributor Author

kiss034 commented Nov 16, 2023

Hi @katzyn,

I fixed your previous findings. Could you please check the pull request again? Thanks.

@kiss034 kiss034 requested a review from katzyn November 16, 2023 16:50
Contributor

katzyn commented Nov 20, 2023

Something is wrong with the execution plan in the following example:

CREATE TABLE TEST(A INTEGER, B INTEGER, C INTEGER) AS
  SELECT A, B, C FROM (VALUES 1, 2) T1(A) JOIN (VALUES 3, 4) T2(B) JOIN (VALUES 5, 6) T3(C);

CREATE INDEX TEST_C_A_IDX ON TEST(C, A);

EXPLAIN ANALYZE SELECT * FROM TEST WHERE (A, C) IN ((1, 5), (2, 6));
SELECT
    "PUBLIC"."TEST"."A",
    "PUBLIC"."TEST"."B",
    "PUBLIC"."TEST"."C"
FROM "PUBLIC"."TEST"
    /* PUBLIC.TEST_C_A_IDX:  IN(ROW (1, 5), ROW (2, 6))
        AND C IN(5, 6)
     */
    /* scanCount: 9 */
WHERE ROW ("A", "C") IN(ROW (1, 5), ROW (2, 6))

The correct plan should look like this:

    /* PUBLIC.TEST_C_A_IDX:  IN(ROW (5, 1), ROW (6, 2))
     */

Columns need to be specified in the correct order, and the AND C IN(5, 6) condition should not be listed, because H2 can use only one index condition.

scanCount: 9 also looks very suspicious: a query with WHERE (C, A) IN ((5, 1), (6, 2)) performs only 5 reads, and this query performs 9 reads when the index doesn't exist.

Contributor Author

kiss034 commented Nov 23, 2023

Hi @katzyn,

I fixed the index condition error, although I am not sure this is what you wanted to see. If the query uses the indexed columns in the wrong order, I re-create the index condition with the correct column order. It looks a bit ugly to me, but it seems to work.
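The reordering can be pictured with a small standalone Java sketch. It is not the code from this pull request: the class RowReorderDemo, its reorder method, and the use of plain lists in place of H2's expression and value classes are assumptions for illustration. Given the column order of the query, (A, C), and of the index, (C, A), it permutes every row of the IN list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not H2 code.
class RowReorderDemo {
    // Rearranges each row so that its values follow the index column order.
    static List<List<Integer>> reorder(List<String> queryColumns,
            List<String> indexColumns, List<List<Integer>> rows) {
        // positions[i] is the position in the query row of the i-th index column.
        int[] positions = new int[indexColumns.size()];
        for (int i = 0; i < positions.length; i++) {
            int pos = queryColumns.indexOf(indexColumns.get(i));
            if (pos < 0) {
                throw new IllegalArgumentException(
                        "Column is not part of the condition: " + indexColumns.get(i));
            }
            positions[i] = pos;
        }
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> row : rows) {
            List<Integer> reordered = new ArrayList<>();
            for (int pos : positions) {
                reordered.add(row.get(pos));
            }
            result.add(reordered);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = List.of(List.of(1, 5), List.of(2, 6));
        // Query uses (A, C); the index is on (C, A).
        System.out.println(reorder(List.of("A", "C"), List.of("C", "A"), rows));
        // prints [[5, 1], [6, 2]]
    }
}
```

The result matches the row order requested in the review, IN(ROW (5, 1), ROW (6, 2)).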

Contributor

@katzyn katzyn left a comment

Please, send a license statement as described here
https://h2database.com/html/build.html#providing_patches
to our mailing list (Google group)
https://groups.google.com/g/h2-database
(This group is partially pre-moderated, your post may not appear immediately.)

@katzyn katzyn merged commit 539e8d6 into h2database:master Nov 27, 2023
2 checks passed
catull pushed a commit to catull/h2database that referenced this pull request Jan 6, 2024
* Improving search by compound indexes.

* Fixing review findings.

* Fixing the index condition handling.

* Adjusting test cases.

* Fixing TableFilter.prepare().

* Adjusting test cases.

* Refactoring the constructors of the IndexCondition class.

* Removing unnecessary TODOs.

* Fixing review findings.

* Introducing the TestCompoundIndexSearch class.

* Preparing IndexCondition to deal with queries where the columns not used in the right order.

* Fixing test errors.