Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not all condition in SQL got pushdown in TiKV Table range scan #16798

Open
wxiaomou opened this issue Apr 10, 2024 · 7 comments
Open

Not all condition in SQL got pushdown in TiKV Table range scan #16798

wxiaomou opened this issue Apr 10, 2024 · 7 comments
Labels
type/feature-request Type: Issue - Feature request

Comments

@wxiaomou
Copy link

Feature Request

Is your feature request related to a problem? Please describe:

delete from table_name
where ( primary_key > ? or ( primary_key = ? and secondary_key > ? ) or ( primary_key = ? and secondary_key = ? and timestamp >= ? ) )
and ( primary_key < ? or ( primary_key = ? and secondary_key < ? ) or ( primary_key = ? and secondary_key = ? and timestamp <= ? ) )
and timestamp <= ? - ? limit ?

In the query slow log we can see only the primary key range of the where condition is pushed down

range:[pk_range_start,pk_range_start], (pk_range_start,pk_range_end), [pk_range_end,pk_range_end],

This results a 1050324 row scan on tikv instead of 4000

0 tikv_task:{proc max:2.67s, min:0s, avg: 1.46s, p80:2.52s, p95:2.6s, iters:1141, tasks:25}, scan_detail: {total_process_keys: 1050324, total_process_keys_size: 12500490702, total_keys: 1050351, get_snapshot_time: 9.49ms, rocksdb: {key_skipped_count: 2100624, block: {cache_hit_count: 103039, read_count: 102540, read_byte: 1.39 GB, read_time: 12s}}} N/A N/A
└─Selection_12 cop[tikv] 4000 le(kv.table_name.timestamp, 1704622744942000000), or(gt(kv.table_name.primary_key, "pk_range_start"), or(and(eq(kv.table_name.primary_key, "pk_range_start"), gt(kv.table_name.secondary_key, "sk_range_start")), and(eq(kv.table_name.primary_key, "pk_range_start"), and(eq(kv.table_name.secondary_key, "sk_range_start"), ge(kv.table_name.timestamp, 1707885658544000000))))), or(lt(kv.table_name.primary_key, "pk_range_end"), or(and(eq(kv.table_name.primary_key, "pk_range_end"), lt(kv.table_name.secondary_key, "sk_range_end")), and(eq(kv.table_name.primary_key, "pk_range_end"), and(eq(kv.table_name.secondary_key, "sk_range_end"), le(kv.table_name.timestamp, 1707885658544000000))))) 0 tikv_task:{proc max:2.67s, min:0s, avg: 1.46s, p80:2.52s, p95:2.6s, iters:1141, tasks:25} N/A N/A
└─TableRangeScan_11 cop[tikv] 5000 table:table_name, range:[pk_range_start,pk_range_start], (pk_range_start,pk_range_end), [pk_range_end,pk_range_end], keep order:false, stats:pseudo 1050324 tikv_task:{proc max:2.66s, min:0s, avg: 1.45s, p80:2.52s, p95:2.59s, iters:1141, tasks:25} N/A N/A
Plan_digest: bcdc4a42f09720ccdcb027978269914644191d4f802439990219556183bced9d
Binary_plan:...
Prev_stmt:
Query: delete from table_name where ( primary_key > ? or ( primary_key = ? and secondary_key > ? ) or ( primary_key = ? and secondary_key = ? and timestamp >= ? ) ) and ( primary_key < ? or ( primary_key = ? and secondary_key < ? ) or ( primary_key = ? and secondary_key = ? and timestamp <= ? ) ) and timestamp <= ? - ? limit ? ;
1 row in set (2.52 sec)

Describe the feature you'd like:

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@wxiaomou wxiaomou added the type/feature-request Type: Issue - Feature request label Apr 10, 2024
@you06
Copy link
Contributor

you06 commented May 8, 2024

If I understand correctly, what you want is (pk1, sk1, ts1), (pk2, sk2, ts2) & ts < ts_max or (pk1, sk1, ts1), (pk2, sk2, min(ts2, ts_max)), TiDB is running a [(pk1), (pk2)] scan instead.

@AilinKid
Copy link
Contributor

If I understand correctly, what you want is (pk1, sk1, ts1), (pk2, sk2, ts2) & ts < ts_max or (pk1, sk1, ts1), (pk2, sk2, min(ts2, ts_max)), TiDB is running a [(pk1), (pk2)] scan instead.

a fine-grained range decision? using what you can grab as much as possible?

@Tema
Copy link

Tema commented May 16, 2024

Same as pingcap/tidb#41598

@cgtz
Copy link

cgtz commented May 22, 2024

Here is an example of a select query plan that has to scan over 2 entire pks. I have obfuscated the real PK and SK fields. This a select statement similar to the delete statement seen in the description

mysql> explain analyze select count(*) from kv.`tablename` where ( `primary_key` > primary_key_start or ( `primary_key` = primary_key_start and `secondary_key` > secondary_key_start ) or ( `primary_key` = primary_key_start and `secondary_key` = secondary_key_start and timestamp >= 1707885658544000000 ) ) and ( `primary_key` < primary_key_end or ( `primary_key` = primary_key_end and `secondary_key` < secondary_key_end ) or ( `primary_key` = primary_key_end and `secondary_key` = secondary_key_end and timestamp <= 1707885658544000000 ) );
+-------------------------------+-------------+---------+-----------+------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id                            | estRows     | actRows | task      | access object                                        | execution info                                                                                                                                                                                                                                                                                                                                               | operator info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | memory    | disk |
+-------------------------------+-------------+---------+-----------+------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| HashAgg_13                    | 1.00        | 1       | root      |                                                      | time:78.5ms, loops:2, partial_worker:{wall_time:78.48922ms, concurrency:5, task_num:1, tot_wait:391.932848ms, tot_exec:2.679µs, tot_time:391.940732ms, max:78.39244ms, p95:78.39244ms}, final_worker:{wall_time:78.496872ms, concurrency:5, task_num:1, tot_wait:391.993857ms, tot_exec:17.327µs, tot_time:392.013428ms, max:78.406906ms, p95:78.406906ms}   | funcs:count(Column#6)->Column#5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | 9.86 KB   | N/A  |
| └─TableReader_14              | 1.00        | 2       | root      |                                                      | time:78.4ms, loops:2, cop_task: {num: 25, max: 51.5ms, min: 1.92ms, avg: 31.5ms, p95: 51.1ms, max_proc_keys: 45612, p95_proc_keys: 45612, tot_proc: 732ms, rpc_num: 25, rpc_time: 787.3ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                                                              | data:HashAgg_6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | 397 Bytes | N/A  |
|   └─HashAgg_6                 | 1.00        | 2       | cop[tikv] |                                                      | tikv_task:{proc max:50ms, min:0s, avg: 29.6ms, p80:40ms, p95:50ms, iters:1039, tasks:25}, scan_detail: {total_process_keys: 1050324, total_process_keys_size: 75623328, total_keys: 1050351, get_snapshot_time: 18.6ms, rocksdb: {key_skipped_count: 1050324, block: {cache_hit_count: 776, read_count: 508, read_byte: 1.84 MB, read_time: 2.36ms}}}        | funcs:count(1)->Column#6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | N/A       | N/A  |
|     └─Selection_12            | 58448445.84 | 4001    | cop[tikv] |                                                      | tikv_task:{proc max:50ms, min:0s, avg: 29.6ms, p80:40ms, p95:50ms, iters:1039, tasks:25}                                                                                                                                                                                                                                                                     | or(gt(kv.tablename.primary_key, "primary_key_start"), or(and(eq(kv.tablename.primary_key, "primary_key_start"), gt(kv.tablename.secondary_key, "secondary_key_start")), and(eq(kv.tablename.primary_key, "primary_key_start"), and(eq(kv.tablename.secondary_key, "secondary_key_start"), ge(kv.tablename.timestamp, 1707885658544000000))))), or(lt(kv.tablename.primary_key, "primary_key_end"), or(and(eq(kv.tablename.primary_key, "primary_key_end"), lt(kv.tablename.secondary_key, "secondary_key_end")), and(eq(kv.tablename.primary_key, "primary_key_end"), and(eq(kv.tablename.secondary_key, "secondary_key_end"), le(kv.tablename.timestamp, 1707885658544000000))))) | N/A       | N/A  |
|       └─TableRangeScan_11     | 73060557.30 | 1050324 | cop[tikv] | table:tablename | tikv_task:{proc max:40ms, min:0s, avg: 18.4ms, p80:30ms, p95:30ms, iters:1039, tasks:25}                                                                                                                                                                                                                                                                     | range:[primary_key_start,primary_key_start], (primary_key_start,primary_key_end), [primary_key_end,primary_key_end], keep order:false, stats:pseudo                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | N/A       | N/A  |
+-------------------------------+-------------+---------+-----------+------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
5 rows in set (0.11 sec)

This range only has 4000 keys in it but the explain analyze above shows that it processed 1050351 total keys (roughly number of keys under primary_key_start U primary_key_end). The range of the innermost TableRangeScan is range: [primary_key_start,primary_key_start], (primary_key_start,primary_key_end), [primary_key_end,primary_key_end]

mysql> select count(*) from kv.`tablename` where ( `primary_key` > primary_key_start or ( `primary_key` = primary_key_start and `secondary_key` > secondary_key_start ) or ( `primary_key` = primary_key_start and `secondary_key` = secondary_key_start and timestamp >= 1707885658544000000 ) ) and ( `primary_key` < primary_key_end or ( `primary_key` = primary_key_end and `secondary_key` < secondary_key_end ) or ( `primary_key` = primary_key_end and `secondary_key` = secondary_key_end and timestamp <= 1707885658544000000 ) );
+----------+
| count(*) |
+----------+
|     4001 |
+----------+
1 row in set (0.00 sec)

Total number of keys under primary_key_start:

mysql> select count(*) from kv.`tablename` where `primary_key` = primary_key_start;
+----------+
| count(*) |
+----------+
|   525162 |
+----------+
1 row in set (0.03 sec)

Total number of keys under primary_key_end:

mysql> select count(*) from kv.`957070a40a24aa6e2c5f6cb77f88226b_1706196567537` where `primary_key` = primary_key_end;
+----------+
| count(*) |
+----------+
|   525162 |
+----------+
1 row in set (0.03 sec)

(in this table each primary_key has similar cardinality of secondary_keys underneath)


Generally, we want to be able to efficiently handle queries that scan between (start_pk, start_sk, start_timestamp) and (end_pk, end_sk, end_timestamp) with some extra filters like timestamp < constant (i.e. for the delete statement on this issue we want to delete all values in the range with timestamps older than some threshold)

Another option would to be able to use the cleaner tuple syntax for this type of query. This syntax would be more concise but would perform even worse and result in a full table scan:

mysql> explain select count(*) from kv.`tablename` where ( `primary_key`, `secondary_key`, `timestamp`) >= (primary_key_start, secondary_key_start, 1707885658544000000 ) and ( `primary_key`, `secondary_key`, `timestamp`) <= (primary_key_end, secondary_key_end, 1707885658544000000) and timestamp < 320;
+------------------------------+--------------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows      | task      | access object                                        | operator info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+------------------------------+--------------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashAgg_13                   | 1.00         | root      |                                                      | funcs:count(Column#6)->Column#5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| └─TableReader_14             | 1.00         | root      |                                                      | data:HashAgg_6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|   └─HashAgg_6                | 1.00         | cop[tikv] |                                                      | funcs:count(1)->Column#6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|     └─Selection_12           | 138404798.35 | cop[tikv] |                                                      | if(ne(kv.tablename.primary_key, "primary_key_start"), ge(kv.tablename.primary_key, "primary_key_start"), if(isnull(ne(kv.tablename.primary_key, "primary_key_start")), NULL, if(ne(kv.tablename.secondary_key, "secondary_key_start"), ge(kv.tablename.secondary_key, "secondary_key_start"), if(isnull(ne(kv.tablename.secondary_key, "secondary_key_start")), NULL, ge(kv.tablename.timestamp, 1707885658544000000))))), if(ne(kv.tablename.primary_key, "primary_key_end"), le(kv.tablename.primary_key, "primary_key_end"), if(isnull(ne(kv.tablename.primary_key, "primary_key_end")), NULL, if(ne(kv.tablename.secondary_key, "secondary_key_end"), le(kv.tablename.secondary_key, "secondary_key_end"), if(isnull(ne(kv.tablename.secondary_key, "secondary_key_end")), NULL, le(kv.tablename.timestamp, 1707885658544000000))))), lt(kv.tablename.timestamp, 320) |
|       └─TableFullScan_11     | 520579733.00 | cop[tikv] | table:tablename | keep order:false, stats:pseudo                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+------------------------------+--------------+-----------+------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

@ghazalfamilyusa
Copy link

ghazalfamilyusa commented May 23, 2024

@cgtz Could you please add DDL?
I am not sure how the range scan of both PK and secondary indexes can be pushed to the table scan?

@cgtz
Copy link

cgtz commented May 23, 2024

Yes, this condition pushdown issue is with the select part of the delete and the same issue occurs with normal selects.
The columns primary_key, secondary_key and timestamp are actually all just parts of a composite primary key. secondary_key is not actually part of any secondary index.

CREATE TABLE `tablename` (
  `primary_key` varbinary(1024) NOT NULL,
  `secondary_key` varbinary(1024) NOT NULL,
  `timestamp` bigint(20) NOT NULL,
  `value` mediumblob DEFAULT NULL,
  PRIMARY KEY (`primary_key`,`secondary_key`,`timestamp`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

@ghazalfamilyusa
Copy link

Did some analysis and seems the range intersection is limited to the leading column. Below, is a simple example.
t1 with index on (a1,b1).

Q1 is ((a1>1) or (a1=1 and b1 > 10)) and we get the correct range:(1 10,1 +inf], (1,+inf]
Q2 is ((a1<10) or (a1=10 and b1 < 20)) and we also get the correct range:[-inf,10), [10 -inf,10 20)
Q3 is Q1 and Q2 combined. ((a1>1) or (a1=1 and b1 > 10)) and ((a1<10) or (a1=10 and b1 < 20)) and we get correct intersection but not optimal range:[1,1], (1,10), [10,10]
Correct result for Q3 is Q4 with range range:(1 10,1 +inf], (1,10), [10 -inf,10 20)

drop table if exists t1
create table t1 (a1 int, b1 int, c1 int, primary key pkx (a1,b1))
explain select /*+ USE_INDEX(t1,PKX) */ count(*) from t1 where  
   ((a1>1) or (a1=1 and b1 > 10))
--------------
Q1:
id      estRows task    access object   operator info
HashAgg_12      1.00    root            funcs:count(Column#5)->Column#4
└─TableReader_13        1.00    root            data:HashAgg_6
  └─HashAgg_6   1.00    cop[tikv]               funcs:count(1)->Column#5
    └─TableRangeScan_11 3366.67 cop[tikv]       table:t1        range:(1 10,1 +inf], (1,+inf], keep order:false, stats:pseudo
--------------
explain select /*+ USE_INDEX(t1,PKX) */ count(*) from t1 where
   ((a1<10) or (a1=10 and b1 < 20))
--------------
Q2
id      estRows task    access object   operator info
HashAgg_12      1.00    root            funcs:count(Column#5)->Column#4
└─TableReader_13        1.00    root            data:HashAgg_6
  └─HashAgg_6   1.00    cop[tikv]               funcs:count(1)->Column#5
    └─TableRangeScan_11 3356.57 cop[tikv]       table:t1        range:[-inf,10), [10 -inf,10 20), keep order:false, stats:pseudo
--------------
explain select /*+ USE_INDEX(t1,PKX) */ count(*) from t1 where
   ((a1>1) or (a1=1 and b1 > 10)) and ((a1<10) or (a1=10 and b1 < 20))
--------------
Q3
id      estRows task    access object   operator info
HashAgg_13      1.00    root            funcs:count(Column#5)->Column#4
└─TableReader_14        1.00    root            data:HashAgg_6
  └─HashAgg_6   1.00    cop[tikv]               funcs:count(1)->Column#5
    └─Selection_12      1122.61 cop[tikv]               or(gt(test.t1.a1, 1), and(eq(test.t1.a1, 1), gt(test.t1.b1, 10))), or(lt(test.t1.a1, 10), and(eq(test.t1.a1, 10), lt(test.t1.b1, 20)))
      └─TableRangeScan_11       1403.26 cop[tikv]       table:t1        range:[1,1], (1,10), [10,10], keep order:false, stats:pseudo
--------------
explain select /*+ USE_INDEX(t1,PKX) */ count(*) from t1 where
   (a1>1 and a1 < 10) or (a1=1 and b1>10) or (a1=10 and b1<20)
--------------
Q4
id      estRows task    access object   operator info
StreamAgg_17    1.00    root            funcs:count(Column#6)->Column#4
└─TableReader_18        1.00    root            data:StreamAgg_9
  └─StreamAgg_9 1.00    cop[tikv]               funcs:count(1)->Column#6
    └─TableRangeScan_16 316.57  cop[tikv]       table:t1        range:(1 10,1 +inf], (1,10), [10 -inf,10 20), keep order:false, stats:pseudo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature-request Type: Issue - Feature request
Projects
None yet
Development

No branches or pull requests

6 participants