
Add CF mark for Lock and Rollback records #102

Open · wants to merge 1 commit into base: master

Conversation

@sticnarf (Contributor)

No description provided.

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@sticnarf (Author)

I think @ekexium has an extensive analysis of different solutions to the problem. This one is, I think, the most practical when weighing the benefits against the costs.

Are you happy to share the desensitized document and the comparison sheet? @ekexium

@ekexium (Contributor) commented Sep 29, 2022

> Are you happy to share the desensitized document and the comparison sheet? @ekexium

Here you go! https://docs.google.com/document/d/1cg_pAVPnAvOz9CwMihDkregvyN-jl20nSSr75NuU0Is/edit# tikv/sig-transaction#111


> In `CheckSecondaryKeys`, we need to check both CFs of all the given secondary keys to know whether some keys are already committed or rolled back.
>
> And when prewrite raises an error, or when we are prewriting non-pessimistic keys in a retry, we also need the precise status of the key to guarantee idempotence. This also requires reading both the write and mark CFs.


The prewrite request with deferred constraint check may also need the precise status of the key; maybe this could be considered a happy path for committing this kind of transaction.

@sticnarf (Author)

I don't get it... Could you explain the reason in a bit more detail?

It seems to me that prewrite with deferred constraint check needs (1) a write conflict check and (2) the latest effective record (PUT/DELETE). The write conflict check only needs the latest record in the write CF, and reading the latest effective record costs no more than before.

If there is a mark record for this key, prewrite must fail anyway, because there is a newer record in the write CF.


So the deferred prewrite request processing does not actually need to read the mark CF every time? It seems I misunderstood: I thought it needed to check the mark CF each time, which would mean more overhead than before for this kind of transaction.
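
To make the point above concrete, here is a minimal sketch of the conflict check on this path, using an in-memory stand-in for the write CF; the types and names are hypothetical, not TiKV's actual API:

```rust
use std::collections::BTreeMap;

// Hypothetical in-memory stand-in for the write CF:
// user_key -> (commit_ts -> record kind), newest version = largest commit_ts.
type WriteCf = BTreeMap<Vec<u8>, BTreeMap<u64, &'static str>>;

/// The conflict check for prewrite only needs the latest version in the
/// write CF; the mark CF is not read on this path.
fn check_write_conflict(write_cf: &WriteCf, key: &[u8], start_ts: u64) -> Result<(), u64> {
    if let Some(versions) = write_cf.get(key) {
        if let Some((&commit_ts, _)) = versions.iter().next_back() {
            if commit_ts > start_ts {
                // A newer committed version exists: write conflict.
                return Err(commit_ts);
            }
        }
    }
    Ok(())
}

fn main() {
    let mut write_cf: WriteCf = BTreeMap::new();
    write_cf.entry(b"k1".to_vec()).or_default().insert(100, "Put");
    // A transaction with start_ts = 90 conflicts with the commit at ts = 100.
    assert_eq!(check_write_conflict(&write_cf, b"k1", 90), Err(100));
    assert_eq!(check_write_conflict(&write_cf, b"k1", 110), Ok(()));
}
```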


> If a new `Lock` record is written to the write CF while the latest version is also a `Lock` or `Rollback`, instead of removing the previous version, just overwrite that record and add a `real_commit_ts` to it. When checking write conflicts, we should parse the value and check the real commit TS, because the timestamp encoded into the key may not be accurate.
>
> It may help reduce tombstones, but it breaks too many existing assumptions.


If there are no typical user scenarios in which this kind of tombstone issue has a significant impact, the priority of this optimization could be lowered, as it is a bit complex to prove its correctness.
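
For illustration, a sketch of what this alternative implies; the field and function names are hypothetical, not from the RFC:

```rust
// Hypothetical record layout for the "overwrite with real_commit_ts" alternative.
struct ReusedRecord {
    // Present when a newer `Lock` reused this slot instead of writing a new key;
    // the key then still encodes the older, now-inaccurate timestamp.
    real_commit_ts: Option<u64>,
}

// Conflict checks under this alternative must parse the value and prefer the
// real commit TS over the timestamp decoded from the key.
fn effective_commit_ts(key_encoded_ts: u64, record: &ReusedRecord) -> u64 {
    record.real_commit_ts.unwrap_or(key_encoded_ts)
}
```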

text/0000-mark-cf.md

> And when prewrite raises an error, or when we are prewriting non-pessimistic keys in a retry, we also need the precise status of the key to guarantee idempotence. This also requires reading both the write and mark CFs.
>
> Luckily, none of these happen frequently in production, so the extra cost is not a big issue.


As the key format is `{user_key}{start_ts}`, which differs from the keys in the write CF, maybe we could describe the conflict check process here in a bit more detail.

@sticnarf (Author)

Conflict checking itself only needs to read the write CF for the maximum version. When we need to read the mark CF, we are confirming whether the transaction we are prewriting has been rolled back. So it's always a point get on the mark CF according to the `start_ts`.
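
For illustration, a minimal sketch of that point get with a simplified key encoding (the real TiKV key encoding differs in details such as the memcomparable encoding of the user key):

```rust
use std::collections::BTreeMap;

// Simplified mark-CF key: {user_key}{start_ts}. The timestamp is stored
// inverted so that newer versions sort first, mirroring how TiKV encodes
// timestamps in descending order (details simplified here).
fn mark_cf_key(user_key: &[u8], start_ts: u64) -> Vec<u8> {
    let mut key = user_key.to_vec();
    key.extend_from_slice(&(!start_ts).to_be_bytes());
    key
}

// Confirming whether a transaction was rolled back is a point get by
// {user_key}{start_ts}; no scan over versions is needed.
fn is_rolled_back(mark_cf: &BTreeMap<Vec<u8>, Vec<u8>>, user_key: &[u8], start_ts: u64) -> bool {
    mark_cf.contains_key(&mark_cf_key(user_key, start_ts))
}
```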

text/0000-mark-cf.md

> - Support generating and ingesting snapshots for the mark CF
> - Consider the keys and size of the mark CF in the split checker
> - After removing the WAL of the KV DB, the memtable needs to be flushed if it blocks the GC of the Raft logs.


After the KV DB instances are separated, there may be some compatibility work.

> When BR takes a snapshot of TiKV, all the locks before the snapshot should be resolved. In this case, the records in the mark CF really don't matter.
>
> BR can just ignore the mark CF.


Another benefit that could be mentioned here: pessimistic locking behavior issues like pingcap/tidb#36438 could be resolved completely, and the previous tricky workarounds could be simplified.


> #### Format
>
> Key: `{user_key}{start_ts}`

We always read the mark CF by key with a specific timestamp, am I right?

@sticnarf (Author)

Yes.


> The records in the mark CF don't need to exist after all keys in the transaction are fully resolved. The client resolves all the locks before a certain timestamp before advancing the safe point to that timestamp.
>
> So, when TiKV is ready to do GC, all records in the mark CF whose `commit_ts` is less than the safe point can be deleted. This can be done in the compaction filter.

What about `Rollback` records? They do not have a `commit_ts`.

@sticnarf (Author)

Conventionally, the `commit_ts` of a `Rollback` record is exactly its `start_ts`.
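
Putting the two together, a minimal sketch of the GC rule (hypothetical types; not RocksDB's actual compaction-filter interface):

```rust
// Hypothetical view of a mark-CF record inside a compaction filter.
struct MarkRecord {
    // By convention, a `Rollback` record's commit_ts equals its start_ts.
    commit_ts: u64,
}

// All locks below the GC safe point are already resolved, so a mark record
// older than the safe point is garbage and can be dropped during compaction.
fn should_filter_out(record: &MarkRecord, safe_point: u64) -> bool {
    record.commit_ts < safe_point
}
```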


> We already have a collapsing mechanism to merge consecutive `Rollback` records. It was invented back in the days when pessimistic transactions were not yet supported. Now, it's unlikely to have many `Rollback` records that affect read performance.
>
> The `Lock` record is a bit more complicated. At the very beginning, when there were no pessimistic transactions or async-commit transactions, it was only used to check read-write conflicts, mostly for write-skew prevention. In pessimistic transactions, if a key is locked but not changed in the end, the pessimistic lock is finally turned into a `Lock` record. In these cases, `Lock` records exist to cause write conflicts. If one happens to be on the primary key of the transaction, it also marks the committed status. So, if the `Lock` record is only there to cause write conflicts, it doesn't need to exist after any newer record is written. However, it is not true for the primary keys.

The `Lock` record for an optimistic transaction is also used to cause write conflicts, right? And how should I understand write-skew prevention?

I executed an optimistic transaction like `begin optimistic; select * from t1 where name = "xxxxx" for update; commit;`. It also generated a `Lock` record in the write CF. So the `Lock` record on the primary key of an optimistic transaction also marks the committed status, right?

@sticnarf (Author)

> The `Lock` record for an optimistic transaction is also used to cause write conflicts, right? And how should I understand write-skew prevention?

Yes. Write-skew prevention is exactly the reason why people use `SELECT FOR UPDATE` in optimistic transactions. For example, select the row where `id = 1` and use the result to update the row where `id = 2`. There can be a write skew if we don't check conflicts on `id = 1`.

> I executed an optimistic transaction like `begin optimistic; select * from t1 where name = "xxxxx" for update; commit;`. It also generated a `Lock` record in the write CF. So the `Lock` record on the primary key of an optimistic transaction also marks the committed status, right?

Right. And that's why I say "However, it is not true for the primary keys." In this case, the `Lock` cannot be collapsed.
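
To summarize that rule in code form (a hypothetical predicate for illustration only; TiKV's collapse logic does not literally store such a flag):

```rust
// Hypothetical: whether a `Lock` record may be dropped once a newer record exists.
struct LockWrite {
    // Assumed flag for illustration; not part of the actual record format.
    on_primary_key: bool,
}

// A `Lock` that exists only to cause write conflicts is superseded by any
// newer record and may be collapsed; on the primary key it also witnesses
// the transaction's commit status, so it must be kept.
fn collapsible(lock: &LockWrite) -> bool {
    !lock.on_primary_key
}
```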

@TonsnakeLin
It brings a little extra work when committing the transaction, such as writing two column families and deleting the old `Lock` record. We should do a performance test for small OLTP transactions, which need to be committed as quickly as possible.
The rest LGTM.

@sticnarf (Author)

To completely solve the issue, we must overwrite the latest `Lock` or `Rollback`. Otherwise, the tombstones in the memtable can greatly slow down reads.

However, reusing the key of the latest `Lock` is too tricky. It is a big question whether it is worth doing it this way.

@cfzjywxk

> To completely solve the issue, we must overwrite the latest `Lock` or `Rollback`. Otherwise, the tombstones in the memtable can greatly slow down reads.
>
> However, reusing the key of the latest `Lock` is too tricky. It is a big question whether it is worth doing it this way.

It's quite risky to overwrite an existing key; maybe other workarounds are needed to avoid the worst case of too many tombstones in the memtable.
