introduce latest cf #95

Open · BusyJay wants to merge 5 commits into master
Conversation

@BusyJay (Member) commented Jun 5, 2022

Add a cf (column family in RocksDB) named "latest" to store the latest version of keys in MVCC.

- key: the original user key, without any encoding or version suffix
- value: the same value as in the write cf, but with the corresponding version included

For example, inserting k1 with version v0 and value dummy will insert two keys:
- to write cf, k1 with version v0 -> dummy (as before)
- to latest cf, k1 -> (dummy, v0 and other meta)
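To make the layout concrete, here is a minimal sketch of the double write against the rust-rocksdb API. The helper name, the big-endian version suffix, and the value layout are illustrative assumptions, not TiKV's actual encodings.

```rust
use rocksdb::{WriteBatch, DB};

// Minimal sketch of the proposed double write, assuming a DB opened with
// "write" and "latest" column families. The version encoding and value
// layout below are placeholders, not TiKV's actual formats.
fn put_with_latest(db: &DB, key: &[u8], version: u64, value: &[u8]) -> Result<(), rocksdb::Error> {
    let write_cf = db.cf_handle("write").expect("write cf");
    let latest_cf = db.cf_handle("latest").expect("latest cf");

    let mut batch = WriteBatch::default();

    // write cf: keep the existing MVCC layout, user key suffixed with the version.
    let mut versioned_key = key.to_vec();
    versioned_key.extend_from_slice(&version.to_be_bytes());
    batch.put_cf(write_cf, &versioned_key, value);

    // latest cf: raw user key; the value carries the version alongside the data.
    let mut latest_value = version.to_be_bytes().to_vec();
    latest_value.extend_from_slice(value);
    batch.put_cf(latest_cf, key, &latest_value);

    // Both puts land atomically in one batch.
    db.write(batch)
}
```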
Member

This will double the write amplification, not only in size but also in the number of key-value pairs.

Member Author

Yes, it's addressed in the drawback section.


Because all existing cfs are updated just as before, there are no major compatibility issues.

But using the latest cf should be triggered explicitly by the client. The client should ask TiKV to query using the latest cf only after it has updated all keys with the latest cf.
Member

It is a bit complicated.

Member Author

Usually it won't be, as the client already needs to manage different ranges, just like tables in TiDB or key spaces.


## Alternatives

unistore separates the latest version from other versions by adjusting the file format: when flushing or compacting, it puts the latest version of each key in the first part of the file and the remaining versions in the second part. This approach has no write overhead, but it is not backward compatible in TiKV.
Member

How about creating a history cf to store as many historical versions as possible, while the write cf stores only the latest few versions, and a background thread (like the GC thread) moves existing historical versions to the history cf?
For point gets, RocksDB's user timestamp feature could fetch the latest visible version, which also avoids creating an iterator.

Member

There is no compatibility issue for this solution.

@BusyJay (Member Author) Jun 5, 2022

Compaction filter is not reliable; we have seen issues caused by compaction not happening in time, like tikv/tikv#12729. This proposal doesn't depend on user timestamps, which are not used in production yet. It can also be landed table by table instead of across the whole cluster, so it has less impact on existing code. And it should have better scan performance than relying on a compaction filter.

@BusyJay (Member Author) Jun 5, 2022

Added as an alternative with more detailed arguments.

Member

Using a history cf doesn't need the compaction filter; GC just checks a history cf SST file's max ts to determine whether to drop the whole SST file.

Member Author

The compaction filter is not used for dropping history SSTs, but for moving data from the write cf to the history cf.

Member

A compaction filter is not necessary; I think version-statistics-based GC is much better.


As explained, seek is necessary because TiKV has no idea what versions are available. Otherwise it could just use `get` to query the specific key, which is a lot cheaper.
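To illustrate the cost difference, here is a sketch against rust-rocksdb. The descending-version key encoding in the write cf is an assumption for illustration, not TiKV's actual encoding.

```rust
use rocksdb::DB;

// With the latest cf, a read is a plain point get on the raw user key.
fn read_latest(db: &DB, user_key: &[u8]) -> Option<Vec<u8>> {
    let latest = db.cf_handle("latest")?;
    db.get_cf(latest, user_key).ok().flatten()
}

// Without it, TiKV must seek, because it doesn't know which versions exist.
fn read_write_cf(db: &DB, user_key: &[u8], read_ts: u64) -> Option<Vec<u8>> {
    let write = db.cf_handle("write")?;
    // Assumed layout: user key followed by a bit-inverted timestamp, so newer
    // versions sort first.
    let mut seek_key = user_key.to_vec();
    seek_key.extend_from_slice(&(!read_ts).to_be_bytes());

    let mut iter = db.raw_iterator_cf(write);
    iter.seek(&seek_key); // positions at the newest version <= read_ts
    if iter.valid() && iter.key().map_or(false, |k| k.starts_with(user_key)) {
        return iter.value().map(|v| v.to_vec());
    }
    None
}
```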

## Detailed design
Member

What about compatibility with lightning, cdc, br, and tiflash?

Member Author

They are not part of TiKV. I would like to talk about them in another doc.


TiKV doesn't need to know all existing versions of keys. In fact, most of the time, v0 is larger than any existing version of a key in TiKV if there are more reads than writes. So it should be enough to just let TiKV know the latest version of all keys.

The RFC proposes adding a new cf named "latest". When a key is inserted using the transaction API, TiKV updates the write cf (and the default cf) as it does today. In addition, it also inserts a key into the latest cf with the key and value layout described above.
Member

This double write may also double the total disk size in some scenarios.

Member

Maybe for new tables (new ranges) in an existing cluster, and for newly created clusters, the new mechanism (only write the latest cf, and move the old version to the write cf only when an older version exists) is a better choice?

Member Author

It would work, though it loses the ability to upgrade existing tables to the new format online.

Member Author

Or we can introduce three formats: the first is the original one, the second is double write, and the third is move on update. They can be upgraded online one by one. For new tables, the third format is used by default.
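A hypothetical sketch of how the three formats might be tracked per range so they can be upgraded one by one; the names are illustrative only, not from the RFC.

```rust
/// Hypothetical per-range format marker; names are illustrative only.
enum RangeFormat {
    /// Format 1: the original layout, write cf only.
    Original,
    /// Format 2: every insert goes to both the write cf and the latest cf.
    DoubleWrite,
    /// Format 3: insert into the latest cf only; demote the old version to
    /// the write cf when a newer version arrives.
    MoveOnUpdate,
}
```

Under this scheme, existing ranges would step through `DoubleWrite` during an online upgrade, while new tables start at `MoveOnUpdate` directly.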

@BusyJay (Member Author) commented Jun 12, 2022

Although I updated the RFC to write the latest cf first and then move keys between cfs, I suggest always double writing in the POC, as it has fewer compatibility issues.

For example, suppose there is no key in the latest cf. Inserting k1 with version v0 and value foo will insert one key:
- to latest cf, k1 -> (foo, v0 and other meta)

Inserting k1 again with version v1 and value bar will insert two keys:
- to latest cf, k1 -> (bar, v1 and other meta)
- to write cf, k1 with version v0 -> (foo and other meta)
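A minimal sketch of that "move on update" write path, under the same rust-rocksdb and encoding assumptions as the earlier sketch (an 8-byte big-endian version prefix in the latest cf value):

```rust
use rocksdb::{WriteBatch, DB};

// Sketch of the "move on update" write path: the first insert touches only
// the latest cf; a later insert demotes the superseded version to the write
// cf in the same atomic batch. Encodings are illustrative assumptions.
fn put_move_on_update(db: &DB, key: &[u8], version: u64, value: &[u8]) -> Result<(), rocksdb::Error> {
    let write_cf = db.cf_handle("write").expect("write cf");
    let latest_cf = db.cf_handle("latest").expect("latest cf");

    let mut batch = WriteBatch::default();
    if let Some(old) = db.get_cf(latest_cf, key)? {
        // An older version exists: move it into the write cf first.
        let (old_version, old_value) = old.split_at(8); // assumed value layout
        let mut versioned_key = key.to_vec();
        versioned_key.extend_from_slice(old_version);
        batch.put_cf(write_cf, &versioned_key, old_value);
    }
    // The latest cf always ends up holding only the newest version.
    let mut latest_value = version.to_be_bytes().to_vec();
    latest_value.extend_from_slice(value);
    batch.put_cf(latest_cf, key, &latest_value);
    db.write(batch)
}
```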
Contributor

Suggested change:

```diff
- Inserting v1 again with version v1 and value bar will insert two keys:
+ Inserting k1 again with version v1 and value bar will insert two keys:
```

Is it k1?

Member Author

Yes.
