
# Stale read for rawkv with read ts #96

Open · wants to merge 7 commits into master

Conversation

**@iosmanthus** (Member) commented Jun 7, 2022

Signed-off-by: iosmanthus myosmanthustree@gmail.com

This pull request is based on #80.

Rendered


TiKV currently supports **three** features to process read-only queries more efficiently.

1. Follower read.
**Member:** You mean replica read?

**Member Author:** Yes.

**Member:** Then the correct name should be used.


2. While TiKV is handling radw read-related requests, construct a `SnapContext` with the `read_ts` before acquiring a snapshot from `storage`.
**Member:** Suggested change: "radw" → "raw".
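To make the quoted flow concrete, here is a minimal sketch of handing a `read_ts` to the snapshot path; all types are simplified stand-ins for illustration, not TiKV's actual `SnapContext` or `storage` definitions:

```rust
use std::collections::BTreeMap;

type Kv = BTreeMap<Vec<u8>, Vec<u8>>;

/// Simplified stand-in for TiKV's SnapContext; the real one carries more fields.
struct SnapContext {
    read_ts: Option<u64>, // stale-read timestamp taken from the request header
}

struct Storage {
    safe_ts: u64, // advanced by CheckLeader messages or the resolve-ts worker
    data: Kv,
}

impl Storage {
    /// Acquire a snapshot; reject the stale read if safe_ts lags behind read_ts.
    fn snapshot(&self, ctx: &SnapContext) -> Result<&Kv, String> {
        if let Some(read_ts) = ctx.read_ts {
            if read_ts > self.safe_ts {
                return Err(format!(
                    "DataIsNotReady: safe_ts {} < read_ts {}",
                    self.safe_ts, read_ts
                ));
            }
        }
        Ok(&self.data)
    }
}

fn main() {
    let mut data = Kv::new();
    data.insert(b"k".to_vec(), b"v".to_vec());
    let storage = Storage { safe_ts: 100, data };

    // read_ts behind safe_ts: the replica may serve the read locally.
    assert!(storage.snapshot(&SnapContext { read_ts: Some(90) }).is_ok());
    // read_ts ahead of safe_ts: the replica must reject the request.
    assert!(storage.snapshot(&SnapContext { read_ts: Some(200) }).is_err());
}
```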


```diff
class RawKVClient {
+ ByteString rawGet(ByteString key, readTs: Timestamp)
```
**Member:** I think it might be confusing for clients to understand what `readTs` means in RawKV.

**Member Author:** How about changing the time type to `DateTime` instead of `Timestamp`? That might be more like the syntax of TiDB: https://docs.pingcap.com/tidb/dev/as-of-timestamp#syntax
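Whichever name and type the parameter ends up with, the client has to turn a wall-clock notion like "a few seconds ago" into a TSO-style timestamp. A rough sketch, assuming the usual TiKV encoding of physical milliseconds shifted left by 18 bits over a logical counter (`stale_read_ts` and the client method in the comment are hypothetical names):

```rust
/// TSO-style composite timestamp: physical milliseconds in the high bits,
/// an 18-bit logical counter in the low bits.
fn compose_ts(physical_ms: u64, logical: u64) -> u64 {
    (physical_ms << 18) | logical
}

/// Hypothetical helper: a read_ts that is `staleness_ms` behind `now_ms`.
fn stale_read_ts(now_ms: u64, staleness_ms: u64) -> u64 {
    compose_ts(now_ms.saturating_sub(staleness_ms), 0)
}

fn main() {
    let now_ms = 1_700_000_000_000u64;
    let read_ts = stale_read_ts(now_ms, 5_000); // roughly five seconds ago
    assert!(read_ts < compose_ts(now_ms, 0));
    // A client call could then look like (hypothetical method name):
    // client.raw_get_with_ts(key, read_ts)
}
```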


### TiKV

While trying to read data, clients should specify a timestamp attached to the request header as `read_ts`, typically a timestamp from a few seconds ago. The replica should read the local storage with the `read_ts`, reusing the stale-read mechanism of TxnKV. This requires the replica to check the `read_ts` against the `safe_ts`, which is advanced by the `CheckLeader` message from the leader's store or by the `resolve-ts` worker. As long as the `safe_ts` is no less than `read_ts`, the replica is allowed to read the key from local storage.
**Member:** How is `safe_ts` maintained in RawKV since there are no locks?

**Member Author:** If there are no locks, the `resolve-ts` worker will advance the `safe_ts` by periodically requesting a timestamp from the TSO. The default interval for the `resolve-ts` worker is 1s.
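A toy model of that behavior (the real resolve-ts worker in TiKV is considerably more involved; the field names and the lock-capping rule here are simplified assumptions):

```rust
/// Toy resolve-ts worker: with no locks outstanding, safe_ts simply advances
/// to the latest TSO timestamp on every tick (default interval ~1s).
struct ResolveTsWorker {
    safe_ts: u64,
    min_lock_ts: Option<u64>, // smallest start_ts among outstanding txn locks
}

impl ResolveTsWorker {
    /// One tick: advance safe_ts toward the TSO timestamp, capped by any lock.
    fn tick(&mut self, tso_ts: u64) -> u64 {
        let candidate = match self.min_lock_ts {
            // A lock at start_ts forces safe_ts to stay below it.
            Some(lock_ts) => lock_ts.saturating_sub(1).min(tso_ts),
            // RawKV regions have no locks, so safe_ts advances freely.
            None => tso_ts,
        };
        self.safe_ts = self.safe_ts.max(candidate); // never move backwards
        self.safe_ts
    }
}

fn main() {
    let mut raw = ResolveTsWorker { safe_ts: 0, min_lock_ts: None };
    assert_eq!(raw.tick(100), 100); // no locks: safe_ts follows the TSO

    let mut txn = ResolveTsWorker { safe_ts: 0, min_lock_ts: Some(50) };
    assert_eq!(txn.tick(100), 49); // a lock at 50 caps safe_ts at 49
}
```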

**Member Author:** More details have been added.




1. Follower read.

Follower read allows reading from the followers. Without breaking the linear consistency guarantee, the follower will send a read-index request to the leader. The leader will not respond with the actual value, instead, send a round of heartbeats to confirm its leadership and calculate the largest commit index (read index) across the cluster for the follower. After the follower advances its apply index to the read index, it is safe to get data from the local storage and respond to it to the client. This feature helps distribute the read stress on the leader but still increases the read latency.
**Member:** Suggested change: "respond to it to the client" → "respond to the client".
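The read-index handshake described in that paragraph can be sketched as follows; this is a toy model only, since real TiKV performs it inside raftstore with batched requests and leader leases:

```rust
/// Toy model of the read-index flow used by follower read.
struct Leader {
    commit_index: u64,
    quorum_confirmed: bool, // leadership re-confirmed by a heartbeat round
}

struct Follower {
    apply_index: u64,
}

impl Leader {
    /// Answer a read-index request: confirm leadership, return the commit index.
    fn read_index(&self) -> Option<u64> {
        if self.quorum_confirmed {
            Some(self.commit_index)
        } else {
            None // a deposed leader must not hand out a read index
        }
    }
}

impl Follower {
    /// The follower may serve the read once it has applied up to read_index.
    fn can_serve_read(&self, read_index: u64) -> bool {
        self.apply_index >= read_index
    }
}

fn main() {
    let leader = Leader { commit_index: 42, quorum_confirmed: true };
    let read_index = leader.read_index().expect("leadership confirmed");

    assert!(!Follower { apply_index: 40 }.can_serve_read(read_index)); // wait
    assert!(Follower { apply_index: 42 }.can_serve_read(read_index)); // safe now
}
```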


The `read_ts` specified by the client could be acquired in the following ways:

1. Calculate a timestamp from the local physical time. The `read_ts` might suffer from clock drift and exceed the max timestamp allocated by the TSO. The client will then fail to read any data even if the target replica is the leader, since the `safe_ts` of the replica doesn't catch up with the `read_ts`. **Deploying NTP services** in the cluster might mitigate this issue.
**Member:** Can it be 0?

**Member Author:** We can reserve this value for an unbounded stale read: read the latest data without checking `safe_ts`.

**Member:** Then how do we preserve compatibility?


The `read_ts` specified by the client could be acquired in the following ways:

1. Calculate a timestamp from the local physical time. The `read_ts` might suffer from clock drift and exceed the max timestamp allocated by the TSO. The client will then fail to read any data even if the target replica is the leader, since the `safe_ts` of the replica doesn't catch up with the `read_ts`. **Deploying NTP services** in the cluster might mitigate this issue.
**Member:** If `read_ts` exceeds the max timestamp allocated from the TSO, maybe we can just return the latest data instead of no data.

**Member Author:** Then the `read_ts` would lose its restriction on data freshness, since some very stale replicas might be chosen.
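The clock-drift hazard from the quoted paragraph can be demonstrated with a tiny model; the numbers and the rejection rule are illustrative assumptions:

```rust
/// Toy check mirroring the stale-read rule: a replica only serves the read
/// when its safe_ts has caught up with the requested read_ts.
fn can_serve(read_ts: u64, safe_ts: u64) -> bool {
    read_ts <= safe_ts
}

fn main() {
    // Suppose the TSO has handed out timestamps up to 1_000, so even the
    // leader's safe_ts cannot exceed that yet.
    let leader_safe_ts = 1_000;

    // A well-behaved client picks a read_ts a few "seconds" in the past.
    assert!(can_serve(900, leader_safe_ts));

    // A client with a drifted clock computes a read_ts beyond the TSO max:
    // every replica, including the leader, rejects the read.
    assert!(!can_serve(1_500, leader_safe_ts));
}
```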


### TiKV

While trying to read data, clients should specify a timestamp attached to the request header as `read_ts`, typically a timestamp from a few seconds ago. The replica should read the local storage with the `read_ts`, reusing the stale-read mechanism of TxnKV. This requires the replica to check the `read_ts` against the `safe_ts`, which is advanced by the `CheckLeader` message from the leader's store (for a follower) or by the `resolve-ts` worker (for the leader). As long as the `safe_ts` is no less than `read_ts`, the replica is allowed to read the key from local storage. Notice that there are no locks in RawKV regions, so the `resolve-ts` worker advances the `safe_ts` by requesting the latest timestamp from the TSO.
**@pingyu** (Contributor) commented Jun 28, 2022

  1. What's the meaning of `safe_ts` for RawKV? Suggest giving a definition, e.g., "the minimum timestamp of the on-the-fly RawKV writes", or "all writes before `safe_ts` can be read".

  2. If the definition depends on the "timestamp" of RawKV writes, this feature depends on the timestamp introduced by API V2, is that right?

  3. The mechanism to get the "minimal timestamp" of the on-the-fly writes would be quite different between Txn and Raw. Although there are no locks, Raw writes would still be "on-the-fly" during the Raft procedure.

  4. RawKV CDC faces a very similar problem tracking "on-the-fly" writes for resolved-ts. I think we can reuse it for stale read. Please refer to RawKV Change Data Capture #86.

**@BusyJay** (Member) commented Aug 31, 2022

There is a special case where the user may choose availability rather than consistency. The client is then OK to read with any ts, i.e., just return what the replica currently has. In this RFC, it seems keys with a larger ts may be skipped during the read.

**@iosmanthus** (Member, Author) commented Aug 31, 2022

> There is a special case where the user may choose availability rather than consistency. The client is then OK to read with any ts, i.e., just return what the replica currently has. In this RFC, it seems keys with a larger ts may be skipped during the read.

This RFC doesn't depend on the keys' timestamps; the underlying storage could have no information about timestamps at all. The `safe_ts` is like the (approximate) timestamp at which the leader wrote the key. The `read_ts` is checked against the `safe_ts` to guarantee the replica has already been synced with the writes that happened around `safe_ts`. To read any data, we could specify the `read_ts` as 0, and then any replica with `safe_ts > 0` could handle the request.

**@BusyJay** (Member) commented Sep 1, 2022

I'm OK with a 0 timestamp. Currently, txn stale read considers ts 0 an error, and a client (like TiDB) may actually send ts 0 by mistake. This RFC should state clearly what 0 means in RawKV, and the implementation should not break compatibility.
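A sketch of the compatibility point being discussed: reserving `read_ts == 0` as an explicit "unbounded" mode that skips the `safe_ts` check, instead of treating it as an error the way txn stale read does today. This is only one possible rule, not what the RFC finally specifies:

```rust
/// Toy stale-read admission rule with 0 reserved for unbounded reads.
fn check_stale_read(read_ts: u64, safe_ts: u64) -> Result<(), String> {
    if read_ts == 0 {
        // Unbounded stale read: serve whatever the replica currently has.
        return Ok(());
    }
    if read_ts > safe_ts {
        return Err(format!(
            "DataIsNotReady: safe_ts {} < read_ts {}",
            safe_ts, read_ts
        ));
    }
    Ok(())
}

fn main() {
    assert!(check_stale_read(0, 0).is_ok()); // ts 0 always allowed
    assert!(check_stale_read(90, 100).is_ok()); // replica caught up
    assert!(check_stale_read(200, 100).is_err()); // replica lagging
}
```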
