Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse entry indices #309

Open
v0y4g3r opened this issue Jun 1, 2023 · 3 comments
Open

Sparse entry indices #309

v0y4g3r opened this issue Jun 1, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@v0y4g3r
Copy link

v0y4g3r commented Jun 1, 2023

In #303 we can see some users complain about the memory usage.

The memory footprint of entry indices largely depends on the average size of entries. File purge and compaction are based on log file total size. The smaller the avg entry size is, the more entry indices will reside in memory.

In some cases, point-get of log entries is less frequent than bulk loads that reads a range of entries and applies those entries to some state machine. So we can use sparse indices to further reduce the memory footprint of raft-engine, for example, we can bookkeep the entry id to offset mapping every N entries thus the memory consumption can be reduced to 1/N.

Sparse indices means we can no longer get the exact offset of some arbitrary entry id, the overhead of reading those needless entries can be amortized in bulk load cases.

@tabokie
Copy link
Member

tabokie commented Jun 1, 2023

It's a good idea. I already have a PR that merge the file location field of multiple entries into one u32 (#208). We just need to move the entry offset into it as well.

The reason I didn't merge that PR is because it apparently causes regression to both memory usage and performance in some corner cases (e.g. user writes only one entry in a log batch). I prefer we can at least make it gated by a feature.

@v0y4g3r
Copy link
Author

v0y4g3r commented Jun 1, 2023

It's a good idea. I already have a PR that merge the file location field of multiple entries into one u32 (#208). We just need to move the entry offset into it as well.

The reason I didn't merge that PR is because it apparently causes regression to both memory usage and performance in some corner cases (e.g. user writes only one entry in a log batch). I prefer we can at least make it gated by a feature.

Glad to see that feature is on the road map. Maybe we can list the work needs to do and I'd be happy if I can help.

We're going to build a quorum-based log storage like Aurora's storage service. I think raft-engine should be suitable as the local storage.

@tabokie
Copy link
Member

tabokie commented Jun 9, 2023

@v0y4g3r Sorry for the delay. To move forward this feature, I'll first need to (1) polish #208 with feature gate, then (2) you can propose a change that fully embeds entry offset as well. Because we don't currently need this feature in TiKV, I would anticipate the review process could take a fair amount of time before we reach step 2.

I'm curious though, is this a real problem you encountered while using the engine? For the case you mentioned ("The smaller the avg entry size is, the more entry indices will reside in memory."), we can also add another memory metrics other than the existing file size metrics purge_threshold to speed up the purge. In reality it also makes more sense for user application to control the total Raft log entry count rather than total Raft log size.

@tabokie tabokie added the enhancement New feature or request label Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants