
storage: reject new commands if memory quota exceeded (#16473) #16950

Open: wants to merge 3 commits into release-7.5

ti-chi-bot (Member) commented May 7, 2024

This is an automated cherry-pick of #16473

This cherry-pick rolls up three PRs:

  1. storage: refactor command macro and task #16440
  2. storage: reject new commands if memory quota exceeded #16473
  3. storage: add memory quota metrics #16482

They are intended to be merged together.

What is changed and how it works?

Issue Number: ref #16234

What's Changed:

Currently, TiKV rejects new writes in the transaction layer if its
pending write bytes exceed a default threshold of 100MB. However, this
approach falls short because the transaction layer transforms a write
request into a Command and executes it as a Future, and both the Command
and the Future incur memory overhead beyond the written bytes. Empirical
results from tests show that the memory usage of `kv_prewrite` is 20
times its written bytes.
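As a rough illustration using the two figures above, and assuming every pending write is a `kv_prewrite`, the 100MB pending-write-bytes threshold alone could still admit on the order of 2GB of in-memory command state:

$$100\,\mathrm{MB} \times 20 = 2000\,\mathrm{MB} \approx 2\,\mathrm{GB}$$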

This commit introduces a memory quota that restricts the transaction
layer's memory usage. It acts as a crucial safeguard, serving as the
last resort to prevent TiKV from running out of memory (OOM).
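A minimal sketch of the idea, assuming a shared byte counter that each new command reserves from before it is scheduled and releases when it finishes. `MemoryQuota`, `try_alloc`, `QuotaGuard`, and `QuotaExceeded` are hypothetical names for illustration, not TiKV's actual API, and how a command's in-memory size is estimated is left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical error returned when admitting a command would exceed the quota.
#[derive(Debug, PartialEq)]
pub struct QuotaExceeded {
    pub requested: usize,
    pub in_use: usize,
    pub capacity: usize,
}

/// Hypothetical byte budget shared by all in-flight commands.
pub struct MemoryQuota {
    in_use: AtomicUsize,
    capacity: usize,
}

impl MemoryQuota {
    pub fn new(capacity: usize) -> Arc<Self> {
        Arc::new(MemoryQuota {
            in_use: AtomicUsize::new(0),
            capacity,
        })
    }

    /// Bytes currently reserved (e.g. what a memory-quota metric could report).
    pub fn in_use(&self) -> usize {
        self.in_use.load(Ordering::Relaxed)
    }

    /// Reserve `bytes` for a new command, or reject it without reserving anything.
    pub fn try_alloc(self: &Arc<Self>, bytes: usize) -> Result<QuotaGuard, QuotaExceeded> {
        let mut current = self.in_use.load(Ordering::Relaxed);
        loop {
            let next = current.saturating_add(bytes);
            if next > self.capacity {
                return Err(QuotaExceeded {
                    requested: bytes,
                    in_use: current,
                    capacity: self.capacity,
                });
            }
            // Compare-and-swap so concurrent callers can never jointly overshoot the capacity.
            match self.in_use.compare_exchange_weak(
                current,
                next,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => {
                    return Ok(QuotaGuard {
                        quota: Arc::clone(self),
                        bytes,
                    })
                }
                Err(actual) => current = actual,
            }
        }
    }
}

/// Holds a reservation; dropping it (command finished or cancelled) returns the bytes.
pub struct QuotaGuard {
    quota: Arc<MemoryQuota>,
    bytes: usize,
}

impl Drop for QuotaGuard {
    fn drop(&mut self) {
        self.quota.in_use.fetch_sub(self.bytes, Ordering::Relaxed);
    }
}
```

In this sketch a scheduler would call `try_alloc` with an estimate of the command's size and reject the request on `Err`. Storing the guard inside the command's future ties the release to the future being dropped, so the bytes come back whether the command completes or is cancelled.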

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Manual test
Test Details

The OOM issue in #16234 is hard to reproduce reliably, so I had to change the default configs.

A single-node cluster with the following configs:

TiKV:

[storage]
# Try not to limit concurrent tasks
scheduler-concurrency = 2097152
# Don't let the block cache affect memory usage
[storage.block-cache]
capacity = "100MB"

TiDB:

lease = "600s"
token-limit = 100000000
[txn-local-latches]
enabled = false

SET GLOBAL tidb_txn_mode = 'optimistic';
SET GLOBAL tidb_enable_async_commit = off;
SET GLOBAL tidb_enable_1pc = off;

Workload:

# Prepare
mysql> create database tpcc1k;
/root/.tiup/components/bench/v1.12.0/go-tpc \
    tpcc prepare \
    -H 10.2.12.86 -P 31825 \
    -D tpcc1k --warehouses 1000 -T 500

# Run
while true; do { \
    /root/.tiup/components/bench/v1.12.0/go-tpc \
        tpcc run \
        -H 10.2.12.86 -P 31825 \
        -D tpcc1k --warehouses 1000 --time 4s -T 500 & \
    pid=$!; sleep 5; kill -9 $pid; \
} done;

TiKV config and metrics:

OOM if memory-quota is unlimited.
[storage]
# 128GB effectively disables the memory quota.
memory-quota = "128GB"
[metrics screenshots]

Does not OOM if memory-quota is configured properly.
[storage]
memory-quota = "128MB"
[metrics screenshot]

Release note

Fix an issue that the txn scheduler might cause OOM when TiKV writes too slowly.

ti-chi-bot bot (Contributor) commented May 7, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Connor1996

overvenus and others added 3 commits May 22, 2024 11:20
ref tikv#16234

* txn: refactor task into a module
* storage: refactor commands macro

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16234

Signed-off-by: Neil Shen <overvenus@gmail.com>
ref tikv#16234

* Add a metric of scheduler memory quota.
* Add a metric of scheduler running commands.

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
overvenus force-pushed the cherry-pick-16473-to-release-7.5 branch from b518b9d to 7e104f4 on May 22, 2024 04:00

Connor1996 (Member) left a comment

LGTM

ti-chi-bot bot added the status/LGT1 label on May 22, 2024

overvenus (Member)

/test

ti-chi-bot bot (Contributor) commented May 22, 2024

@overvenus: The /test command needs one or more targets.
The following commands are available to trigger optional jobs:

  • /debug pull-unit-test

In response to this:

/test

Labels
cherry-pick-approved, release-note, size/XXL, status/LGT1, type/cherry-pick-for-release-7.5