
feat: improved shard cache #7429

Merged
Longarithm merged 47 commits into master from smart-cache-1 on Aug 26, 2022

Conversation

@Longarithm Longarithm (Member) commented Aug 17, 2022

Improve shard cache to use RAM more effectively.

Three changes are introduced:

  1. If we put a new value into the LRU cache and the total size of the cached values exceeds total_sizes_capacity, we evict values until that is no longer the case, so the actual total size never exceeds total_size_limit + TRIE_LIMIT_CACHED_VALUE_SIZE (see the first sketch after this list).
    We add this because value sizes generally vary from 1 B to 500 B, and we want to account for the cache size precisely. The current value size limit is 1000 B, so for an average value size of 100 B this uses the shard cache 10x more effectively than assuming the worst-case size for every entry.

  2. When we save trie changes, we previously applied only insertions to the shard cache, i.e. we added newly created nodes to it; deletions were applied only during GC of the old block. Now we also apply deletions, calling pop on the shard cache, while saving the trie changes of a new block (see the second sketch after this list).
    This uses shard cache space more effectively. Previously, nodes from the old state could occupy a lot of space, which led to the eviction of nodes from the fresh state.

  3. When pop is called on the shard cache, the item is not deleted immediately: its key is first pushed to a deletions queue of capacity deletions_queue_capacity. If the popped item doesn't fit in the queue, the oldest queued item is removed from both the queue and the LRU cache, and the newly popped item is enqueued.
    This delays removals when we have forks. In the simple case, two blocks share a parent P: when we process the first block, we pop some nodes from P, but when we process the second block, we may still need to read some of those nodes from P. We now delay each removal by 100_000 pops, which keeps all nodes from the last 3 completely full blocks.
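
Below is a minimal Rust sketch of changes 1 and 3 combined, assuming the lru crate for the LRU cache. All names here (TrieCacheSketch, total_size_limit, deletions_queue_capacity, Key, Value) are illustrative stand-ins, not the actual nearcore API.

// A sketch, not the real implementation: a size-bounded LRU cache with a
// delayed-deletions queue, as described in changes 1 and 3 above.
use std::collections::VecDeque;
use lru::LruCache;

type Key = [u8; 32]; // stands in for a trie node hash
type Value = Vec<u8>; // stands in for a serialized trie node

// Values larger than this are never cached (the limit named in change 1).
const TRIE_LIMIT_CACHED_VALUE_SIZE: usize = 1000;

struct TrieCacheSketch {
    cache: LruCache<Key, Value>,
    total_size: usize, // sum of the sizes of all currently cached values
    total_size_limit: usize,
    deletions: VecDeque<Key>, // keys whose removal is delayed (change 3)
    deletions_queue_capacity: usize,
}

impl TrieCacheSketch {
    fn new(total_size_limit: usize, deletions_queue_capacity: usize) -> Self {
        Self {
            cache: LruCache::unbounded(),
            total_size: 0,
            total_size_limit,
            deletions: VecDeque::new(),
            deletions_queue_capacity,
        }
    }

    // Change 1: after inserting, evict least-recently-used entries until the
    // total size fits again, so the total can overshoot total_size_limit by
    // at most TRIE_LIMIT_CACHED_VALUE_SIZE (the size of a single value).
    fn put(&mut self, key: Key, value: Value) {
        if value.len() > TRIE_LIMIT_CACHED_VALUE_SIZE {
            return; // oversized values are never cached
        }
        let size = value.len();
        if let Some(old) = self.cache.put(key, value) {
            self.total_size -= old.len(); // key was already present
        }
        self.total_size += size;
        while self.total_size > self.total_size_limit {
            match self.cache.pop_lru() {
                Some((_, evicted)) => self.total_size -= evicted.len(),
                None => break,
            }
        }
    }

    // Change 3: pop does not delete immediately; the key goes through a
    // bounded queue first, so forks that still read the parent block's
    // nodes keep hitting the cache.
    fn pop(&mut self, key: Key) {
        if self.deletions.len() == self.deletions_queue_capacity {
            // Queue is full: actually remove the oldest delayed deletion.
            if let Some(oldest) = self.deletions.pop_front() {
                if let Some(value) = self.cache.pop(&oldest) {
                    self.total_size -= value.len();
                }
            }
        }
        self.deletions.push_back(key);
    }
}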
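
Change 2 can then be sketched as applying both insertions and deletions when a block's trie changes are saved; the TrieChanges shape below is hypothetical, for illustration only.

// Hypothetical shape of a block's trie changes.
struct TrieChanges {
    insertions: Vec<(Key, Value)>,
    deletions: Vec<Key>,
}

// Change 2: previously only insertions touched the shard cache here;
// deletions waited for GC of the old block. Now both are applied.
fn apply_trie_changes(cache: &mut TrieCacheSketch, changes: &TrieChanges) {
    for (key, value) in &changes.insertions {
        cache.put(*key, value.clone());
    }
    for key in &changes.deletions {
        cache.pop(*key); // goes through the delayed-deletions queue
    }
}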

Next steps:

  • general cleanup after getting it merged;
  • make the new constants configurable, similarly to the trie cache capacity.

We want to get the whole update merged by next Wednesday and cherry-pick it to the 1.28 and 1.29 releases. This is not a protocol change, so it doesn't require a separate release or protocol version.

Testing

@Longarithm Longarithm marked this pull request as ready for review August 19, 2022 14:48
@Longarithm Longarithm requested a review from a team as a code owner August 19, 2022 14:48
@Longarithm Longarithm requested a review from mina86 August 19, 2022 14:48
@Longarithm Longarithm self-assigned this Aug 19, 2022
@Longarithm Longarithm changed the title from "[DO NOT MERGE] feat: improved shard cache" to "feat: improved shard cache" Aug 19, 2022
@matklad matklad (Contributor) left a comment

I don't see anything obviously wrong now! This is fiddly, so a second pair of eyes would help (and @mm-near has an outstanding NO review).

I see there are nayduck failures in the run. Are those fixed at the latest version of the code?

// Convert trie changes to database ops for trie nodes.
// Create a separate store update for deletions, because we want to update the cache
// but don't want to remove the nodes from the store.
let mut deletions_store_update = self.store().store_update();
A contributor commented on this change:
This is another quite questionable bit of code; we should think about how to make it clearer in a follow-up.

@Longarithm Longarithm (Member, Author) commented Aug 24, 2022

> I see there are nayduck failures in the run. Are those fixed at the latest version of the code?

They have been failing for a while, so this is not caused by this PR. I raised the question here: https://near.zulipchat.com/#narrow/stream/295558-pagoda.2Fcore/topic/chunk_management.20test.20failure/near/294949074

@Longarithm Longarithm merged commit a8da6a0 into master Aug 26, 2022
@Longarithm Longarithm deleted the smart-cache-1 branch August 26, 2022 15:54
Longarithm added a commit that referenced this pull request Aug 26, 2022
Co-authored-by: firatNEAR <firat@near.org>
Co-authored-by: firatNEAR <102993450+firatNEAR@users.noreply.github.com>
Longarithm added a commit that referenced this pull request Aug 26, 2022
Longarithm added a commit that referenced this pull request Aug 26, 2022
@firatNEAR firatNEAR linked an issue Aug 29, 2022 that may be closed by this pull request
jakmeier added a commit to jakmeier/nearcore that referenced this pull request Aug 29, 2022

Finding the counter of a metric is a relatively expensive operation
compared to just updating its value. A performance regression of about
20% on read performance has been observed due to the additional metrics
added in near#7429.

This commit stores the counters in `TrieCacheInner` and
`TrieCachingStorage` to avoid the lookup in hot paths, such as when we
hit the shard or chunk cache.
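
A sketch of this optimization, assuming the prometheus crate; the struct, field, and function names below are illustrative rather than the actual nearcore code.

// Sketch: resolve a labeled counter once at construction time instead of
// looking it up on every cache access.
use prometheus::{IntCounter, IntCounterVec};

struct TrieCacheMetrics {
    // The counter is stored directly, so the hot path skips the label lookup.
    shard_cache_hits: IntCounter,
}

impl TrieCacheMetrics {
    fn new(hits_vec: &IntCounterVec, shard_id: &str) -> Self {
        Self {
            // with_label_values resolves the per-shard counter; this is the
            // relatively expensive step the commit moves out of the hot path.
            shard_cache_hits: hits_vec.with_label_values(&[shard_id]),
        }
    }
}

fn on_shard_cache_hit(metrics: &TrieCacheMetrics) {
    metrics.shard_cache_hits.inc(); // hot path: just an atomic increment
}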
jakmeier added a commit that referenced this pull request Aug 29, 2022

* perf: avoid expensive lookup of storage metrics

Finding the counter of a metric is a relatively expensive operation
compared to just updating its value. A performance regression of about
20% on read performance has been observed due to the additional metrics
added in #7429.

This commit stores the counters in `TrieCacheInner` and
`TrieCachingStorage` to avoid the lookup in hot paths, such as when we
hit the shard or chunk cache.

* apply reviewer suggestions

- group metrics together in two new structs
- update comments
Longarithm pushed a commit that referenced this pull request Aug 29, 2022
Longarithm pushed a commit that referenced this pull request Aug 29, 2022
nikurt pushed a commit that referenced this pull request Nov 9, 2022
Successfully merging this pull request may close these issues:

  • Create metrics for shard/chunk cache part1