single_block_lookups leak #5694

dapplion · 2024-05-02T16:06:15Z

Description

Looking at the metric sync_single_block_lookups on our nodes, they have 100k to 150k active lookups. The metric is properly implemented, so this is a leak. Each lookup is quite small (a few hundred bytes), so the leak is slow and small overall.

A possible explanation is:

Create a new lookup for block A
Block A is already in the da_checker
The lookup skips sending a block request because the block is already in the da_checker
No event for the lookup is ever received, so it is never removed
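
The sequence above can be sketched as a toy model. This is only an illustration of the suspected mechanism, not Lighthouse's actual code: `SyncModel`, `Lookup`, `create_lookup`, and `deliver_responses` are hypothetical names, block roots are reduced to `u64`, and the model assumes that a lookup is removed only when a response to its own request arrives.

```rust
use std::collections::{HashMap, HashSet};

// All names here are hypothetical stand-ins, not Lighthouse's real types.
type BlockRoot = u64; // stand-in for a 32-byte block root
type LookupId = u32;

struct Lookup {
    block_root: BlockRoot,
    // False when the request was skipped because the block was already known.
    request_sent: bool,
}

struct SyncModel {
    da_checker: HashSet<BlockRoot>,
    lookups: HashMap<LookupId, Lookup>,
    next_id: LookupId,
}

impl SyncModel {
    fn new() -> Self {
        SyncModel {
            da_checker: HashSet::new(),
            lookups: HashMap::new(),
            next_id: 0,
        }
    }

    // Steps 1 to 3: create the lookup, but skip the block request
    // if the block is already in the da_checker.
    fn create_lookup(&mut self, block_root: BlockRoot) -> LookupId {
        let request_sent = !self.da_checker.contains(&block_root);
        let id = self.next_id;
        self.next_id += 1;
        self.lookups.insert(id, Lookup { block_root, request_sent });
        id
    }

    // Assumption: the only event that removes a lookup is a response to a
    // request it sent. Lookups that never sent a request see no event.
    fn deliver_responses(&mut self) {
        self.lookups.retain(|_, lookup| !lookup.request_sent);
    }

    // What a gauge like sync_single_block_lookups would report in this model.
    fn active_lookups(&self) -> usize {
        self.lookups.len()
    }

    // Roots of the lookups still in the map, sorted for stable output.
    fn stuck_roots(&self) -> Vec<BlockRoot> {
        let mut roots: Vec<BlockRoot> =
            self.lookups.values().map(|l| l.block_root).collect();
        roots.sort();
        roots
    }
}
```

In this model, a lookup created for a root that is already in `da_checker` survives every call to `deliver_responses`, so the active count only grows. That matches the shape of the metric, but it does not confirm that this is the actual cause.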

Version

stable

Steps to resolve

Fixed with

pawanjay176 · 2024-05-05T01:22:45Z

Seems like the leak is happening at least partly due to #5680 (comment)

RPCError::Disconnect not propagating up to sync could lead to awaiting_parent.is_some() lookups never getting resolved, which means they never get removed from the lookups map.

I did some testing with propagating the disconnects to sync. Doing this seems to result in lookups getting removed and the sync_single_block_lookups metric getting back to 0 once the node is synced.
Not propagating the disconnects (as is currently the case in cut-5.2.0) consistently increases the lookup count in local testing.
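
A rough sketch of why propagation matters, under simplifying assumptions: `Lookups`, `in_flight_peer`, and `on_peer_disconnected` are hypothetical names, and a failed lookup is simply dropped together with every lookup waiting on it. The real sync code may retry with another peer instead.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins, not Lighthouse's real types.
type BlockRoot = u64;
type PeerId = u32;

struct Lookup {
    // Peer currently serving this lookup's request, if any.
    in_flight_peer: Option<PeerId>,
    // Set while the lookup waits for its parent block to be resolved.
    awaiting_parent: Option<BlockRoot>,
}

struct Lookups {
    map: HashMap<BlockRoot, Lookup>,
}

impl Lookups {
    fn new() -> Self {
        Lookups { map: HashMap::new() }
    }

    fn insert(&mut self, root: BlockRoot, peer: Option<PeerId>, parent: Option<BlockRoot>) {
        self.map.insert(root, Lookup { in_flight_peer: peer, awaiting_parent: parent });
    }

    fn len(&self) -> usize {
        self.map.len()
    }

    // `propagate == false` models the disconnect error being swallowed
    // before it reaches sync: no lookup learns that its request died.
    fn on_peer_disconnected(&mut self, peer: PeerId, propagate: bool) {
        if !propagate {
            return;
        }
        // Lookups whose in-flight request was on the disconnected peer fail.
        let mut failed: Vec<BlockRoot> = self
            .map
            .iter()
            .filter(|(_, l)| l.in_flight_peer == Some(peer))
            .map(|(root, _)| *root)
            .collect();
        // Drop each failed lookup, then every lookup awaiting it as a parent.
        while let Some(root) = failed.pop() {
            self.map.remove(&root);
            let children: Vec<BlockRoot> = self
                .map
                .iter()
                .filter(|(_, l)| l.awaiting_parent == Some(root))
                .map(|(child, _)| *child)
                .collect();
            failed.extend(children);
        }
    }
}
```

With `propagate` set to `false`, a parent lookup whose peer disconnected stays in the map forever, and so does every child with `awaiting_parent` pointing at it. With `propagate` set to `true`, the whole chain is cleared and the count can return to 0.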

michaelsproul added bug, v5.2.0 Q2 2024 labels May 3, 2024

dapplion mentioned this issue May 6, 2024

Release v5.2.0 #5664

Open
