New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix slowdown for LatestChunkStateWitnesses #11258
Comments
Ah crap, I'll take a look at it. Maybe
The set of saved witnesses is always almost full, so most of the time the info will be exceeded, I think it won't help much.
👀 |
The problem is that we'd like to be able to query for a witness with specific |
I tried to set iteration readahead to 4KB, but read_options.set_readahead_size(4096); |
Fixes: #11258 Changes in this PR: * Improved observability of saving latest witnesses * Added metrics * Added a tracing span, which will be visible in span analysis tools * Added a printout in the logs with details about saving the latest witness * Fixed the extreme slowness of `save_latest_chunk_state_witness`, the new solution doesn't iterate anything * Start saving witnesses produced during shadow validation, I needed that to properly test the change The previous solution used `store().iter()` to find the witness with the lowest height that needs to be removed to free up space, but it turned out that this takes a really long time, ~100ms! The new solution doesn't iterate anything, instead of that it maintains a mapping from integer indexes to saved witnesses. So the first observed witness gets index 0, the second one gets 1, third gets 2, and so on... When it's time to free up space we delete the witness with the lowest index. We maintain two pointers to the ends of this "queue", and move them accordingly when the witnesses are removed and added. This greatly improves the time needed to save the latest witness - with new code generating the database update usually takes under 1ms, and commiting it takes under 6ms (on shadow validation): ![image](https://github.com/near/nearcore/assets/149345204/06f379d3-1a36-4aa0-8c5f-043bab7bc36c) ([view the metrics here](https://nearone.grafana.net/d/admakiv9pst8gd/save-latest-witnesses-stats?orgId=1&var-chain_id=mainnet&var-node_id=jan-mainnet-node&var-shard_id=All&from=1716234291000&to=1716241491000)) ~7ms is still a non-negligible amount of time, but it's way better than the previous ~100ms. It's a debug only feature, so 7ms might be acceptable.
For some reason, this loop is very slow relative to other code in neard. It hogs 100% CPU and slows everything else to a crawl.
Not sure why, but we should investigate. I think at least, we can avoid iterating at all if the limit is not exceeded in the first place. Ideally, we should not be iterating, but using a deterministic point-wise algorithm such as the one used in DelayedReceipts.
The text was updated successfully, but these errors were encountered: