Fix slowdown for LatestChunkStateWitnesses #11258

robin-near · 2024-05-08T04:02:47Z

        for item in self.store().iter(DBCol::LatestChunkStateWitnesses) {
            if info.is_within_limits() {
                break;
            }

            let (key_bytes, witness_bytes) = item?;
            store_update.delete(DBCol::LatestChunkStateWitnesses, &key_bytes);
            info.count -= 1;
            info.total_size -= witness_bytes.len() as u64;
        }

For some reason, this loop is very slow relative to other code in neard. It hogs 100% CPU and slows everything else to a crawl.

Not sure why, but we should investigate. I think at least, we can avoid iterating at all if the limit is not exceeded in the first place. Ideally, we should not be iterating, but using a deterministic point-wise algorithm such as the one used in DelayedReceipts.

The text was updated successfully, but these errors were encountered:

jancionear · 2024-05-08T09:39:32Z

Ah crap, I'll take a look at it. Maybe store().iter() loads a whole page of results? 0_o

I think at least, we can avoid iterating at all if the limit is not exceeded in the first place.

The set of saved witnesses is always almost full, so most of the time the info will be exceeded, I think it won't help much.

Ideally, we should not be iterating, but using a deterministic point-wise algorithm such as the one used in DelayedReceipts.

👀

jancionear · 2024-05-13T13:02:47Z

Ideally, we should not be iterating, but using a deterministic point-wise algorithm such as the one used in DelayedReceipts.

The problem is that we'd like to be able to query for a witness with specific height and shard_id, this is difficult to do efficiently with an algorithm like the one for DelayedReceipts. I guess we could scan everything and print out the desired witnesses, but scanning 4GB of data could take a moment 0_o

jancionear · 2024-05-13T21:16:35Z

Maybe store().iter() loads a whole page of results? 0_o

I tried to set iteration readahead to 4KB, but iter() still takes like 100ms 0_o. Weird stuff...

read_options.set_readahead_size(4096);

Fixes: #11258 Changes in this PR: * Improved observability of saving latest witnesses * Added metrics * Added a tracing span, which will be visible in span analysis tools * Added a printout in the logs with details about saving the latest witness * Fixed the extreme slowness of `save_latest_chunk_state_witness`, the new solution doesn't iterate anything * Start saving witnesses produced during shadow validation, I needed that to properly test the change The previous solution used `store().iter()` to find the witness with the lowest height that needs to be removed to free up space, but it turned out that this takes a really long time, ~100ms! The new solution doesn't iterate anything, instead of that it maintains a mapping from integer indexes to saved witnesses. So the first observed witness gets index 0, the second one gets 1, third gets 2, and so on... When it's time to free up space we delete the witness with the lowest index. We maintain two pointers to the ends of this "queue", and move them accordingly when the witnesses are removed and added. This greatly improves the time needed to save the latest witness - with new code generating the database update usually takes under 1ms, and commiting it takes under 6ms (on shadow validation): ![image](https://github.com/near/nearcore/assets/149345204/06f379d3-1a36-4aa0-8c5f-043bab7bc36c) ([view the metrics here](https://nearone.grafana.net/d/admakiv9pst8gd/save-latest-witnesses-stats?orgId=1&var-chain_id=mainnet&var-node_id=jan-mainnet-node&var-shard_id=All&from=1716234291000&to=1716241491000)) ~7ms is still a non-negligible amount of time, but it's way better than the previous ~100ms. It's a debug only feature, so 7ms might be acceptable.

jancionear self-assigned this May 8, 2024

jancionear mentioned this issue May 20, 2024

Fix slow saving of latest witnesses #11354

Merged

jancionear closed this as completed in #11354 May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix slowdown for LatestChunkStateWitnesses #11258

Fix slowdown for LatestChunkStateWitnesses #11258

robin-near commented May 8, 2024

jancionear commented May 8, 2024

jancionear commented May 13, 2024

jancionear commented May 13, 2024

Fix slowdown for LatestChunkStateWitnesses #11258

Fix slowdown for LatestChunkStateWitnesses #11258

Comments

robin-near commented May 8, 2024

jancionear commented May 8, 2024

jancionear commented May 13, 2024

jancionear commented May 13, 2024