This repository has been archived by the owner on Sep 23, 2023. It is now read-only.

Improve performance of Compressor vis-à-vis CompressorSequential #278

Open
AlexeyAkhunov opened this issue Jan 25, 2022 · 4 comments

@AlexeyAkhunov
Contributor

Both can be found in the compress package. CompressorSequential was written for optimal performance in a single thread. Compressor (formerly known as ParallelCompressor) is used for prototypes and experiments, and therefore aims to use as many resources as possible so that prototypes run faster.
But maintaining two variants of the same code is error-prone. Aggregator (part of the Erigon2 prototype) has been switched to Compressor (the parallel compressor), and it now runs slower. My suspicion is that the parallel compressor wastes a lot of time on dispatching work, on scheduling, and on the extra memory allocations needed to ensure thread safety. We would like to profile those areas and optimise them.
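
One way to check that suspicion is to run an inline loop and a per-item channel dispatch over the same input and compare them under the profiler. The sketch below is only an illustration under stated assumptions: `checksum`, `processInline`, `processDispatched`, `cpuProfile` and `mallocsDuring` are hypothetical names, not functions from the compress package, and the toy checksum stands in for the real per-word work. Only `runtime/pprof`, `runtime.ReadMemStats` and `sync` from the standard library are used.

```go
package main

import (
	"bytes"
	"runtime"
	"runtime/pprof"
	"sync"
)

// checksum is a hypothetical stand-in for the real per-word work (FNV-1a).
func checksum(word []byte) uint64 {
	var h uint64 = 14695981039346656037
	for _, b := range word {
		h = (h ^ uint64(b)) * 1099511628211
	}
	return h
}

// processInline handles every word on the calling goroutine.
func processInline(words [][]byte) uint64 {
	var sum uint64
	for _, w := range words {
		sum += checksum(w)
	}
	return sum
}

// processDispatched copies each word and hands it to a worker over a channel,
// mimicking per-item dispatch and the allocations made for thread safety.
func processDispatched(words [][]byte, workers int) uint64 {
	if workers < 1 {
		workers = 1
	}
	in := make(chan []byte)
	out := make(chan uint64)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var local uint64
			for w := range in {
				local += checksum(w)
			}
			out <- local
		}()
	}
	go func() {
		for _, w := range words {
			in <- append([]byte(nil), w...) // defensive copy per item
		}
		close(in)
		wg.Wait()
		close(out)
	}()
	var sum uint64
	for s := range out {
		sum += s
	}
	return sum
}

// cpuProfile runs work under the CPU profiler and returns the pprof-encoded bytes.
func cpuProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

// mallocsDuring reports how many heap objects the whole process allocated
// while work ran (other goroutines are counted too).
func mallocsDuring(work func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}
```

The bytes returned by `cpuProfile` can be written to a file and opened with `go tool pprof`. Comparing `mallocsDuring` for the two variants gives a rough picture of how much allocation the dispatch path adds, although the counter is process-wide and so only approximate.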

For more context: in production, we will likely run the compressor in a SINGLE background thread, so it may not even need to spawn goroutines in that mode. Parallel mode would only be used for experiments and prototypes.

Beyond the Erigon2 prototype, the compressor is currently used to package block header and block body snapshots. The requirement there (as well as in the Erigon2 prototype) is that optimisations do not change the resulting compressed file. Also, the resulting compressed file should be the same regardless of the number of workers.
However, if we find an optimisation that requires a change of the file format, we will definitely consider it!
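
To make the two constraints above concrete (no goroutines in single-worker mode, and identical output for any worker count), here is a minimal sketch. It is not the Compressor implementation: `encodeWord` is a toy run-length encoder standing in for the real compression, and `compressAll` is a hypothetical name. The idea shown is that each worker writes into a slot indexed by the word's input position, so the concatenated output does not depend on scheduling.

```go
package main

import (
	"bytes"
	"sync"
)

// encodeWord is a toy run-length encoder (count byte, value byte),
// a hypothetical stand-in for the real per-word compression.
func encodeWord(word []byte) []byte {
	var out []byte
	for i := 0; i < len(word); {
		j := i
		for j < len(word) && word[j] == word[i] && j-i < 255 {
			j++
		}
		out = append(out, byte(j-i), word[i])
		i = j
	}
	return out
}

// compressAll encodes every word and concatenates the results in input order.
// With workers <= 1 it runs entirely on the calling goroutine.
func compressAll(words [][]byte, workers int) []byte {
	encoded := make([][]byte, len(words))
	if workers <= 1 {
		for i, w := range words {
			encoded[i] = encodeWord(w)
		}
		return bytes.Join(encoded, nil)
	}
	var wg sync.WaitGroup
	for k := 0; k < workers; k++ {
		wg.Add(1)
		go func(k int) {
			defer wg.Done()
			// Strided partition: worker k owns indices k, k+workers, ...
			// so no two goroutines ever write the same slot.
			for i := k; i < len(words); i += workers {
				encoded[i] = encodeWord(words[i])
			}
		}(k)
	}
	wg.Wait()
	return bytes.Join(encoded, nil)
}
```

A regression test along these lines would call `compressAll` with 1, 2 and 8 workers on the same input and require byte-identical results. The static partition also avoids sending every item through a channel, at the cost of uneven load when word sizes vary a lot.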

@AskAlexSharov
Collaborator

I just removed persistence of the dictionary file in #283.
But the performance issue still exists (I mean, this issue is still valid).

@AskAlexSharov
Collaborator

Superstrings are now created immediately instead of being written to a file first (done in #284). We still need to create the uncompressedFile file, because we need to read the data twice (for reducedict). The sequential compressor does this too. We need to add the same trick here as in ETL: create uncompressedFile only when the data is larger than etl.BufferOptimalSize.
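
The trick could look roughly like the sketch below. This is an assumption-laden illustration, not the ETL code: `spillBuffer`, its `limit` field (playing the role of etl.BufferOptimalSize) and its methods are hypothetical names. Data stays in memory until it exceeds the limit, then moves to a temporary file, and in both cases it can be read from the start more than once.

```go
package main

import (
	"bytes"
	"io"
	"os"
)

// spillBuffer keeps written data in memory and moves it to a temporary file
// once it grows past limit (hypothetical stand-in for etl.BufferOptimalSize).
type spillBuffer struct {
	limit int
	mem   bytes.Buffer
	file  *os.File // nil until the data has spilled to disk
	size  int64    // bytes written to file
}

// Write appends p, spilling everything to disk when limit would be exceeded.
func (s *spillBuffer) Write(p []byte) (int, error) {
	if s.file == nil && s.mem.Len()+len(p) > s.limit {
		f, err := os.CreateTemp("", "uncompressed-*")
		if err != nil {
			return 0, err
		}
		n, err := f.Write(s.mem.Bytes())
		if err != nil {
			f.Close()
			os.Remove(f.Name())
			return 0, err
		}
		s.file, s.size = f, int64(n)
		s.mem.Reset()
	}
	if s.file == nil {
		return s.mem.Write(p)
	}
	n, err := s.file.Write(p)
	s.size += int64(n)
	return n, err
}

// Spilled reports whether the data now lives in a temporary file.
func (s *spillBuffer) Spilled() bool { return s.file != nil }

// NewPass returns a reader over all data written so far, starting at offset 0.
// It can be called repeatedly, e.g. once per pass over the uncompressed words.
func (s *spillBuffer) NewPass() io.Reader {
	if s.file == nil {
		return bytes.NewReader(s.mem.Bytes())
	}
	// SectionReader uses ReadAt, so the file's write offset is not disturbed.
	return io.NewSectionReader(s.file, 0, s.size)
}

// Close removes the temporary file, if one was created.
func (s *spillBuffer) Close() error {
	if s.file == nil {
		return nil
	}
	name := s.file.Name()
	closeErr := s.file.Close()
	if err := os.Remove(name); err != nil {
		return err
	}
	return closeErr
}
```

With this shape, small inputs never touch the disk, while large inputs behave like the current uncompressedFile path.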

The performance issue still exists (I mean, this issue is still valid).

@AskAlexSharov
Collaborator

Related to #302

@awskii awskii self-assigned this May 18, 2022
@AskAlexSharov
Collaborator

Related to #651
