Improve performance of Compressor vis-à-vis CompressorSequential #278

AlexeyAkhunov · 2022-01-25T14:03:41Z

Both can be found in the compress package. CompressorSequential was written for optimal performance in a single thread. Compressor (formerly known as ParallelCompressor) is used for prototypes and experiments, so it aims to use maximum resources to run prototypes faster.
But maintaining two variants of the same code is error-prone. Aggregator (part of the Erigon2 prototype) has been switched to Compressor (the parallel compressor) and now runs slower. My suspicion is that the parallel compressor wastes a lot of time on dispatching work, on scheduling, and on the extra memory allocations needed to ensure thread safety. We would like to profile those areas and optimise them.
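
As a starting point for the profiling mentioned above, here is a minimal sketch that captures a CPU profile around a piece of work using the standard `runtime/pprof` package. `profileCPU` and `dispatchViaChannel` are hypothetical names, and `dispatchViaChannel` is only a stand-in for per-item dispatch to a worker, not the actual Compressor code.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// profileCPU runs work under the CPU profiler and returns the raw profile
// (gzip-compressed protobuf, the format read by `go tool pprof`).
func profileCPU(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

// dispatchViaChannel is a stand-in (assumption) for per-item dispatch:
// every item crosses a channel twice, which is the kind of overhead
// suspected in the parallel compressor.
func dispatchViaChannel(n int) int {
	in, out := make(chan int), make(chan int)
	go func() {
		for v := range in {
			out <- v + 1
		}
		close(out)
	}()
	sum := 0
	for i := 0; i < n; i++ {
		in <- i
		sum += <-out
	}
	close(in)
	return sum
}

func main() {
	prof, err := profileCPU(func() { dispatchViaChannel(200_000) })
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d bytes of CPU profile\n", len(prof))
}
```

Writing the returned bytes to a file and opening it with `go tool pprof` should show how much time goes to channel operations and scheduling compared with the work itself.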

For more context, in production, it is likely we will run compressor in a SINGLE background thread. So it may not even need to spawn goroutines in that mode. Parallel mode would only be used for experiments and prototypes.

Beyond the Erigon2 prototype, the compressor is currently used to package block header and block body snapshots. The requirement there (as well as in the Erigon2 prototype) is that optimisations do not change the resulting compressed file. Also, regardless of the number of workers, the resulting compressed file should be the same.
However, if we find an optimisation that requires change of the file format, we will definitely consider it!
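
To illustrate the two requirements above (no goroutines in single-threaded mode, and identical output for any number of workers), here is a minimal sketch. `encodeWord` and `compressAll` are hypothetical names, and `encodeWord` is only a placeholder for the real per-word work. The sketch assumes that per-word encoding is a pure function of its input.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// encodeWord is a hypothetical stand-in for per-word compression work.
// It must depend only on its input for the output to be reproducible.
func encodeWord(w []byte) []byte {
	return bytes.ToUpper(w)
}

// compressAll encodes every word and concatenates the results in input order.
func compressAll(words [][]byte, workers int) []byte {
	encoded := make([][]byte, len(words))
	if workers <= 1 {
		// Single-threaded mode: no goroutines, channels, or locks.
		for i, w := range words {
			encoded[i] = encodeWord(w)
		}
	} else {
		// Parallel mode: each goroutine owns one contiguous chunk and writes
		// only to its own slots, so the result does not depend on scheduling.
		chunk := (len(words) + workers - 1) / workers
		var wg sync.WaitGroup
		for start := 0; start < len(words); start += chunk {
			end := start + chunk
			if end > len(words) {
				end = len(words)
			}
			wg.Add(1)
			go func(start, end int) {
				defer wg.Done()
				for i := start; i < end; i++ {
					encoded[i] = encodeWord(words[i])
				}
			}(start, end)
		}
		wg.Wait()
	}
	return bytes.Join(encoded, nil)
}

func main() {
	words := [][]byte{[]byte("abc"), []byte("de"), []byte("f")}
	for _, workers := range []int{1, 2, 8} {
		fmt.Printf("workers=%d output=%s\n", workers, compressAll(words, workers))
	}
}
```

Handing each goroutine a whole chunk, rather than sending words one at a time over a channel, is one possible way to cut dispatch overhead while keeping the output order fixed.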


AskAlexSharov · 2022-01-27T05:55:23Z

I just removed persistence of the dictionary file (#283).
But the performance issue still exists (I mean this issue is still valid).

AskAlexSharov · 2022-01-27T06:57:11Z

Added creation of superstrings immediately, instead of writing to a file first, in #284. We still need to create the uncompressedFile file because we need to read the data twice (for reducedict). The sequential compressor also does this. We need to add the same trick here as in ETL: create uncompressedFile only when the data is larger than etl.BufferOptimalSize.
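
A minimal sketch of that trick, assuming a plain byte-count threshold: data stays in memory and is moved to a temporary file only once it grows past the limit, and it can be read back more than once. `spillBuffer`, its methods, and `contents` are hypothetical names. The `limit` field stands in for etl.BufferOptimalSize, and this is not the ETL implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// spillBuffer keeps written data in memory and moves it to a temporary file
// only once it would grow past limit. Finish writing before reading.
type spillBuffer struct {
	limit int
	mem   bytes.Buffer
	file  *os.File // nil while the data still fits in memory
}

func (s *spillBuffer) Write(p []byte) (int, error) {
	if s.file == nil && s.mem.Len()+len(p) > s.limit {
		f, err := os.CreateTemp("", "uncompressed-*")
		if err != nil {
			return 0, err
		}
		if _, err := f.Write(s.mem.Bytes()); err != nil {
			f.Close()
			os.Remove(f.Name())
			return 0, err
		}
		s.mem.Reset()
		s.file = f
	}
	if s.file != nil {
		return s.file.Write(p)
	}
	return s.mem.Write(p)
}

// Spilled reports whether the data has been moved to a file.
func (s *spillBuffer) Spilled() bool { return s.file != nil }

// Reader returns a reader positioned at the start of the data.
// Call it once per pass over the data.
func (s *spillBuffer) Reader() (io.Reader, error) {
	if s.file == nil {
		return bytes.NewReader(s.mem.Bytes()), nil
	}
	if _, err := s.file.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	return s.file, nil
}

// Close removes the temporary file, if one was created.
func (s *spillBuffer) Close() error {
	if s.file == nil {
		return nil
	}
	name := s.file.Name()
	err := s.file.Close()
	if rmErr := os.Remove(name); err == nil {
		err = rmErr
	}
	return err
}

// contents performs one full pass over the data and returns it as a string.
func contents(s *spillBuffer) (string, error) {
	r, err := s.Reader()
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	buf := &spillBuffer{limit: 8}
	defer buf.Close()
	buf.Write([]byte("first word, "))
	buf.Write([]byte("second word"))
	pass1, _ := contents(buf)
	pass2, _ := contents(buf)
	fmt.Println(buf.Spilled(), pass1 == pass2)
}
```

With a small input the file is never created, so the second pass (the one needed for reducedict) reads straight from memory.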

Performance issue still exists (I mean this issue is valid).

AskAlexSharov · 2022-02-04T08:41:17Z

Related to #302

AskAlexSharov · 2022-10-06T10:01:22Z

Related to #651

awskii self-assigned this May 18, 2022
