
Symbolic information binary format #2926

Open
kolesnikovae opened this issue Jan 16, 2024 · 1 comment · May be fixed by #3138

kolesnikovae (Collaborator) commented Jan 16, 2024

Pyroscope stores symbolic information such as locations, functions, mappings, and strings in column-major order, in parquet format. We define the schema dynamically and have hand-written construct/deconstruct procedures for each of the models. While this gives us a simple and convenient way to manage and maintain the storage schema, the approach has its own disadvantages:

  1. We always read all of the model fields/columns, and read/write buffers are allocated for each column, which causes excessive IO and resource usage (see the sketch after this list).
  2. Decoding is fairly expensive (~5-7% of the query CPU time).
  3. Read amplification, because a partition can span parquet column chunk page boundaries.
  4. Despite the small payload size, fetching the partitions is often responsible for tail latencies. The impact is even more pronounced on downsampled/aggregated data.
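
To make points 1 and 2 concrete, here is a simplified sketch of what a hand-written deconstruct step amounts to in a column-major layout. The `Function` model and its fields are hypothetical, chosen only for illustration; they do not match the actual schema code:

```go
// Hypothetical column-major deconstruct step, for illustration only
// (the model and field names do not match the actual Pyroscope schema code).
package sketch

type Function struct {
	ID         uint64
	NameID     uint32
	FilenameID uint32
	StartLine  uint32
}

// deconstruct splits rows into one slice per column. Each column is then
// written (and later read) through its own parquet buffer, so a query pays
// for every column even when it needs only a single field.
func deconstruct(rows []Function) (ids []uint64, names, files, lines []uint32) {
	for _, f := range rows {
		ids = append(ids, f.ID)
		names = append(names, f.NameID)
		files = append(files, f.FilenameID)
		lines = append(lines, f.StartLine)
	}
	return ids, names, files, lines
}
```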

In the screenshot below you can see that a parquetTableRange.fetch call lasted for 3 seconds with no good reason; it was probably blocked by the async page reader that is shared with the profile table reader:

[Screenshot: trace view showing the 3-second parquetTableRange.fetch call]

I propose developing a custom binary format with low-level encoders and decoders for the data models. The data should be organised in row-major order. I expect this to effectively remove symbolic data retrieval from the list of query latency factors.
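
As a rough sketch of the proposed direction, the same kind of record could be laid out row by row, so that a single sequential read returns complete rows without per-column buffers. The `Function` model and the uvarint encoding below are assumptions for illustration, not the actual format:

```go
// Row-major sketch: all fields of a record are stored contiguously.
// The Function model and the varint encoding are illustrative
// assumptions, not the final format.
package sketch

import (
	"bufio"
	"encoding/binary"
	"io"
)

type Function struct {
	ID         uint64
	NameID     uint32
	FilenameID uint32
	StartLine  uint32
}

// WriteFunction appends one record, field by field, as uvarints.
func WriteFunction(w *bufio.Writer, f Function) error {
	var buf [binary.MaxVarintLen64]byte
	for _, v := range []uint64{f.ID, uint64(f.NameID), uint64(f.FilenameID), uint64(f.StartLine)} {
		n := binary.PutUvarint(buf[:], v)
		if _, err := w.Write(buf[:n]); err != nil {
			return err
		}
	}
	return nil
}

// ReadFunction decodes one record written by WriteFunction.
func ReadFunction(r io.ByteReader) (Function, error) {
	var vals [4]uint64
	for i := range vals {
		v, err := binary.ReadUvarint(r)
		if err != nil {
			return Function{}, err
		}
		vals[i] = v
	}
	return Function{
		ID:         vals[0],
		NameID:     uint32(vals[1]),
		FilenameID: uint32(vals[2]),
		StartLine:  uint32(vals[3]),
	}, nil
}
```

With a layout along these lines, a partition can be decoded in one sequential pass with a single small read buffer, instead of allocating buffers per column.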

kolesnikovae self-assigned this on Jan 16, 2024
kolesnikovae added the storage and performance labels on Jan 16, 2024
cyriltovena (Contributor) commented:

Definitely agree with aiming to reduce IO for symbols, but I think it's not just parquet: it seems stacktraces.symdb is also causing tail latency.

kolesnikovae linked a pull request (#3138) on Apr 29, 2024 that will close this issue