Increase WAL redo speed #1339
Right now I have implemented a redo handler for only one WAL record type: HEAP INSERT. But it is enough to compare the speed of applying WAL records. Configuration:
So we disable checkpointing and use small shared buffers to force page reconstruction. Vacuum is disabled to exclude background activity. Query:
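The exact configuration is not preserved in this thread; a postgresql.conf fragment along these lines (illustrative values, not necessarily the ones actually used) would match the description above:

```
# effectively disable checkpointing for the duration of the test
max_wal_size = '10GB'
checkpoint_timeout = '1d'
# small shared buffers to force page reconstruction
shared_buffers = '1MB'
# exclude background activity
autovacuum = off
```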
Results:
So as you can see, redo in Rust provides more than a 2x speed improvement and a 1/3 reduction in storage size (because of the more compact WAL record format: we do not need to store information about the target block).
With prefetch, the effect of zenith WAL redo is expected to be even larger.
Related information concerning measuring WAL redo speed:
So in the first case the average WAL redo time is 160 usec, in the second 400 usec. Applying one insert record in wal-redo postgres (eliminating all communication and xlog decoding overhead) takes about 2 usec. If we multiply that by 61, we get roughly 100 usec for applying all the WAL records needed to reconstruct one page. The remaining ~300 usec seem to be spent on communication.
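As a back-of-envelope check of the arithmetic above (2 usec/record times 61 records is strictly ~122 usec, which the comment rounds down to "about 100 usec"), a tiny sketch; all inputs are the measured numbers quoted in this comment:

```rust
// Derive the communication overhead from the measured totals.
// 61 records per page and 2 usec per record are the quoted measurements.
fn pure_apply_us(records_per_page: u64, per_record_us: u64) -> u64 {
    records_per_page * per_record_us
}

fn main() {
    let apply = pure_apply_us(61, 2); // ~122 usec of pure redo work
    let measured = 400u64;            // slower of the two measured cases
    // Whatever is not pure apply work is attributed to communication
    // and protocol overhead between pageserver and wal-redo process.
    println!(
        "pure apply: ~{} usec, communication: ~{} usec",
        apply,
        measured - apply
    );
}
```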
By the way, time of
I have performed a series of experiments trying to determine walredo bottlenecks. First of all, I dumped the requests to the wal-redo process into a file and then replayed them. I have implemented multiplexing/buffering of requests to the walredo process through channels. Having a pool of walredo workers (up to 4 processes) reduces Q1 time from 30 to 18 seconds.
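The channel-based multiplexing described above can be sketched like this. This is an illustrative stand-in, not the actual branch: the request/response types are hypothetical, and each worker thread stands in for one wal-redo process (the real workers would talk to a child process over a pipe):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical message types; in the pageserver these would be
// serialized walredo protocol messages.
struct RedoRequest { page_id: u64 }
struct RedoResponse { page_id: u64, page: Vec<u8> }

// Fan requests out to a pool of worker threads over a shared channel
// and collect responses; returns how many pages came back.
fn run_pool(workers: usize, n_requests: u64) -> usize {
    let (req_tx, req_rx) = mpsc::channel::<RedoRequest>();
    let req_rx = Arc::new(Mutex::new(req_rx));
    let (resp_tx, resp_rx) = mpsc::channel::<RedoResponse>();

    for _ in 0..workers {
        let rx = Arc::clone(&req_rx);
        let tx = resp_tx.clone();
        thread::spawn(move || loop {
            // Take the shared receiver lock to pull the next request.
            let req = match rx.lock().unwrap().recv() {
                Ok(r) => r,
                Err(_) => break, // all senders dropped: shut down
            };
            // A real worker would write the WAL records to the wal-redo
            // process's stdin here and read the reconstructed page back.
            let page = vec![0u8; 8192];
            let _ = tx.send(RedoResponse { page_id: req.page_id, page });
        });
    }
    drop(resp_tx); // so resp_rx ends once all workers have exited

    for page_id in 0..n_requests {
        req_tx.send(RedoRequest { page_id }).unwrap();
    }
    drop(req_tx); // signal workers that no more requests are coming

    resp_rx.into_iter().count()
}

fn main() {
    println!("reconstructed {} pages", run_pool(4, 10));
}
```

One design note: a single shared request channel gives natural load balancing across the pool, at the cost of contention on the receiver lock.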
This is the small Rust program I am using for replaying WAL from the file. It needs to be executed in the directory containing walredo.log and with PGDATA pointing to the wal-redo directory. To produce
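The program itself is not preserved in this thread. A minimal stand-in for the replay idea — pipe a logged request stream into a child process's stdin and drain its stdout — could look like the following; `cat` is used here purely as a placeholder for the wal-redo postgres binary, and the framing is an assumption, not the real protocol:

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

// Feed a logged request stream to a child process and collect its output.
// NOTE: writing everything before reading is fine for small inputs only;
// a real replayer would interleave reads and writes to avoid pipe deadlock.
fn replay(requests: &[u8]) -> Vec<u8> {
    let mut child = Command::new("cat") // stand-in for wal-redo postgres
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("failed to spawn child");
    // The stdin handle is dropped at the end of this statement,
    // closing the pipe so the child sees EOF.
    child.stdin.take().unwrap().write_all(requests).unwrap();
    let mut out = Vec::new();
    child.stdout.take().unwrap().read_to_end(&mut out).unwrap();
    child.wait().unwrap();
    out
}

fn main() {
    // In the real experiment the bytes would be read from walredo.log.
    let echoed = replay(b"fake walredo request stream");
    println!("replayed {} bytes", echoed.len());
}
```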
Interested in looking at this.
I just realized that my redo_channel branch is not doing buffering in the right way.
So right now the situation with the Ketteq Q1 query is the following:
There is still a large gap between the case when the pageserver has to do page reconstruction (18 seconds) and the case when it is not needed (10 seconds).
The average number of buffered requests for Q1 with 6 parallel workers is just 2.5.
I am thinking that what we would most benefit from right now is a reproducible benchmark around, for example, a Ketteq Q1-like situation. Trying to work towards that, while trying to understand #2778 as well.
Tried quite a few permutations of implementing pipelined walredo using tokio primitives in #2875, but it doesn't look viable, at least until the root cause has been understood. With tokio patched to support vectored writes, I feel like it should be faster and reuse memory, but for some reason it is slower overall.
We are currently applying WAL records in a separate process (wal-redo postgres), sending the WAL records through a pipe and receiving reconstructed pages as responses.
This adds quite significant overhead.
Also, received WAL records can be malformed, so we should place the wal-redo process in a sandbox to prevent the system from being compromised. Right now we are using seccomp to prohibit all "dangerous" system calls, but I am not sure that this is enough.
Also, there is just one instance of the wal-redo process per tenant, so it can be a bottleneck. There were attempts to spawn a pool of wal-redo postgres processes; in some cases this had a positive effect on performance, in others a negative one. And if we are going to serve a larger number of tenants with one pageserver, then a large number of wal-redo processes can be a problem.
So I want to investigate how difficult it would be to reimplement the postgres redo handlers in Rust, and whether this can improve performance.
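To give a flavor of what a native redo handler could look like: the sketch below is a deliberately simplified, hypothetical HEAP INSERT redo in Rust. It ignores real heap page layout (page header, line pointers, checksums) and just copies the tuple payload from the record into an 8 KB page image; it is meant only to show the shape of such a function, not the actual implementation:

```rust
// Standard postgres block size.
const PAGE_SIZE: usize = 8192;

// Hypothetical decoded form of a HEAP INSERT WAL record. In the compact
// record format mentioned above, the target block is implied by the key,
// so only the in-page placement and tuple data need to be carried.
struct HeapInsertRecord<'a> {
    offset: usize,   // byte offset on the page (assumed precomputed)
    tuple: &'a [u8], // tuple data carried in the WAL record
}

// Apply one insert record to an in-memory page image.
fn redo_heap_insert(page: &mut [u8; PAGE_SIZE], rec: &HeapInsertRecord) {
    page[rec.offset..rec.offset + rec.tuple.len()].copy_from_slice(rec.tuple);
}

fn main() {
    let mut page = [0u8; PAGE_SIZE];
    let rec = HeapInsertRecord { offset: 128, tuple: b"tuple-bytes" };
    redo_heap_insert(&mut page, &rec);
    assert_eq!(&page[128..139], b"tuple-bytes");
    println!("redo applied");
}
```

Replaying a chain of such records against a base image, entirely inside the pageserver process, is what removes the per-record pipe round trip.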