use separate thread to compress block store #1389

Merged: 12 commits, Jun 23, 2022
Conversation

@PSeitz (Contributor) commented Jun 16, 2022

Use a separate thread to compress the block store for increased indexing performance. This allows using slower compressors with a higher compression ratio, with little or no performance impact (given enough cores).

A separate thread is spawned to compress the docstore; it handles both single blocks and stacking from other docstores.
The spawned compressor thread does not write; instead, it sends back the compressed data. This avoids multithreaded writes to the same file.
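A minimal sketch of the pattern described above (the channel layout, names, and the toy compress function are illustrative assumptions, not the actual tantivy code): uncompressed blocks travel to a dedicated compressor thread, and the compressed bytes are sent back so that only the original thread touches the write target.

```rust
use std::io::{self, Write};
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the real block compressor (tantivy dispatches to
// the configured codec, e.g. zstd).
fn compress(block: &[u8]) -> Vec<u8> {
    block.to_vec()
}

fn main() -> io::Result<()> {
    // Blocks to compress flow one way, compressed bytes flow back.
    let (block_tx, block_rx) = mpsc::channel::<Vec<u8>>();
    let (compressed_tx, compressed_rx) = mpsc::channel::<Vec<u8>>();

    // Compressor thread: it never touches the file, it only compresses.
    let compressor = thread::Builder::new()
        .name("docstore compressor thread".to_string())
        .spawn(move || {
            for block in block_rx {
                if compressed_tx.send(compress(&block)).is_err() {
                    break; // receiver dropped, nothing left to do
                }
            }
        })?;

    // The current thread keeps exclusive ownership of the write target.
    let mut wrt: Vec<u8> = Vec::new(); // stand-in for the docstore writer
    block_tx.send(b"first block".to_vec()).expect("compressor alive");
    drop(block_tx); // closing the channel lets the compressor thread exit
    for compressed in compressed_rx {
        wrt.write_all(&compressed)?;
    }
    compressor
        .join()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "compressor thread panicked"))?;
    Ok(())
}
```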

Small benchmark: 1 GB hdfs, zstd level 8

Pre:
Total Nowait Merge: 43.30 Mb/s
Total Wait Merge: 43.28 Mb/s

Post:
Total Nowait Merge: 67.69 Mb/s
Total Wait Merge: 67.69 Mb/s

@codecov-commenter commented Jun 16, 2022

Codecov Report

Merging #1389 (4b6db03) into main (83d0c13) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1389      +/-   ##
==========================================
+ Coverage   94.29%   94.30%   +0.01%     
==========================================
  Files         236      236              
  Lines       43418    43471      +53     
==========================================
+ Hits        40942    40997      +55     
+ Misses       2476     2474       -2     
| Impacted Files | Coverage Δ |
| --- | --- |
| common/src/writer.rs | 94.11% <ø> (ø) |
| src/indexer/merger.rs | 98.97% <100.00%> (ø) |
| src/indexer/segment_serializer.rs | 98.07% <100.00%> (-0.04%) ⬇️ |
| src/indexer/segment_writer.rs | 96.40% <100.00%> (-0.01%) ⬇️ |
| src/store/index/mod.rs | 97.83% <100.00%> (ø) |
| src/store/index/skip_index_builder.rs | 100.00% <100.00%> (ø) |
| src/store/mod.rs | 99.17% <100.00%> (ø) |
| src/store/writer.rs | 100.00% <100.00%> (+1.08%) ⬆️ |
| src/schema/facet.rs | 89.88% <0.00%> (-0.06%) ⬇️ |
| ... and 11 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

src/store/writer.rs:
) -> io::Result<StoreWriter> {
    let thread_builder = thread::Builder::new().name("docstore compressor thread".to_string());

    // Data channel to send fs writes, to write only from current thread
Collaborator:

Interesting... why do we want to write only from the current thread?

PSeitz (Contributor, Author):

Even though in tantivy we create a separate file, which would be fine for a separate thread to write into, the Directory trait itself doesn't explicitly require thread-safe writes. TerminatingWrite is not Send, so the current contract is that the writers stay on the same thread.

It's unlikely to be an issue, but two threads could end up writing next to each other on the same page or cache line. So depending on the write target and the buffer synchronization between the threads, they could overwrite each other's data. Since Rust doesn't cover race conditions outside its memory model, e.g. files, I'm extra careful there.
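A small illustration of that constraint, using a plain std::io::Write in place of tantivy's TerminatingWrite (the function here is hypothetical): moving a writer into a spawned thread only compiles when the writer type is Send.

```rust
use std::io::{self, Write};
use std::thread;

// std::thread::spawn requires everything moved into the closure to be
// Send + 'static. A generic `Write` is used here instead of tantivy's
// TerminatingWrite; since TerminatingWrite is not Send, the real writer
// cannot be moved like this and therefore stays on the calling thread.
fn write_on_other_thread<W>(mut wrt: W) -> thread::JoinHandle<io::Result<()>>
where
    W: Write + Send + 'static, // drop the `Send` bound and this fails to compile
{
    thread::spawn(move || {
        wrt.write_all(b"compressed block")?;
        wrt.flush()
    })
}
```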

Collaborator:

That makes sense. Can we make TerminatingWrite: Send and simplify the code though?

Collaborator:

Ah, so you just offload compression to a different thread. Writing is still done in the same place. Maybe it is clearer that way; let me keep on reading.

@PSeitz requested a review from fulmicoton on June 16, 2022
@fulmicoton (Collaborator) commented Jun 17, 2022

A bit of analysis...

We write the docstore:
1 - when building a new segment
2a - when merging N segments. If we do not have any deletes in the segments being appended, we have an operation called stacking that makes it possible to more or less concatenate the docstores (it is a tiny bit more complicated than that, but whatever)
2b - if we have deletes, then we need to rebuild the docstore. The operation is then close to what happens in 1.

Threading can help with two things:
A - docstore compression is heavy and does not need to happen on the same thread as inverted index building. We love offloading stuff like this because it increases the indexing throughput without producing smaller segments.
B - we do not need to have the CPU wait on IO.

The current implementation will not help with B, because the IO still happens on the same thread as before.
2a is all about B. This is probably more important when using the SSTable implementation.
1 and 2b are probably more about A. (We still IO-wait, but more time is spent on CPU than on IO when building a segment. The gain is probably not negligible though.)

2a and 2b are not as important because they do not starve us of resources, and they do not impact time to search. (Finishing a merge quickly by using more cores is not very important.) They could help resource management in Quickwit, however. (If all tasks clearly take one full thread, we can more easily rely on our own scheduling and size our task thread pool by the number of cores.)

I think we want to at least do the IO on the thread that does the compression.
(That means moving the File into the thread.)

@fulmicoton (Collaborator):

Please test on a dataset that triggers merges (wikipedia is fine), and rely on the sstable dict.

@fulmicoton (Collaborator) left a review comment:

See comments on the Conversation tab.
#1389

@fulmicoton changed the title from "use seperate thread to compress block store" to "use separate thread to compress block store" on Jun 17, 2022
@PSeitz (Contributor, Author) commented Jun 17, 2022

I tested the merge operation on hdfs 14GB (2.4GB index size, 4 segments) and wikipedia 8GB (5.7GB index size, 5 segments) with sstable. It seems to be marginally faster. Merge throughput is ~50MB/s for hdfs and 75MB/s for wikipedia (measured on index size, not input size), which is slower than most disks. On indexing, no noticeable speedup was observed. I think the impact is too small to justify changing the API (adding Send to TerminatingWrite), although another upside would be that the code would be simpler.

➜  tantivy-cli git:(main) ✗ du -sh hdfs/
2,4G    hdfs/

➜  tantivy-cli git:(main) ✗ du -sh wikipedia
5,7G    wikipedia
| Run | Write on separate thread | Write on one thread |
| --- | --- | --- |
| hdfs 1. run | 50.43 secs | 52.63 secs |
| hdfs 2. run | 48.38 secs | 50.57 secs |
| wikipedia 1. run | 71.74 secs | 77.27 secs |
| wikipedia 2. run | 79.46 secs | 77.25 secs |
| wikipedia 3. run | 81.64 secs | |

@fulmicoton (Collaborator):

@PSeitz Thanks for investigating! That makes sense. We don't flush or anything, so the writes just push the data to the OS buffer, and the actual write to disk is done asynchronously by the OS, provided our throughput does not beat the hardware.
The docstore actually writes a lot of data, but after compression it is fine.
I suspect you would see different results with a lesser hard drive like EBS (gp2 is 250 MiB/s, to be split between writing a split and merging at the same time, etc.).

Anyway, can you move the write to a different thread, if only to simplify the code?
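A sketch of what that simplification could look like, assuming the writer were made Send as discussed above (the function, names, and signatures are illustrative, not the actual patch): the compressor thread owns the writer, so compressed blocks are written where they are produced and nothing is sent back except the final result.

```rust
use std::io::{self, Write};
use std::sync::mpsc::{channel, Sender};
use std::thread::{Builder, JoinHandle};

// Sketch of the simplification: the compressor thread owns the writer, so
// compressed blocks are written where they are produced, and the writer is
// handed back through the JoinHandle when the channel closes.
fn spawn_compressor<W: Write + Send + 'static>(
    mut wrt: W,
) -> io::Result<(Sender<Vec<u8>>, JoinHandle<io::Result<W>>)> {
    let (block_tx, block_rx) = channel::<Vec<u8>>();
    let handle = Builder::new()
        .name("docstore compressor thread".to_string())
        .spawn(move || {
            for block in block_rx {
                let compressed = block; // placeholder for the actual codec
                wrt.write_all(&compressed)?;
            }
            wrt.flush()?;
            Ok(wrt)
        })?;
    Ok((block_tx, handle))
}
```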

src/store/writer.rs:
}

/// Flushes current uncompressed block and sends to compressor.
fn send_current_block_to_compressor(&mut self) -> io::Result<()> {
Collaborator:

I think we can return the SendError directly here.
See discussion on call site.
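A hypothetical sketch of that suggestion (names are illustrative): returning the SendError directly only tells the caller that the compressor thread has already hung up, which is why the following discussion prefers harvesting the real error from the thread itself.

```rust
use std::sync::mpsc::{SendError, Sender};

// Sketch only: a send failure means the compressor thread has terminated
// (normally, with an error, or via panic), so the SendError itself carries
// little information about what went wrong.
fn send_block_sketch(
    sender: &Sender<Vec<u8>>,
    block: Vec<u8>,
) -> Result<(), SendError<Vec<u8>>> {
    sender.send(block)
}
```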

let start_shift = self.writer.written_bytes() as usize;
pub fn stack(&mut self, store_reader: StoreReader) -> io::Result<()> {
// We flush the current block first before stacking
self.send_current_block_to_compressor()?;
Collaborator:

Both errors are actually errors on the compressing thread.

If there is an error, could we join the companion thread and return:

  • its io::Error if it has one
  • a custom io::Error if it panicked

The code that joins/harvests the error could be factored out into an independent method.

PSeitz (Contributor, Author):

That's a good idea. I don't like the current error handling here; it's not deterministic which error is returned. But to join the thread we need to consume self. We could swap it with another handle or put it in an Option; I don't really like either of those.
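One possible shape for the Option-based variant mentioned here (a sketch with illustrative names, not the actual code): keeping the JoinHandle in an Option lets it be taken from &mut self and joined without consuming self.

```rust
use std::io;
use std::thread::JoinHandle;

// Illustrative only: the Option allows joining the compressor thread from
// &mut self (via take()) instead of consuming self.
struct StoreWriterSketch {
    compressor_thread_handle: Option<JoinHandle<io::Result<()>>>,
}

impl StoreWriterSketch {
    /// Join the compressor thread and surface its io::Error, or a custom
    /// io::Error if the thread panicked. Subsequent calls are no-ops.
    fn harvest_compressor_result(&mut self) -> io::Result<()> {
        match self.compressor_thread_handle.take() {
            Some(handle) => handle
                .join()
                .map_err(|_| io::Error::new(io::ErrorKind::Other, "compressor thread panicked"))?,
            None => Ok(()),
        }
    }
}
```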

self.send_current_block_to_compressor()?;
drop(self.compressor_sender);

self.compressor_thread_handle
Collaborator:

Same as above.

@fulmicoton (Collaborator) left a review comment:

Approved, but please have a look at the change suggestion and the error handling suggestion. The latter is very optional or can be done later.

PSeitz and others added 12 commits June 23, 2022 15:34
Co-authored-by: Paul Masurel <paul@quickwit.io>
3 participants