File Lifecycle in Tantivy & Garbage Collection

PSeitz edited this page Jul 16, 2021 · 2 revisions

The managed.json file lists the files that belong to tantivy.

Every file that was created by tantivy and exists on the filesystem is listed in managed.json. The converse does not hold: a file listed in managed.json may not exist on the filesystem.

So before creating a new file, we add it to managed.json, and only after deleting a file from the filesystem do we remove it from managed.json. This logic is enforced by the ManagedDirectory.
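The ordering rule above can be illustrated with a small in-memory model. This is a minimal sketch, not tantivy's ManagedDirectory: the type `ToyManagedDirectory` and all of its methods are hypothetical, a set stands in for managed.json, and a map stands in for the filesystem.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a managed directory.
/// `managed` models the contents of managed.json, `files` models the filesystem.
#[derive(Default)]
pub struct ToyManagedDirectory {
    managed: BTreeSet<String>,
    files: HashMap<String, Vec<u8>>,
}

impl ToyManagedDirectory {
    /// Register the path first, then create the file.
    pub fn write(&mut self, path: &str, data: &[u8]) {
        self.managed.insert(path.to_string()); // step 1: record in "managed.json"
        self.files.insert(path.to_string(), data.to_vec()); // step 2: create the file
    }

    /// Delete the file first, then unregister the path.
    pub fn delete(&mut self, path: &str) {
        self.files.remove(path); // step 1: remove from the "filesystem"
        self.managed.remove(path); // step 2: remove from "managed.json"
    }

    /// True if the path is currently listed as managed.
    pub fn is_managed(&self, path: &str) -> bool {
        self.managed.contains(path)
    }

    /// The invariant from the text: every existing file is listed as managed.
    pub fn invariant_holds(&self) -> bool {
        self.files.keys().all(|path| self.managed.contains(path))
    }
}
```

Because registration happens before creation and unregistration happens after deletion, an interruption between the two steps can leave a listed path with no file behind it, but never an unlisted file.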

The meta.json file lists the segments that are part of a tantivy index. Updating meta.json is the atomic operation that marks a segment as ready for search.

All SegmentMeta instances alive in memory are registered in a segment meta inventory. Creating a SegmentMeta object is required to create the files associated with a segment. Keeping a SegmentMeta instance alive is a way to protect its files from deletion. For instance, a segment merger creates a SegmentMeta for the new merged segment and keeps it until publication. A GC run (for instance, one triggered because another merge finished) will then not remove the files of the ongoing merge.
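The idea of protecting files by keeping an object alive can be sketched with reference counting. The code below is an illustration only and does not reproduce tantivy's inventory: `ToySegmentMeta` and `ToyInventory` are hypothetical names, and the choice of `Arc` and `Weak` from the standard library is an assumption made for the example.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical, simplified segment metadata: only an id.
pub struct ToySegmentMeta {
    pub segment_id: String,
}

/// Toy inventory that tracks metas through weak references,
/// so it never keeps a segment alive by itself.
#[derive(Default)]
pub struct ToyInventory {
    tracked: Mutex<Vec<Weak<ToySegmentMeta>>>,
}

impl ToyInventory {
    /// Create a meta and register it. The caller holds the strong handle.
    pub fn new_segment_meta(&self, segment_id: &str) -> Arc<ToySegmentMeta> {
        let meta = Arc::new(ToySegmentMeta {
            segment_id: segment_id.to_string(),
        });
        self.tracked.lock().unwrap().push(Arc::downgrade(&meta));
        meta
    }

    /// Ids of the segments whose meta is still held somewhere.
    pub fn alive_segment_ids(&self) -> Vec<String> {
        let mut tracked = self.tracked.lock().unwrap();
        // Forget entries whose last strong handle has been dropped.
        tracked.retain(|weak| weak.strong_count() > 0);
        tracked
            .iter()
            .filter_map(|weak| weak.upgrade())
            .map(|meta| meta.segment_id.clone())
            .collect()
    }
}
```

In this model, a merger would hold the returned `Arc` until the merged segment is published. While it does, any GC run that consults `alive_segment_ids` sees the segment as alive and leaves its files in place.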

Garbage collection works as follows:
1. The list of files that can be subject to GC is read from managed.json.
2. The list of segments that are alive is computed from the segment meta inventory.
3. The list of files belonging to the alive segments is computed.
4. Files that are subject to GC but do not belong to any alive segment are deleted.
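The procedure above amounts to a set difference, which can be sketched as follows. Everything here is hypothetical: the function names, the "<segment_id>.<extension>" naming scheme, and the three extensions are chosen for illustration. The sketch also assumes that the final step is to delete whatever is managed but not protected by an alive segment.

```rust
use std::collections::BTreeSet;

/// Hypothetical naming scheme: each segment owns files named "<segment_id>.<ext>".
/// The extension list is illustrative, not tantivy's actual set of components.
pub fn segment_files(segment_id: &str) -> Vec<String> {
    ["idx", "pos", "store"]
        .iter()
        .map(|ext| format!("{}.{}", segment_id, ext))
        .collect()
}

/// Compute which managed files are not protected by any alive segment.
/// `managed` models the contents of managed.json.
pub fn files_to_delete(managed: &BTreeSet<String>, alive_segments: &[&str]) -> BTreeSet<String> {
    // Files belonging to alive segments are protected.
    let protected: BTreeSet<String> = alive_segments
        .iter()
        .flat_map(|id| segment_files(id))
        .collect();
    // Everything managed but not protected is a GC candidate.
    managed.difference(&protected).cloned().collect()
}
```

With managed files for segments "a" and "b" and only "a" alive, `files_to_delete` returns exactly the files of "b". A file that is not listed in `managed` is never returned, which matches the rule that only files from managed.json are subject to GC.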
