Skip to content

Latest commit

 

History

History
67 lines (54 loc) · 1.81 KB

blame.md

File metadata and controls

67 lines (54 loc) · 1.81 KB

Blame

In order to serve up blame information quickly, mozsearch caches blame data for every version of every file in a repository. Here's how it works:

Suppose a file has three revisions:

rev1:
abc
def
ghi

rev2:
abc
def2
ghi

rev3:
abc
ghij

Then mozsearch will generate three "blame revisions" in a new repository:

blame-rev1:
rev1
rev1
rev1

blame-rev2:
rev1
rev2
rev1

blame-rev3:
rev1
rev3

Every revision in the original repository has a corresponding blame revision. The blame version of the file will have one line for every line in the original file. This line will contain the revision ID of the revision in the original repository that introduced that line.

This data is stored in its own git repository. This repository, called the blame repository, has exactly the same file structure as the original repository. Each commit in this repository corresponds to a commit in the original repository. Commit messages in the blame repository give the revision they correspond to in the original repository.

Let's imagine you want to get blame for a file ${f} at revision ${rev}. First, find the revision ${blame_rev} in the blame repository that corresponds to ${rev} (the web server keeps a mapping between revisions in memory for this purpose). Then find ${f} in the blame repository at revision ${blame_rev}. Finally, show these two files side-by-side and you're done.

Mozsearch uses the tools/src/bin/build-blame tool to generate a blame repository. Generating cached blame information is pretty slow. However, each indexing run only needs to generate new blame revisions for each new revision that has appeared in the original repository since the last indexing. Typically the blame repository is about the same size as the original repository since it compresses very well with git's delta compression.