Skip to content
Jukka Lehtosalo edited this page Jun 30, 2022 · 6 revisions

If you run mypy using the mypy tool, it performs incremental checking by default. This tries to reuse type checking results from the previous run via files in the .mypy_cache directory.

This coarse-grained (module-level) incremental checking is often just fine, but it can result in a slow iteration speed, especially in large code bases with hundreds of thousands of lines or more.

The mypy daemon (dmypy) helps with this use case. It makes incremental runs very fast (often a few hundred ms) by having a long-lived process that stores program state in memory between runs.

The daemon also maintains fine-grained dependencies. These dependencies are between definitions, not modules. In contrast, the coarse-grained incremental mode only tracks dependencies at module level.

The implementation is fairly complex and it's usually not necessary to understand the details of how it works to work on mypy, but some mypy changes (e.g. many AST changes) require at least small updates in the mypy daemon.

Example

Assume that we have a file a.py that contains dozens of functions, only one of which uses module b:

# a.py

import b

...  # lots of stuff that doesn't use b

def f() -> None:
    b.g(1)

The mypy daemon will process the entire program when starting up, and record that a.f depends on b.g, among other things.

Now we edit b.py and change the signature of b.g:

# b.py

def g(x: str) -> None: ...

Mypy daemon will first notice that the file b.py is changed (by checking the modification time). It will process b.py again and notice that the only externally visible change in b is the modified signature of b.g.

Using fine-grained dependencies, mypy daemon will figure out that in module a, we'll need to reprocess only the function a.f. This is much faster than processing the entire module a if the module is big.

After reprocessing a.f, mypy daemon will report an incompatible argument error.

How does it work

The mypy daemon has several things going on, summarized below.

Fine-grained dependencies

Mypy daemon keeps track of fine-grained dependencies between functions, attributes, classes, modules, etc.

The smallest unit of (re)processing (i.e. target) is either a module top level or a top-level function/method. Class bodies are considered to be included in the surrounding module top level or function, but methods are separate targets. We only track dependencies at this level of granularity, though targets can depend on arbitrary attributes (that is, triggers or sources are more fine-grained than targets).

For example, if function mod1.f has a reference to class attribute mod2.C.x, we'll record that mod1.f depends on mod2.C.x. However, if the definition of class attribute C.x is initialized to the value CONST, we'll record that the top-level of mod2 depends on CONST, instead of recording a dependency for the class variable mod2.C.x.

Processing modified files

Each file that is modified since the previous mypy daemon run will be processed in full, since mypy doesn't have an incremental parser.

After processing a file, the daemon takes a diff of the old and new symbol tables to find changed definitions. It will then "trigger" these definitions (i.e. reprocess everything that depends on them).

Now comes a tricky bit: we merge the new AST to the AST corresponding to the previous revision of the file by copying the contents of new AST nodes over to the corresponding old nodes (when they exist). This way references to the AST nodes in other modules will continue to point to the correct things.

Following triggered fine-grained dependencies

Mypy daemon uses fine-grained dependencies to find other parts of the codebase that need to be reprocessed in response to the triggered definitions.

To reprocess a triggered definition, the daemon first "strips" (or resets) the relevant AST nodes to match a fresh AST we get from the parser. We use this hack, since we don't have an incremental parser and want to avoid processing the entire file containing a triggered definition.

Reprocessing a definition implies performing semantic analysis and type checking on a stripped subset of some module AST.

After reprocessing, we again check if something externally visible has changed, and we may need to also trigger the dependencies of definitions we just reprocessed.

Completing an incremental step

Eventually we'll reach a fixed point where there are no additional triggered dependencies and we are done.

Relevant code

  • mypy/dmypy_server.py: Implementation of the daemon process
  • mypy/server/deps.py: Generate fine-grained dependencies for AST nodes; also documents how triggers and dependencies work
  • mypy/server/update.py: Fine-grained incremental processing logic
  • mypy/server/astdiff.py: Compare two versions of a module symbol table and find changed definitions
  • mypy/server/astmerge.py: Merge new version of an AST to the previous version to preserve object identities of nodes
  • mypy/server/aststrip.py: Strip an AST to make it "fresh", i.e. similar to what is produced by parser, with all changes performed by semantic analysis or type checking reverted

Test cases

Many daemon test cases perform multiple incremental steps to validate that propagating changes works as expected.

The primary test files are here: test-data/unit/fine-grained*.test

There are also more unit test style test cases for some operations:

  • Generating deps: test-data/unit/deps.test
  • Performing AST diffs: test-data/unit/diff.test
  • Merging ASTs: test-data/unit/merge.test