Jukka Lehtosalo edited this page Jun 29, 2022 · 8 revisions

The mypy semantic analyzer binds names to definitions, builds symbol tables, and performs various simple consistency checks. It takes an abstract syntax tree as input and modifies it.

The semantic analyzer is implemented in several modules (mypy/semanal*.py). mypy/semanal.py does the bulk of the work, while analysis of types is performed in mypy/typeanal.py.

The semantic analyzer uses the visitor pattern.

Semantic analysis runs in several passes. Some checks depend on earlier passes having populated certain information. mypy/semanal_main.py puts it all together and invokes the different passes.

A simple example

The comments here indicate which name bindings are added during semantic analysis:

# mod.py
x: int = 1  # Bindings: int -> builtins.int, x -> mod.x

def f(x):
    return x + 1  # Bindings: x -> local x

print(f(x))  # Bindings: print -> builtins.print, f -> mod.f, x -> mod.x

The symbol table (namespace) for module mod will look like this (simplified presentation):

{
  "x": Var(...), 
  "f": FuncDef(...),
}

In reality the symbol table values are instances of mypy.nodes.SymbolTableNode that wrap references to the Var, etc. nodes.

{
  "x": SymbolTableNode(Var(...)), 
  "f": SymbolTableNode(FuncDef(...)),
}
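
The idea of walking an AST with a visitor and recording module-level definitions can be sketched with Python's standard ast module. This is only a toy model, not mypy's implementation: ToySymbolTableNode and ModuleBinder are hypothetical names invented for illustration, and the sketch uses the standard library's AST nodes rather than mypy.nodes.

```python
import ast
from dataclasses import dataclass


@dataclass
class ToySymbolTableNode:
    """Hypothetical stand-in for mypy.nodes.SymbolTableNode."""
    kind: str      # "Var" or "FuncDef" in this toy model
    node: ast.AST  # the AST node that defines the name


class ModuleBinder(ast.NodeVisitor):
    """Collect module-level definitions into a dict keyed by name."""

    def __init__(self) -> None:
        self.names: dict[str, ToySymbolTableNode] = {}

    def visit_AnnAssign(self, node: ast.AnnAssign) -> None:
        # `x: int = 1` defines a variable in the current (module) scope.
        if isinstance(node.target, ast.Name):
            self.names[node.target.id] = ToySymbolTableNode("Var", node)

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # Record the function itself, but do not descend into it:
        # its parameters and locals belong to a separate scope.
        self.names[node.name] = ToySymbolTableNode("FuncDef", node)


SOURCE = """\
x: int = 1

def f(x):
    return x + 1

print(f(x))
"""

binder = ModuleBinder()
binder.visit(ast.parse(SOURCE))
module_symbols = {name: sym.kind for name, sym in binder.names.items()}
print(module_symbols)
```

Because visit_FunctionDef does not recurse, the parameter x of f never reaches the module-level table, which mirrors the simplified symbol table shown above.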

Name binding and scopes

The semantic analyzer binds references to local variables and global (module-level) definitions. Attribute references (of form x.y) are generally only bound during type checking, since binding them requires static type information. As an exception, attribute references via module objects are bound during semantic analysis.

The top level of each module (outside any function or class definition) is a single name scope. The body of each class is also a single scope, not including any methods or nested classes. Each function is a single scope as well; again, nested scopes are separate. This means that compound statements do not introduce new scopes. This is similar to Python.

By default, each name can have only a single binding within a single scope. Multiple nested scopes can, of course, have different definitions for a single name.
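
Since these scoping rules are similar to Python's own, they can be observed with the standard library's symtable module. The sketch below inspects CPython's symbol tables, not mypy's, and the sample source is made up for illustration.

```python
import symtable

SOURCE = """\
x = 1

def f(x):
    if x:
        y = x + 1   # assigned inside an `if`, but still local to f
    return y

class C:
    z = 2
    def method(self):
        return x    # refers to the module-level x
"""

module_scope = symtable.symtable(SOURCE, "mod.py", "exec")
children = {child.get_name(): child for child in module_scope.get_children()}

f_scope = children["f"]
c_scope = children["C"]
method_scope = {ch.get_name(): ch for ch in c_scope.get_children()}["method"]

# Names that are local to f: the parameter x and the variable y.
f_locals = sorted(
    name for name in f_scope.get_identifiers() if f_scope.lookup(name).is_local()
)
print(f_locals)

# Inside C.method, x is not local; it resolves to the module scope.
print(method_scope.lookup("x").is_global())
```

The `if` statement in f does not get a table of its own: y lives directly in the function's scope. The module, the function f, the class C, and C.method each have a separate table, so the two uses of x in f and at module level are distinct bindings.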

Symbol tables

The SemanticAnalyzer class maintains a per-module symbol table (a mypy.nodes.SymbolTable instance in the globals attribute). After semantic analysis it is stored as the names attribute of the relevant mypy.nodes.MypyFile instance.

Symbol tables for local scopes (i.e. functions) are stored in the locals attribute of SemanticAnalyzer.

Similarly, we keep track of the symbol tables of classes.

Modules

The parser can process a single file without knowledge of any other modules. The semantic analyzer, in contrast, needs access to the symbol tables of imported modules.

The semantic analyzer doesn't need the full ASTs of imported modules. In incremental mode we only need to deserialize the symbol tables of unmodified files from a previous mypy run. This speeds up processing significantly.

Forward references

To resolve forward references, the semantic analyzer may perform multiple passes over the AST nodes. Initial passes use PlaceholderNode instances if the target is not yet known. These will be replaced in later passes with the real target definitions.
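
The placeholder idea can be illustrated with a small toy model. This is a sketch under simplifying assumptions, not mypy's actual algorithm: the Placeholder class and the analyze function are hypothetical, and each definition is reduced to a plain alias of the form `name = target`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical marker for a name whose target is not known yet."""
    name: str


def analyze(
    definitions: list[tuple[str, str]],
    builtins: dict[str, str],
    max_passes: int = 10,
) -> tuple[dict[str, object], int]:
    """Bind each alias to its final target, repeating passes until stable."""
    table: dict[str, object] = {}
    for pass_number in range(1, max_passes + 1):
        for name, target in definitions:
            if target in builtins:
                table[name] = builtins[target]
                continue
            bound = table.get(target)
            if bound is None or isinstance(bound, Placeholder):
                # Target not resolved yet: leave a placeholder for a later pass.
                table[name] = Placeholder(target)
            else:
                table[name] = bound
        if not any(isinstance(value, Placeholder) for value in table.values()):
            return table, pass_number
    raise RuntimeError("unresolved references remain after max_passes")


# A refers forward to B, and B refers forward to C.
definitions = [("A", "B"), ("B", "C"), ("C", "int")]
table, passes = analyze(definitions, {"int": "builtins.int"})
print(table, passes)
```

In this toy run, the first pass can only resolve C; A and B receive placeholders. Each later pass replaces the placeholders whose targets have become known, and the loop stops once none remain. The max_passes bound is an assumption of the sketch that keeps a cyclic or undefined reference from looping forever.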

The next pass

After semantic analysis, we perform the Type Checker pass.