Skip to content

Mypyc Intermediate Representation (IR)

Jukka Lehtosalo edited this page Oct 24, 2022 · 2 revisions

Mypyc first type checks the code to be compiled using mypy, which produces a mypy AST. The mypy AST isn't very well suited for compilation, so one of the first things mypyc does is to convert it into an intermediate representation (IR) better suited for doing transforms, data flow analyses, code generation, etc.

Overview

Each function gets translated into one or more basic blocks. Each basic block has some ops (operations). Ops are simple, designed to be easy to compile into C or transform to other low-level representations.

For example, consider this function:

def f(x: int) -> int:
    return x + 1

This will be translated to a single basic block which we can pretty-print like this:

def f(x):
    x, r0 :: int
L0:
    r0 = CPyTagged_Add(x, 2)
    return r0

The basic block has the label L0. There are two values in the basic blocks, x and r0 (x is a register, and r0 is a temporary holding the result of an op). There are two ops in the basic block -- a call to a primitive function CPyTagged_Add that adds two integers, and a return op.

Note that integers are represented using a tagged pointer. If the integer value is small, the value is shifted left by one and represented directly. If the value is large, the lower bit is set and the integer is represented by a pointer to a Python int object. This is why the second argument to CPyTagged_Add is 2 instead of 1.

Values

Each op can take some values as operands, and it produces a value. There are three categories of value in IR:

  1. Registers are mutable values, and these are used to represent locals and function arguments. They are also used for some temporaries.
  2. The result of an op is also a value. These are always initialized exactly once and cannot be mutated. This resembles the SSA form, but we don't use Phi functions.
  3. Integer literals are also values.

Branching

Most function do some branching and have multiple basic blocks. Here is a simple function that has an if statement:

def g(b: bool) -> int:
    if b:
        return 1
    return 2

The generated IR looks like this:

def g(b):
    b :: bool
L0:
    if b goto L1 else goto L2 :: bool
L1:
    return 2
L2:
    return 4

The IR has three basic blocks, labeled L0, L1 and L2.

Complex expressions

Each IR op only performs a single, simple operation. More complex operations are represented as multiple ops. For example, consider this function:

def f(x: int, y: int) -> int:
    return x + y + 1

The expression (represented as a tree in the mypy AST) is turned into a sequence of ops:

def f(x, y):
    x, y, r0, r1 :: int
L0:
    r0 = CPyTagged_Add(x, y)
    r1 = CPyTagged_Add(r0, 2)
    return r1

Implementation

The mypyc IR is defined in mypyc.ir. It covers several key concepts that are essential to understand by all mypyc contributors:

  • mypyc.ir.ops.Op is an Abstract Base Class for all IR ops. These are low-level and generally map to simple fragments of C each. Mypy expressions are translated to linear sequences of these ops.

  • mypyc.ir.ops.BasicBlock is a container of a sequence of ops with a branch/goto/return at the end, and no branch/goto/return ops in the middle. Each function is compiled to a bunch of basic blocks.

  • mypyc.ir.rtypes.RType and its subclasses are the types used for everything in the IR. These are lower-level and simpler than mypy or PEP 484 types. For example, there are no general-purpose generic types types here. Each List[X] type (for any X) is represented by a single list type, for example.

  • Primitive types are special RTypes of which mypyc has some special understanding, and there are typically some specialized ops. Examples include int (referred to as int_rprimitive in the code) and list (list_rprimitive). Python types for which there is no specific RType type will be represented by the catch-all object_rprimitive type.

  • Instances of compiled classes are generally represented using the RInstance type. Classes are compiled to C extension classes and contain vtables for fast method calls and fast attribute access.

  • IR representations of functions and classes live in mypyc.ir.func_ir and mypyc.ir.class_ir, respectively.

Look at the docstrings and comments in mypyc.ir for additional information. See the test cases in mypyc/test-data/irbuild-basic.test for more examples of what the IR looks like in a pretty-printed form.