
Introduction to Mypyc for Contributors

Jukka Lehtosalo edited this page Oct 24, 2022 · 3 revisions

This is a short introduction aimed at anybody who is interested in contributing to mypyc, or anybody who is curious to understand how mypyc works internally.

Key Differences from Python

Code compiled using mypyc often runs much faster than the same code interpreted by CPython, because mypyc does these things differently:

  • Mypyc generates C that is compiled to native code, instead of the interpreted bytecode that CPython uses. Interpreted bytecode always carries some interpreter overhead, which slows things down.

  • Mypyc doesn't let you arbitrarily monkey patch classes and functions in compiled modules. This allows early binding -- mypyc statically binds calls to compiled functions, instead of going through a namespace dictionary. Mypyc can also call methods of compiled classes using vtables, which are more efficient than dictionary lookups used by CPython.

  • Mypyc compiles classes to C extension classes, which are generally more efficient than normal Python classes. They use an efficient, fixed memory representation (essentially a C struct). This lets us use direct memory access instead of (typically) two hash table lookups to access an attribute.

  • As a result of early binding, compiled code can use C calls to call compiled functions. Keyword arguments can be translated to positional arguments during compilation. Thus most calls to native functions and methods map directly to simple C calls. CPython calls are quite expensive, since mapping of keyword arguments, *args, and so on mostly has to happen at runtime.

  • Compiled code has runtime type checks to ensure that runtime types match the declared static types. Compiled code can thus make assumptions about the types of expressions, resulting in both faster and smaller code, since many runtime type checks performed by the CPython interpreter can be omitted.

  • Compiled code can often use unboxed (not heap allocated) representations for integers, booleans and tuples.
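To make the list above more concrete, here is a small, fully annotated module of the kind these optimizations apply to. It is only an illustrative sketch: the names Vec, scale and total are hypothetical, and the comments restate the points above rather than describing the exact C that mypyc emits. The code runs identically under plain CPython.

```python
from typing import List, Tuple


class Vec:
    # Hypothetical example class. When compiled, a class like this becomes
    # a C extension class with a fixed, struct-like attribute layout.
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def dot(self, other: "Vec") -> int:
        # Attribute reads on a compiled class can be direct memory accesses
        # rather than hash table lookups.
        return self.x * other.x + self.y * other.y


def scale(v: Vec, factor: int = 2) -> Vec:
    # The declared types let compiled code assume that factor is an int.
    return Vec(v.x * factor, v.y * factor)


def total(vectors: List[Vec]) -> Tuple[int, int]:
    # Integers and tuples are among the values that can often be unboxed.
    sx = 0
    sy = 0
    for v in vectors:
        sx += v.x
        sy += v.y
    return sx, sy


# A keyword-argument call to a function in the same module. This is the
# kind of call that can be bound early and mapped to positional arguments.
doubled = scale(v=Vec(1, 1), factor=3)
```

Because the annotations are ordinary Python, the same file can be type checked with mypy, run interpreted, or compiled, which is convenient when comparing behavior between the two modes.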

Supported Python Features

Mypyc supports a large subset of Python. Note that if you try to compile something that is not supported, you may not always get a very good error message.

Here are some major things that aren't yet supported in compiled code:

  • Some dunder methods don't work, though most of them are supported
  • Monkey patching compiled functions or classes
  • General multiple inheritance (a limited form is supported)
  • Async generators
  • The match statement

We are generally happy to accept contributions that implement new Python features.

Development Environment

First you should set up the mypy development environment as described in the mypy docs. macOS, Linux and Windows are supported.

Compiling and Running Programs

When working on a mypyc feature or a fix, you'll often need to run compiled code. For example, you may want to do interactive testing or to run benchmarks. This is also handy if you want to inspect the generated C code (see Inspecting Generated C).

Run mypyc to compile a module to a C extension using your development version of mypyc:

$ mypyc program.py

This will generate a C extension for program in the current working directory. For example, on a Linux system the generated file may be called program.cpython-37m-x86_64-linux-gnu.so.

Since C extensions can't be run as programs, use python3 -c to run the compiled module as a program:

$ python3 -c "import program"

Note that __name__ in program.py will now be "program", not "__main__", so any code guarded by an if __name__ == "__main__" check will not run!
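One way to cope with the __name__ difference is to put the program's entry point in a function and call it explicitly after importing. The sketch below assumes hypothetical contents for program.py. The fib and main functions are illustrative names only.

```python
# program.py (hypothetical contents, for illustration only)


def fib(n: int) -> int:
    # Naive recursive Fibonacci, just to give the program some work to do.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


def main() -> int:
    result = fib(20)
    print(result)
    return result


if __name__ == "__main__":
    # Runs under "python3 program.py", but not when the module is imported,
    # which is how the compiled extension is loaded.
    main()
```

With this layout the compiled module can be run with python3 -c "import program; program.main()", while python3 program.py keeps working if only the interpreted version is present.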

You can manually delete the C extension to get back to an interpreted version (this example works on Linux):

$ rm program.*.so

Another option is to invoke mypyc through tests (see Testing below).

Useful Background Information

Beyond the mypy documentation, here are some things that are helpful to know for mypyc contributors:

Some Important Limitations

All of these limitations will likely be fixed in the future:

  • We don't detect stack overflows in compiled code.

  • We don't handle Ctrl-C in compiled code.