Mypyc Object Representation

Jukka Lehtosalo edited this page Oct 24, 2022 · 2 revisions

Mypyc uses a tagged pointer representation for values of type int (CPyTagged), char for booleans, and C structs for tuples. For most other objects mypyc uses the CPython PyObject *.

Tagged Integers

Python integers that fit in 31/63 bits (depending on whether we are on a 32-bit or 64-bit platform) are represented as C integers (CPyTagged) shifted left by 1, so the least significant bit is 0. Integers that don't fit in this representation are represented as a PyObject * pointer (this always points to a Python int object) with the least significant bit set.

Tagged integers have arbitrary precision. Thanks to the tagged pointer representation, common operations on values that fit in a machine word are fast and don't require heap-allocated objects.

Tagged integer operations are defined in mypyc/lib-rt/int_ops.c and mypyc/lib-rt/CPy.h.
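
The encoding described above can be sketched in plain C. This is a minimal illustration, not mypyc's actual code: tagged_t, tag_short, untag_short, tag_boxed, untag_boxed and add_short are hypothetical names, and the sketch assumes that size_t is pointer-sized and that right-shifting a negative signed value is an arithmetic shift (true for mainstream compilers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPyTagged: one pointer-sized word. */
typedef size_t tagged_t;
#define TAG_BIT ((tagged_t)1)

/* Range of values that fit in one bit less than a machine word. */
#define SHORT_MAX ((intptr_t)(SIZE_MAX >> 2))
#define SHORT_MIN (-SHORT_MAX - 1)

static bool fits_short(intptr_t n) { return n >= SHORT_MIN && n <= SHORT_MAX; }

/* Short integer: the value shifted left by 1, so the low bit is 0. */
static tagged_t tag_short(intptr_t n) {
    assert(fits_short(n));
    return (tagged_t)n << 1;
}

static bool is_short(tagged_t t) { return (t & TAG_BIT) == 0; }

/* Assumes an arithmetic right shift, which restores the sign. */
static intptr_t untag_short(tagged_t t) { return (intptr_t)t >> 1; }

/* Boxed integer: an object pointer with the low bit set. Object pointers
   are at least 2-byte aligned, so the low bit is free to use as a tag. */
static tagged_t tag_boxed(void *obj) { return (tagged_t)(uintptr_t)obj | TAG_BIT; }
static void *untag_boxed(tagged_t t) { return (void *)(uintptr_t)(t & ~TAG_BIT); }

/* Fast path for addition: succeeds only if both operands and the result
   are short. On false, a real implementation would fall back to a slow
   path that works on heap-allocated int objects. */
static bool add_short(tagged_t a, tagged_t b, tagged_t *out) {
    if (!is_short(a) || !is_short(b)) return false;
    /* Both operands fit in one bit less than a word, so the sum cannot
       overflow intptr_t. */
    intptr_t sum = untag_short(a) + untag_short(b);
    if (!fits_short(sum)) return false;
    *out = tag_short(sum);
    return true;
}
```

The point of the design is that the common case needs only a bit test and a machine add; heap objects are involved only when a value leaves the short range.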

Native Integers

There are also native, fixed-width integer types, such as int32 (see mypyc.ir.rtypes), that don't use the tagged representation. These types are used in generated code but are not yet exposed to users (exposing them is work in progress).

Error Values

If an exception is raised in a function or in a primitive operation, we normally signal it by returning an error value. Each type has an error value, which is normally chosen so that it isn't a valid real value:

  • For tagged integers, the error value is 1.
  • For bool, the error value is 2 (False and True are represented as 0 and 1, respectively).
  • For any value of type PyObject *, the error value is 0 (null pointer).
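
As a rough illustration of why these reserved values are safe, the following C sketch spells out the three checks. The names (tagged_t, TAGGED_ERROR, BOOL_ERROR and the *_is_error helpers) are hypothetical and not taken from mypyc's headers. Under the tagged representation, the word 1 would be a null pointer with the tag bit set, which never denotes a real integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CPyTagged. */
typedef size_t tagged_t;
static_assert(sizeof(tagged_t) == sizeof(void *),
              "tagged word is assumed to be pointer-sized");

/* Tag bit set on a null pointer: neither a short nor a boxed integer. */
#define TAGGED_ERROR ((tagged_t)1)

/* False is 0 and True is 1, so 2 is free to mean "error". */
#define BOOL_ERROR ((char)2)

static bool tagged_is_error(tagged_t v) { return v == TAGGED_ERROR; }
static bool bool_is_error(char v) { return v == BOOL_ERROR; }

/* For object references the null pointer is the error value. */
static bool object_is_error(const void *v) { return v == NULL; }
```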

Some types, such as int32, can't have a reserved error value, since every bit pattern is a valid value. For these, we use an overlapping error value: errors are always signaled using the error value, but the same value can also be a valid result. When a caller sees it, it must call PyErr_Occurred() to check whether an error really occurred. The error value is chosen so that it comes up rarely, so we can mostly avoid calling PyErr_Occurred() needlessly.
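
The overlapping scheme can be sketched as follows. This is a self-contained illustration under stated assumptions: error_pending is a stand-in for the CPython error indicator that PyErr_Occurred() inspects, the sentinel -113 is an illustrative choice of a rarely occurring value, and negate_i32 and call_failed are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the CPython error indicator; real generated code would
   call PyErr_Occurred() instead of reading a flag. */
static bool error_pending = false;

/* Illustrative overlapping error value: a legal int32 that is rare. */
#define INT32_ERROR ((int32_t)-113)

/* Hypothetical operation that can fail: negating INT32_MIN overflows. */
static int32_t negate_i32(int32_t x) {
    assert(!error_pending);     /* no error may be pending on entry */
    if (x == INT32_MIN) {
        error_pending = true;   /* real code would set a Python exception */
        return INT32_ERROR;
    }
    return -x;                  /* may legitimately equal INT32_ERROR */
}

/* Caller-side check: the cheap comparison runs first, so the (more
   expensive) error-indicator lookup happens only for the sentinel. */
static bool call_failed(int32_t result) {
    return result == INT32_ERROR && error_pending;
}
```

For example, negate_i32(113) returns -113 without setting the indicator, so call_failed reports no error, while negate_i32(INT32_MIN) returns the same value with the indicator set.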

If we call a function, we must usually check for an error value in the generated C. Example pseudo-C:

r0 = myfunction();
if (r0 == ERROR) {
    <add traceback entry>;
    return ERROR;  // propagate error to caller
}

Mypyc has a pass that inserts error value checks automatically. When generating IR, it's normally just necessary to describe how errors are reported for each op that could fail. Operations have the error_kind attribute for this purpose. Typical values include ERR_MAGIC (use type-specific error value to signal an error) and ERR_NEVER (the op can never fail, or it aborts the process on error).