Skip to content

Reference Counting Transform

Jukka Lehtosalo edited this page Oct 24, 2022 · 1 revision

Python uses reference counting to free (garbage collect) objects when they are no longer needed. When writing C extensions manually, maintaining reference counts is one of the most error-prone tasks. Mypyc abstracts this away by automatically inserting the necessary operations to increment/decrement reference counts. These are implemented through the IncRef and DecRef ops, respectively.

When generating IR in mypyc.irbuild, no reference count manipulation operations are explicitly added. After the initial IR has been generated, the transform implemented in mypyc.transform.refcount adds these operations.

These are the primary things the transform does (but there are many special cases beyond these):

  • If a new reference to a value is needed, for example when assigning a value to another register, add an IncRef op.
  • If the lifetime of a value ends, add a DecRef op.

Example

Consider this code:

def f(x: str) -> str:
    y = x
    g(y)

This is the initial IR (without reference count manipulation):

def f(x):
    x, y, r0 :: str
L0:
    y = x
    r0 = g(y)
    return 1

The reference counting transform will produce this output (exception handling is omitted for brevity):

def f(x):
    x, y, r0 :: str
L0:
    inc_ref x
    y = x
    r0 = g(y)
    dec_ref y
    dec_ref r0
    return 1

Note that the generated IR is not optimal -- strictly speaking, we don't need to have inc_ref x and dec_ref y. This illustrates that adding reference count manipulations optimally is non-trivial. We discuss some optimizations mypyc performs below.

You can find more examples in the test cases, in mypyc/test-data/refcount.test.

Borrowing Values

A key principle of reference counting is that if you create another reference to an object, you need to increment the reference count. However, if we only hold the reference briefly and we can be certain that nobody can decrease the reference count during that period to free the object, it's possible to borrow a reference, i.e. take a reference without incrementing/decrementing the reference count.

Mypyc borrows values in some cases. It knows about certain operations that won't free (arbitrary) objects, and avoids redundant reference count manipulations across these operations. A typical example is the expression x.y.z, if y and z are native attributes. We can borrow x.y, since reading the attribute z can't cause x.y to be freed. Thus we can generate this IR:

def f(x):
    x :: __main__.C
    r0 :: __main__.D
    r1 :: int
L0:
    r0 = borrow x.y
    r1 = r0.z
    return r1

Note that the first attribute access looks like this:

r0 = borrow x.y

The borrow prefix means that this reads the attribute x.y without increasing the reference count. The second access r0.z increases the reference count. This is necessary since we will return the value, and we can't predict what the caller will do with the reference, so no borrowing is possible.