Avoiding de-optimization points due to Py_DECREF and allocation
#402
Replies: 6 comments
-
The allocator solution would seem simpler than the … Regarding DECREF case (2), the recursive … Another thought: I guess sometimes specialization for a given type tells us which category a DECREF falls into. Other times we'd have to look in the type. The former seems more attractive to experiment with (simpler).
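A minimal sketch of the flag-based classification this comment alludes to. All names here (`obj_type`, `SIMPLE_DEALLOC`, `decref_categorized`, …) are invented for illustration and are not CPython API; the real mechanism would hang off `tp_flags` or similar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout -- not the CPython API. */
typedef struct obj_type {
    unsigned int flags;          /* dealloc-category flags */
    void (*dealloc)(void *obj);  /* may be NULL in this sketch */
} obj_type;

enum {
    SIMPLE_DEALLOC = 1,    /* category 1: just free the memory */
    RECURSIVE_DEALLOC = 2  /* category 2: decref contents, then free */
    /* neither bit set: category 3, may run arbitrary code */
};

typedef struct obj {
    long refcnt;
    obj_type *type;
} obj;

/* Returns 1 if the dealloc was handled inline, 0 if the object
 * must be deferred to a pending-dealloc list. */
int decref_categorized(obj *o)
{
    if (--o->refcnt != 0)
        return 1;
    if (o->type->flags & (SIMPLE_DEALLOC | RECURSIVE_DEALLOC)) {
        if (o->type->dealloc)
            o->type->dealloc(o);  /* known safe: no arbitrary code */
        return 1;
    }
    return 0;  /* caller pushes o onto a pending-dealloc list */
}
```

The point of the flag check is that the fast path never has to consult anything beyond the type word, which is what makes it attractive to bake into specialized instructions.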
-
It's not that complex, but it does require an IR that makes increfs and decrefs visible. FWIW, this is the Cinder JIT approach: our HIR (high-level IR) is roughly similar to Python bytecode but slightly lower level. Notably, increfs and decrefs are explicit in HIR (though we actually insert them automatically in an analysis pass). Thus we could very easily move all decref operations to the end of a JIT-compiled function, and we've considered doing so, but so far haven't due to the potential compatibility impact.
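A toy model of what "moving all decref operations to the end" could mean. This is not Cinder's HIR (which is an internal IR, not a C API); it is just a sketch of the batching idea with invented names: record would-be decrefs during the function body and apply them in one batch at the single exit point.

```c
#include <assert.h>

#define MAX_DEFERRED 64

typedef struct {
    long *slots[MAX_DEFERRED];  /* refcount fields to decrement at exit */
    int n;
} deferred_decrefs;

/* Instead of decrementing now (which might trigger a dealloc
 * mid-function and force de-optimization), remember the refcount. */
void defer_decref(deferred_decrefs *d, long *refcnt)
{
    if (d->n < MAX_DEFERRED)
        d->slots[d->n++] = refcnt;
}

/* At the function's exit, apply them all; any resulting deallocation
 * (not modeled here) then happens at one well-defined program point. */
void flush_decrefs(deferred_decrefs *d)
{
    for (int i = 0; i < d->n; i++)
        (*d->slots[i])--;
    d->n = 0;
}
```

The compatibility impact mentioned above comes precisely from this batching: destructors that used to run mid-function now run later, which observably reorders side effects.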
-
Chiming in to provide a PyPy perspective on the potential compatibility impact of moving object collection. We have fixed many of the problems of libraries doing too much in …
-
Don't objects that wrap a file descriptor necessarily need to close that file descriptor in their destructor? (Ditto for network or database connections, etc.)
-
Of course, resource closing in a destructor is still required as a last-resort defense to prevent resource leaks. However, the trend towards using context managers and providing a … All this was in an attempt to discuss the potential compatibility impact of "moving all decref operations to the end of a jit-compiled function".
-
Here's an outline plan for avoiding running arbitrary code in … We need a new flag for … While we're changing the interface, we might as well change the whole thing to accept the interpreter as an argument.

```c
void Py_DECREF2(PyInterpreter *interp, PyObject *obj)
{
    if (--obj->ob_refcnt == 0) {
        Py_TYPE(obj)->tp_dealloc2(interp, obj);
    }
}

void safe_dealloc_wrapper(PyInterpreter *interp, PyObject *obj)
{
    PyList_Append(interp->pending_unsafe_dealloc, obj);
    _Py_SetPendingFinalizer(interp); // Sets bit in the eval breaker.
}
```

For classes that need finalizers, but have safe deallocation functions, we need a slightly different function.

```c
void dealloc_maybe_finalize(PyInterpreter *interp, PyObject *obj)
{
    if (NEEDS_FINALIZING(obj)) {
        PyList_Append(interp->pending_finalizer_list, obj);
        _Py_SetPendingFinalizer(interp); // Sets bit in the eval breaker.
        return;
    }
    /* Do the deallocation here */
}
```

Passing the interpreter to the dealloc function will allow it to efficiently access the relevant freelist, so this has no extra cost even in contexts where the interpreter is not already at hand.

```c
void Py_DECREF(PyObject *obj)
{
    if (--obj->ob_refcnt == 0) {
        Py_TYPE(obj)->tp_dealloc2(_PyInterpreterState_GET(), obj);
    }
}
```
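To see the whole deferral mechanism end to end, here is a standalone toy model of the plan above. The `toy_interp`, `pending` list, and `eval_breaker` names are illustrative stand-ins, not the real CPython structures: decref pushes unsafe objects onto a per-interpreter list and sets an eval-breaker bit, and the eval loop drains the list at its next safe point.

```c
#include <assert.h>
#include <stdlib.h>

#define PENDING_DEALLOC_BIT 1u

typedef struct toy_obj {
    long refcnt;
    int needs_finalizing;          /* stand-in for a tp_flags check */
    struct toy_obj *pending_next;  /* intrusive pending-dealloc list */
    int *finalized_counter;        /* lets the demo observe finalization */
} toy_obj;

typedef struct toy_interp {
    toy_obj *pending;       /* objects awaiting complex deallocation */
    unsigned eval_breaker;  /* bit set => eval loop must take a detour */
} toy_interp;

void toy_decref(toy_interp *interp, toy_obj *obj)
{
    if (--obj->refcnt != 0)
        return;
    if (obj->needs_finalizing) {
        /* Defer: just enqueue and flag the eval breaker. */
        obj->pending_next = interp->pending;
        interp->pending = obj;
        interp->eval_breaker |= PENDING_DEALLOC_BIT;
        return;
    }
    free(obj);  /* safe case: plain memory release */
}

/* Called from the eval loop when the eval_breaker bit is seen.
 * Returns the number of deferred objects processed. */
int toy_run_pending(toy_interp *interp)
{
    int n = 0;
    while (interp->pending) {
        toy_obj *obj = interp->pending;
        interp->pending = obj->pending_next;
        (*obj->finalized_counter)++;  /* "run" the finalizer here */
        free(obj);
        n++;
    }
    interp->eval_breaker &= ~PENDING_DEALLOC_BIT;
    return n;
}
```

The key property is that `toy_decref` itself never runs a finalizer, so a JIT-compiled region can treat it as a call that cannot re-enter the VM.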
-
When optimizing a region of code, we want two things: …

However, every time we hit a potential call into C code, we need to restore the VM state and throw away all our information. This means that any `Py_DECREF` or allocation forces an expensive de-optimization, as either can potentially call arbitrary code.

This is bad, so how can we fix it?
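To make the cost concrete, here is a toy model (all names invented, not CPython internals) of why a potential C call forces a spill: values the optimizer keeps in "registers" (here, C locals) must be written back to the materialized frame before any call that might observe or mutate interpreter state.

```c
#include <assert.h>

typedef struct {
    long stack[8];  /* the interpreter-visible value stack */
    int top;
} vm_frame;

/* Anything that may re-enter the VM (a dealloc, a finalizer, the GC)
 * may inspect the frame, so cached values must be flushed first. */
void spill(vm_frame *f, long cached0, long cached1)
{
    f->stack[0] = cached0;
    f->stack[1] = cached1;
    f->top = 2;
}

/* An optimized region: works purely on C locals, spilling only when
 * it crosses a point that could call arbitrary code. */
long optimized_region(vm_frame *f, long a, long b, int must_call_c)
{
    long r0 = a + b;  /* kept in "registers" */
    long r1 = a * b;
    if (must_call_c) {
        spill(f, r0, r1);  /* de-optimization point: state goes to memory */
    }
    return r0 + r1;
}
```

Every `Py_DECREF` or allocation site is potentially such a `must_call_c` point, which is why making them provably safe is valuable.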
### `Py_DECREF`

We have a couple of options to prevent `Py_DECREF` causing excessive de-optimization:

- Making `_Py_Dealloc` safe, by deferring finalizers and untrusted deallocator functions.
- …

### Making `_Py_Dealloc` safe

We can classify objects into three groups by what deallocating them involves:

1. Just freeing the memory.
2. `Py_DECREF`ing the references to some other objects and then freeing the memory.
3. Running arbitrary code (finalizers, or deallocator functions we cannot trust).

By adding a flag (or two) to the type (or to the object, if we get saturating reference counts), we can handle cases 1 and 2 in `_Py_Dealloc` and push the remaining objects to a list to be deallocated later.

We need to de-optimize whenever we check for interrupts and the like, so `_Py_Dealloc` can indicate that there are objects needing complex deallocation by setting a bit in the `eval_breaker` variable, ensuring reasonably prompt deallocation without hurting optimization.
### Allocation

This can be handled much like `_Py_Dealloc`: instead of calling the cycle GC in the allocator, we set a bit in the `eval_breaker` variable and call the cycle GC when it is safe to do so.
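A sketch of the allocation side under the same assumptions (toy names, not CPython API): when the allocator decides a collection is due, it only sets an eval-breaker bit, and the cycle GC runs later from the eval loop, where de-optimizing is already expected.

```c
#include <assert.h>
#include <stdlib.h>

#define PENDING_GC_BIT 2u

typedef struct toy_heap {
    size_t live;            /* number of live allocations */
    size_t gc_threshold;    /* request a collection at this many */
    unsigned eval_breaker;  /* PENDING_GC_BIT requests a collection */
    int collections;        /* how many collections have run */
} toy_heap;

void *toy_alloc(toy_heap *heap, size_t size)
{
    if (++heap->live >= heap->gc_threshold) {
        /* Don't collect here: arbitrary finalizers could run. */
        heap->eval_breaker |= PENDING_GC_BIT;
    }
    return malloc(size);
}

/* Eval loop: runs the cycle GC at a point where the VM state is
 * already materialized, so no extra de-optimization is needed. */
void toy_maybe_collect(toy_heap *heap)
{
    if (heap->eval_breaker & PENDING_GC_BIT) {
        heap->collections++;  /* stand-in for the real cycle GC */
        heap->live = 0;       /* pretend everything was reclaimed */
        heap->eval_breaker &= ~PENDING_GC_BIT;
    }
}
```

With this split, `toy_alloc` never runs user code, so an optimized region can allocate without becoming a de-optimization point.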