Unexpected memory management behavior #1056

etianen · 2020-07-20T10:49:16Z

I'm very excited about this library. The effort going into documentation and ergonomics is fantastic. I've been waiting to use it on stable Rust for months, and have finally started trying to integrate it into my CPU-bound Python codebase to write extension modules.

Right now, pyo3 seems to have a serious issue to me. The use of the object storage (even the lighter #887 implementation) means that this library is not zero-cost, and worse, can easily leak memory. The additional management layer that's built on top of Python's own reference counting, while ergonomic, is a big problem for my use-case.

The current situation, as I understand it:

GilGuard acquire overhead (minimal, and rare):
- Acquire a Mutex on the PyObject alloc/dealloc array.
- Allocate a Vec to copy the PyObject alloc/dealloc array out of TLS.
- Write to 2x thread-local storage.
- Allocate two Vec for storing pointers to owned references.
&PyAny owned reference creation (mostly method returns, cheap, but very frequent):
- Write to thread-local storage.
- Occasionally re-alloate owned reference Vecs as they grow.
- Memory is only freed at GilGuard drop
PyObject drop (cheap, but frequent):
- Read from thread-local storage.
- Possibly acquire a Mutex on the dealloc array.
- Possibly only free memory at next GitGuard acquire

The regular access to TLS and global mutexes are regrettable, but the memory-freeing behavior is really worrying to me. I'm not writing Python extensions because I want pretty-damn fast and unbounded memory growth. I'm going to the effort of writing a Python extension because I need as fast as possible and predictable memory usage.

The solution outlined in #885 seems perfect. A PyAny<'a> has no need for object storage, and Clone and Drop work as one would expect. I'd even go one stage further and have PyObject and Py<T> always acquire the GIL for Clone and Drop. Sure, it's slower than using the current TLS and deferred drop, but it's simpler, more predictable, and people can always use ManuallyDrop turn them back into PyAny<'a> to batch clone/drop if performance is suffering.

You could even remove the mirrored object API on PyObject and require that they're converted into PyAny<'a> to call or getattr, making PyObject's only purpose to be storing a long-term reference to a Python object between GIL acquisitions.

At that point, there's no additional global state beyond what's already provided by the Python interpreter. Things are deallocated predictably. There's a smaller API surface. The library is truly zero-cost and adds no additional complexity.

I would urge you to be bold with making breaking changes while the library is young and the userbase relatively small. It would be a trememdous shame if the most popular Rust/Python bindings were significantly slower than the equivalent C, and introduced an unexpected additional memory model.

In any case, thanks for all the work you've put in so far. :)

The text was updated successfully, but these errors were encountered:

kngwyu · 2020-07-20T11:55:16Z

Thanks. I think it's good timing to revisit our object model now, since recently some safety problems are pointed out aroung GILGuard.
Let me create the 2nd draft.

etianen · 2020-07-20T12:06:15Z

I'd love to read the 2nd draft. I'll shortly be using this project cautiously in production, so am excited to see where it's heading.

davidhewitt · 2020-07-20T22:15:23Z

Just some notes I learned from my first try at PyAny<'a>:

The biggest issue is about API of Py<T> - e.g. in Py<PyAny<'static>> - the 'static lifetime is both required and the only one that makes sense, which is a serious papercut.
I had a go at solving the above using an "object type tag" e.g. Any, List which was just a type master that could go in Py<T>. This caused lots of traits to end up picking up the 'a lifetime, which is kinda annoying. Generic Associated Types, when finished, would really help here.
It is definitely faster than the current PyO3 implementation, but not by as much as you might think (maybe 10% last time we tried). Reliability of memory release is a nice win though.
Because ownership is in the struct rather than the GILPool, lifetimes work very slightly differently compared to current PyO3 (they're usually a bit shorter with PyAny<'a>).

davidhewitt · 2020-07-20T22:20:38Z

I also have been thinking about our new API .into_ref() that we're planning to add to Py<T> and PyObject.

Because this gives an easy way to go from the owned reference into the gil-scoped reference, I'm also considering whether we should add new ways to create Python types with Py<T>. This would allow users to opt-out (or potentially opt-in) to GIL-scoped references depending on their preference for speed or ergonomics.

Focussing on PyList, I see two options:

Add PyList::new_py() -> Py<PyList>, leave PyList::new() unchanged. This breaks less code.
Change PyList::new() -> Py<PyList>, add PyList::new_ref() -> &PyList. The nice thing about this option is that it names really well alongside Py::into_ref and Py::as_ref.

davidhewitt · 2020-07-20T22:25:32Z

Finally, I wanted to talk about PyObject and Py<T> drop behaviour. I think these have a nice sweet spot at the moment, though I'm open to changing them. There are a number of places in PyO3 where automatically acquiring the GIL could improve ergonomics. It's hard to know where to draw the line on hidden performance costs vs. ease of use.

I think we can always make these tradeoffs better learning from community members who have feedback!

etianen · 2020-07-21T10:59:55Z

To me, the imporant things to get right for this library are, in order:

Sound.
Predictable memory usage, with no unbounded growth.
Zero-cost (fast as C)
Ergonomic, well documented

Currently, 1 and 4 are pretty well nailed, but 2 and 3 are both being held back by object storage. Here's my attempt at a solution that does away with all thread locals and object storage.

The first part of the solution is already solved by the current PyAny (zero-cost wrapper around a pointer) and PyDict, PyList (zero-cost wrapper around a pointer that asserts a native type). The change I'd make is that a user never interacts with &PyAny or &PyDict directly.

Instead, these are always wrapped by either:

PyRef<'py, T> (e.g. PyRef<'py, PyAny> or PyRef<'py, PyDict>
- The 'py lifetime is tied to Python<'py>, so guarantees that the GIL is held.
- Clone and Drop perform immediate refcount incr/decr.
- Deref impl to T for easy .call(...) and getattr(...) etc.
PyOwned<T> (e.g. PyOwned<PyAny> or PyOwned<PyDict>)
- These assume that the GIL is not held.
- Clone and Drop acquire the GIL and perform immediate refcount incr/decr, then drop the GIL again. This is not efficient, but it keeps the memory model predictable and correct. People can use .into_ref(...) (below) for efficient batch clone and drop.
- An .into_ref(self, py: Python<'py>) method consumes the PyOwned<T> and returns a PyRef<'py, T>.
- An .as_ref(&self, py: Python<'py>) method does not consume the PyOwned<T>, and returns a &'py T (the only time a user ever deals with a &'py T directly. I'm not 100% sure about the lifetimes here, or whether it would actually work. Maybe .clone().into_ref(py) is more correct, at the expese of a refcount.

I believe this does away with all thread locals and all object storage, and brings the performance and memory characteristics in line with the equivalent C.

I'd be happy to have a go at this myself, but I don't want to step on other people's toes, or spend time on something that isn't wanted, or has already been tried and found problematic.

etianen · 2020-07-21T11:02:59Z

Finally, I wanted to talk about PyObject and Py<T> drop behaviour. I think these have a nice sweet spot at the moment, though I'm open to changing them. There are a number of places in PyO3 where automatically acquiring the GIL could improve ergonomics. It's hard to know where to draw the line on hidden performance costs vs. ease of use.

I think we can always make these tradeoffs better learning from community members who have feedback!

My specific issue with the drop behaviour is that checking a thread-local is not zero cost, and if the GIL isn't held it leads to unbounded memory growth, and potentially a permanent memory leak if the GIL is never re-acquired.

davidhewitt · 2020-07-21T20:24:01Z

@etianen thanks for all the ideas. I agree that the direction I want for PyO3 is to remove as much object storage as possible.

TLDR; Yes, I like this design and have considered it myself, but I'm not sure it's the right time to do it.

I've thought many times about ideas similar to your PyOwned<T> / PyRef<'a, T> combination. Note that your proposal for PyOwned<T> is more or less exactly what the current Py<T> is, except for the drop semantics. (PyObject is very similar to Py<PyAny>, and I'd like to remove it, but haven't quite found the appropriate way to do it given how PyO3's conversion traits work currently.)

Regarding PyRef<'a, T> - again, this is an API that I've considered moving towards. It fits nicely as PyRef is already a name we use for #[pyclass], and the meaning would be essentially the same for native types (except "refcell" semantics wouldn't be needed for native types).

Why I'm not sure now is the moment.

The biggest downside of PyRef<'py, T> with current Rust, is that the GIL lifetime 'py isn't perfectly preserved over multiple calls. Consider, for example, this method definition which we would adopt if we switched to PyRef<'py, T>:

impl PyAny {
    fn call0(&'a self) -> PyRef<'a, PyAny> { ... }
}

If PyRef<'py, PyAny> derefs to &'a PyAny for the method call, then this lifetime 'a is not the original GIL lifetime 'py. Instead, 'a is the lifetime of the derefencing borrow. So as far as Rust is concerned the result of the method call is borrowed from the original receiver. This is obviously wrong!

The result of this is that a series of chained PyO3 API calls gradually see the GIL lifetime which the compiler guarantees get narrower and narrower. It's not the end of the world, but it's quite an annoying papercut.

The future solution I see

There is, fortunately, a way that I can see this being resolved in the future: arbitrary self types. If we were able to use &PyRef<'py, T> as a self type, then this problem goes away:

impl PyAny {
    fn call0<'py>(self: &PyRef<'py, Self>) -> PyRef<'py, PyAny> { ... }
}

With arbitrary self types, we really can have our cake and eat it! So this is a direction which I would like to see for PyO3, but I'm not actively pursuing it now because we need someone to champion getting the feature added to the Rust language first!

davidhewitt · 2020-07-21T20:31:34Z

Note that a "hack" to fix GIL lifetimes if we move to PyRef<'py, T> without arbitrary self types is to expose an API to manually rebind the reference using an API that consumes the PyRef:

impl<T> PyRef<'_, T> {
    fn with_lifetime<'py>(self, py: Python<'py>) -> PyRef<'py, T> {
        // build new pyref using new Python token
    }
}

In the case where the reference is owned, we know that the lifetime 'a can be extended safely to any lifetime the GIL is held for, which is why the above API could work as an interim "hack".

TBH I'd be very happy to listen to the PyO3 community to see whether they would prefer migrate to this API now, at the cost of some lifetime frustration, or to wait for the more ergonomic future.

davidhewitt · 2020-07-21T21:13:59Z

My specific issue with the drop behaviour is that checking a thread-local is not zero cost, and if the GIL isn't held it leads to unbounded memory growth, and potentially a permanent memory leak if the GIL is never re-acquired.

The alternative (acquiring the GIL) is significantly slower than checking the "not zero cost" thread-local. (You can easily verify this by modifying the Drop impl for PyObject and running the bench_pyobject benchmark.)

If we were to remove the PyObject storage in favour of decreasing the reference immediately, I'd still encourage us to check the thread-local - and then only acquire the GIL if needed.

To migrate to the case where PyObject does not have object storage on Drop, I would like to see discussion on these points:

Is there any risk of deadlock by doing this? (My suspicion is no; any reassurance is welcome.)
A quick test on my laptop suggests that the "worst case" where the GIL is not held and needs to be acquired is about 40x slower than the implementation we currently have (place in object storage). Is this overhead in a Drop impl acceptable?

To test this, I modified bench_pyobject to:
```
#[bench]
fn drop_many_objects(b: &mut Bencher) {
    let gil = Python::acquire_gil();
    let py = gil.python();
    let objects = vec![py.None(); 1000];
    b.iter(|| {
        let objects = objects.clone();
        py.allow_threads(move || drop(objects));
    });
}
```
And changed the PyObject drop to always acquire the GIL then decrease the reference.

When I've looked at this in the past, I've come to the conclusion that the PyObject object storage is actually not so bad, when faced with the alternative of a fairly nasty performance drawback in the Drop implementation. To hit the case of "unbounded memory growth" with this bit of the object storage, you've need to be repeatedly cloning and dropping these PyObject instances without ever acquiring the GIL. Perhaps somebody will one day write a valid program which does this, but that seems rare!

(OTOH, I 100% agree with you that the unbounded memory growth that can occur from &PyAny with the GIL is a serious issue.)

etianen · 2020-07-22T09:56:30Z

It's clear that you've considered a lot of these issues already, so thanks for explaining! On reflection, I can see that there's still a cause for a naked &'py PyAny. The holy trinity of object references is:

&'py T - Borrowed pointer with GIL.
PyRef<'py, T> - Owned pointer with GIL.
PyOwned<T> - Owned pointer with no GIL (yes, effectively a Py<T>).

I think your with_lifetime() solution is workable. The alternative is to duplicate the object API (getattr, call etc) on PyRef<'py, T>. That starts getting extremely clunky though, particularly with other primitive types like PyDict. I think your with_lifetime is a good, general solution.

Why I think now is the time

It's currently impossible / very difficult to write a Python extension that deals with siginificant numbers of objects, and that is a big chunk of valuable use-cases. Using &'py T means unbounded memory growth, or requires unsafe GILPool. Using Py<T> or PyObject means thread-local overhead, which can be mitigated using Python::release(...) absolutely everywhere.

Arbitrary self-types might never land, or might land in a long time. Switching to PyRef<'py, T> is a big breaking API change, but it enables a large swath of use-cases. I vote for doing it soon! If we get arbitrary self-types in the future, the with_lifetime hack can be deprecated, but it would be a very easy migration path.

It's a small hit to ergonomics vs a 10% performance boost and no unbounded memory growth. That seems a clear win, IMO.

The drop behavior of `Py<T>` and `PyObject`

With a decent PyRef<'py, T> implementation, and if pyclass constructors could return PyRef<'py, T> instead of Py<T>, the use of Py<T> and PyObject would only be used for persisting owned pointers without the GIL. In most cases, this would mean very few of them actually exist. So their Clone and Drop behavior isn't so important. I can think of a few strategies, all with different tradeoffs:

Clone/Drop strategy	Pro	Con
Always acquire GIL, immediately update refcount	Simple, no additional state	Slow
Check for GIL using `PyGILState_Check` (or a TLS if faster), acquire GIL if needed, immediately update refcount	Still simple, no additional state, hopefully fast is the GIL is held	Slow if GIL not held
Check for GIL using `PyGILState_Check` (or a TLS if faster), immediately update refcount if possible, else defer until GIL next held	Fast regardless of GIL state	Unbounded memory growth

Whichever strategy is chosen, a programmer can use ManuallyDrop and into_ref() to perform a zero-cost drop of one or more Py<T>, so this really comes down to which strategy is the default for correctness and good-enough lazy programming. I personally find the potential for unbounded memory growth a little worrying, but I appreciate the different tradeoffs here. I'd vote for the second strategy (assuming reasonably performant), but it's not a hill I'd die on. 😛

etianen · 2020-07-22T10:04:01Z

I've largely said my piece here, but just to add: I'm very reassured that you see the &PyAny unbounded memory growth as a serious issue, and it's clear that you've considered all sorts of approaches. Thanks for your time in explaining everything.

I can't use this library in earnest until that issue is solved, but it remains an excellent project. I'm happy to help out if it makes this happen earlier, but otherwise I'll be watching the changelog eagerly!

kngwyu · 2020-07-26T09:48:53Z

Thank you but I'm thinking about a completely different one: PyAny<'py> and PyAnyRaw. No borrowed type.

davidhewitt · 2020-07-26T11:00:46Z

@kngwyu would love to see some detailed notes on what you're thinking!

davidhewitt · 2020-08-06T16:32:24Z

Just FYI; while I think we're getting close to considering 0.12 release, and shouldn't wait for this (unless someone's already close to an implementation), I'd like to see us review how we can improve this for 0.13.

davidhewitt · 2021-12-15T08:17:20Z

Note to self - pantsbuild/pants#13526 (comment) is interesting. (Potentially Arc<NonNull<ffi::PyObject>> may be a good future representation.)

Adding a special case of memory management when writing an extension. This is a documentation of: PyO3#1056 and PyO3#2853

2864: Add a section on memory management for `extension` r=davidhewitt a=haixuanTao Adding a special case of memory management when writing an extension. This is a documentation of: #1056 and #2853 Co-authored-by: Haixuan Xavier Tao <tao.xavier@outlook.com> Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>

davidhewitt · 2024-03-15T08:18:58Z

With the release of the Bound API in 0.21 (beta), there is finally a resolution to this problem. 🎉

Migrating to the Bound API will solve this problem immediately for code which needs this fixed, and the follow-up to this work is #3960 to remove the existing GIL Refs API.

etianen · 2024-03-15T09:03:14Z

Incredible, I've been watching and hoping for a long while for this to be sorted! I'm really glad it was included for 1.0. ❤️

kngwyu self-assigned this Jul 20, 2020

This comment has been minimized.

Sign in to view

davidhewitt mentioned this issue Jul 23, 2020

PyObject -> &'py PyAny #1041

Closed

kngwyu mentioned this issue Aug 1, 2020

Make PyObject a type alias of Py<PyAny> (& remove FromPy) #1063

Merged

4 tasks

davidhewitt added the needs-design label Aug 6, 2020

davidhewitt added this to the 0.13 milestone Aug 6, 2020

davidhewitt mentioned this issue Sep 11, 2020

Remove special case of PyIterator #1176

Merged

BenSchZA mentioned this issue Oct 12, 2020

PyO3 memory leak CADLabs/radCAD#2

Closed

davidhewitt mentioned this issue Nov 29, 2020

experimental: owned objects API #1300

Closed

davidhewitt modified the milestones: 0.13, 0.14 Dec 22, 2020

davidhewitt mentioned this issue Jan 24, 2021

OsStr and Path conversions #1379

Merged

davidhewitt mentioned this issue Apr 13, 2021

Support all receiver types for all protocol methods #1206

Closed

davidhewitt modified the milestones: 0.14, 0.15 Apr 13, 2021

davidhewitt mentioned this issue Nov 19, 2021

Add Py<T>.setattr method #2009

Merged

3 tasks

davidhewitt modified the milestones: 0.17, 0.18 Jul 3, 2022

Narsil mentioned this issue Oct 21, 2022

Less GIL locks speeds things up apparently ? huggingface/safetensors#31

Merged

davidhewitt modified the milestones: 0.18, 0.19 Dec 27, 2022

davidhewitt mentioned this issue Jan 4, 2023

Memory leak on pyo3 object created within Rust #2853

Closed

haixuanTao added a commit to haixuanTao/pyo3 that referenced this issue Jan 7, 2023

Add a section on memory management for extension

59825f9

Adding a special case of memory management when writing an extension. This is a documentation of: PyO3#1056 and PyO3#2853

haixuanTao mentioned this issue Jan 7, 2023

Add a section on memory management for extension #2864

Merged

davidhewitt pushed a commit to haixuanTao/pyo3 that referenced this issue Jan 10, 2023

Add a section on memory management for extension

8af9d2a

Adding a special case of memory management when writing an extension. This is a documentation of: PyO3#1056 and PyO3#2853

davidhewitt mentioned this issue May 18, 2023

0.19 Release #3154

Closed

7 tasks

davidhewitt mentioned this issue Jun 4, 2023

add pyo3::experimental namespace with new owned PyAny<'py> #3205

Closed

messense mentioned this issue Jun 13, 2023

python: op.write() seems have memory leak apache/opendal#2455

Closed

davidhewitt modified the milestones: 0.19, 0.20, 0.21 Jun 16, 2023

davidhewitt mentioned this issue Jul 14, 2023

Bikeshedding names: Py and PyObject #873

Open

davidhewitt mentioned this issue Aug 11, 2023

Upcoming changes to PyO3's API in 0.21 #3382

Closed

This was referenced Sep 25, 2023

OOM when loading large JSON files in v2 pydantic/pydantic-core#985

Open

Add benchmark for nested/wide model using definitions pydantic/pydantic-core#997

Merged

okhaliavka mentioned this issue Oct 16, 2023

Lingering performance issues after dealing with a very complex model pydantic/pydantic-core#1026

Open

davidhewitt closed this as completed Mar 15, 2024

1.0 automation moved this from To Do to Done Mar 15, 2024

davidhewitt removed the needs-design label Mar 15, 2024

phil-opp mentioned this issue Mar 15, 2024

Use new Bound API of pyO3 dora-rs/dora#437

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected memory management behavior #1056

Unexpected memory management behavior #1056

etianen commented Jul 20, 2020

kngwyu commented Jul 20, 2020

etianen commented Jul 20, 2020

davidhewitt commented Jul 20, 2020

davidhewitt commented Jul 20, 2020 •

edited

davidhewitt commented Jul 20, 2020

etianen commented Jul 21, 2020

etianen commented Jul 21, 2020

This comment has been minimized.

davidhewitt commented Jul 21, 2020 •

edited

davidhewitt commented Jul 21, 2020 •

edited

davidhewitt commented Jul 21, 2020

etianen commented Jul 22, 2020

etianen commented Jul 22, 2020

kngwyu commented Jul 26, 2020

davidhewitt commented Jul 26, 2020

davidhewitt commented Aug 6, 2020

davidhewitt commented Dec 15, 2021

davidhewitt commented Mar 15, 2024

etianen commented Mar 15, 2024

Unexpected memory management behavior #1056

Unexpected memory management behavior #1056

Comments

etianen commented Jul 20, 2020

kngwyu commented Jul 20, 2020

etianen commented Jul 20, 2020

davidhewitt commented Jul 20, 2020

davidhewitt commented Jul 20, 2020 • edited

davidhewitt commented Jul 20, 2020

etianen commented Jul 21, 2020

etianen commented Jul 21, 2020

This comment has been minimized.

davidhewitt commented Jul 21, 2020 • edited

TLDR; Yes, I like this design and have considered it myself, but I'm not sure it's the right time to do it.

Why I'm not sure now is the moment.

The future solution I see

davidhewitt commented Jul 21, 2020 • edited

davidhewitt commented Jul 21, 2020

etianen commented Jul 22, 2020

Why I think now is the time

The drop behavior of Py<T> and PyObject

etianen commented Jul 22, 2020

kngwyu commented Jul 26, 2020

davidhewitt commented Jul 26, 2020

davidhewitt commented Aug 6, 2020

davidhewitt commented Dec 15, 2021

davidhewitt commented Mar 15, 2024

etianen commented Mar 15, 2024

davidhewitt commented Jul 20, 2020 •

edited

davidhewitt commented Jul 21, 2020 •

edited

davidhewitt commented Jul 21, 2020 •

edited

The drop behavior of `Py<T>` and `PyObject`