Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add feature controlling the global reference pool to enable avoiding its overhead. #4095

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

adamreichold
Copy link
Member

I don't think leaking references in Drop is nice because it is quite easy for this to be called without the GIL being held, especially in embedding scenarios. But it should be safe nevertheless. It might call for an inverted feature to explicitly opt out of the reference pool by enabling e.g. disable-reference-pool.

Copy link

codspeed-hq bot commented Apr 18, 2024

CodSpeed Performance Report

Merging #4095 will degrade performances by 17.12%

Comparing opt-ref-pool (13b7b48) with main (6fb972b)

Summary

❌ 1 regressions
✅ 68 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main opt-ref-pool Change
sequence_from_tuple 268.9 ns 324.4 ns -17.12%

@alex
Copy link
Contributor

alex commented Apr 18, 2024

Why print vs. panic in drop?

@adamreichold
Copy link
Member Author

Why print vs. panic in drop?

It is quite hard to avoid dropping without the GIL, especially in embedding situations. And in contrast to Clone, leaking is sound.

@adamreichold
Copy link
Member Author

That said, I would also be fine with panicking in Drop and put the onus of avoiding it on the extension author as this is already a bit of a "I-don't-care-just-make-it-fast" knob.

@davidhewitt
Copy link
Member

I quite like the idea of this feature. I wonder, maybe instead of leaking, without the pool enabled the drop & clone code can just unconditionally call Python::with_gil?

That way the program is correct whether or not the feature is enabled, but users get to control this knob according to the performance characteristics of their program.

@davidhewitt
Copy link
Member

(and hopefully with nogil in the long term the reference pool feature might just then become deprecated for better options)

@adamreichold
Copy link
Member Author

without the pool enabled the drop & clone code can just unconditionally call Python::with_gil?

Isn't this much too prone to deadlocks?

@davidhewitt
Copy link
Member

Ungh, yes that's true. With nogil what I suggest would be viable. 😭

@Icxolu
Copy link
Contributor

Icxolu commented Apr 18, 2024

Could we provide the Clone impl for Py only under that feature flag? That would eliminate the panic and make it a compile error. Cloning would still be possible by going through Bound, with the guarantee that the gil is held.

@adamreichold
Copy link
Member Author

Could we provide the Clone impl for Py only under that feature flag? That would eliminate the panic and make it a compile error. Cloning would still be possible by going through Bound, with the guarantee that the gil is held.

The problem I see is that you might want to still clone Py values to e.g. fulfil trait requirements of Rust types but ensure by context that you have the GIL.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's extremely handy to be able to #[derive(Clone)] in particular, which would require Py<T>: Clone.

Also the direction seems to be that panic-on-drop is not allowed (e.g. rust-lang/rfcs#3288) so +1 for not panicking. But I would be open to aborting instead of printing :)

Pending a decision on what to do with Drop I think the idea here gets a 👍 from me as a lever for users to pull when they know their application better than us (i.e. it never clones or drops Python values without the GIL held).

We should document this in the features and performance sections of the guide before merging, though.

src/gil.rs Outdated Show resolved Hide resolved
POOL.register_incref(obj);
#[cfg(not(feature = "reference-pool"))]
panic!("Cannot clone pointer into Python heap without the GIL being held.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spirit of #4098, should we add #[track_caller] here (and on the Py::clone impl, assuming that it's valid to add to trait methods)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also add it to the Drop path because the backtrace reporting is just as relevant for the abort case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear I'm not entirely sure aborting is necessary in Drop, as we already have precedent to leak e.g. unsendable objects.

(I guess a further knob users may want a choice on is whether to leak or abort, but it's less clear that's a good fit for a feature.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think leaking is better: It is easier to fix a memory leak than an abort and it is more consistent with the handling of unsendable types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, my concern with printing + leaking is that it's very difficult to catch in testing. If I want to ensure my app never takes these paths, it's much easier if I get a clear test failure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that but I think there is no right choice. Not even adding another knob seems just right, as the additional complexity seems over the top.

In the end, for me the consistency with how we already handle unsendable types was what tipped the odds towards leaking for me, because I think the problems are very much the same, even similarly niche, and if go for aborting here, we should do that for unsendable types as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case for testing is an interesting one, though.

What if we had an environment variable that upgraded leaking to abort (or vice versa) on debug builds for this case? I think the runtime overhead might be acceptable as part of a debugging setup.

I think for the unsendable case, aborting is not desirable at all because it's difficult to prevent the GC running on other threads.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider adding a --cfg which could be enabled via RUSTFLAGS etc. but personally, I find it too much complexity.

I mean, we all like performance, but for projects valuing correctness (e.g. cryptography), my recommendation would be to not keep doing what we have been doing until now. This is for projects which want to live on the razor's edge.

And in the long term, the problem will go away or rather move to a different project. A GILless CPython will carry the machinery to update reference counts from threads all over the place and we should be able to just unconditionally drop our reference pool for these builds.

@adamreichold
Copy link
Member Author

Added a section in the performance chapter of the guide. @inducer @matthiasdiener As this was originally reported by you, what do you think about the feature and its explanation in the guide?

@adamreichold adamreichold force-pushed the opt-ref-pool branch 2 times, most recently from d7dbf11 to 698b312 Compare April 19, 2024 12:20
@Icxolu
Copy link
Contributor

Icxolu commented Apr 19, 2024

The problem I see is that you might want to still clone Py values to e.g. fulfil trait requirements of Rust types but ensure by context that you have the GIL.

Yes, I think it's extremely handy to be able to #[derive(Clone)] in particular, which would require Py<T>: Clone.

Hmm, I can definitely see that point, but in that case I feel like having this as opt-out of "safe" behavior, rather than opt-in to "i-know-what-im-doing" is not ideal, since this has "spooky" side effects. Starting with no-default-features, or turning them off later and forgetting to turn this one back on, will lead to very different behavior, which won't be noticed until runtime. Arguably this is probably not a very common scenario currently, but it feels like an easy footgun nevertheless.

@adamreichold
Copy link
Member Author

Hmm, I can definitely see that point, but in that case I feel like having this as opt-out of "safe" behavior, rather than opt-in to "i-know-what-im-doing" is not ideal, since this has "spooky" side effects. Starting with no-default-features, or turning them off later and forgetting to turn this one back on, will lead to very different behavior, which won't be noticed until runtime. Arguably this is probably not a very common scenario currently, but it feels like an easy footgun nevertheless.

I share that sentiment which is why I included

It might call for an inverted feature to explicitly opt out of the reference pool by enabling e.g. disable-reference-pool.

in the cover letter.

Do note one downside of this though: Rust's features are additive so there is a different kind of spooky side effect. One crate in dependency graph of an extension can enable affecting everything else. It would be bad style to enable it in a non-leaf crate, but it is possible.

So if we really want to make this hard to enable accidentally as possible, we should probably go for a raw --cfg flag enabled by wrangling compiler flags to which I would also be amenable.

But I think we should first hear from the original reporters whether they actually want this. Just dropping the PR because it does not really help them is also an option.

@davidhewitt
Copy link
Member

Following up from #4105 (comment)

It seems that we may need to migrate to a strategy where Clone without the GIL will always panic. We could gate the whole Clone implementation behind a feature, given that the panic is inconvenient and users may not want to deal with the possibility of runtime crashes. (Instead they would likely need to write Clone implementations by hand.)

For Drop - I think that deferring reference count decreases is still safe (worst it can do is cause leaks).

Maybe we can move the overhead away from all PyO3 functions, and instead if there is a deferred drop we signal up a worker thread to wake, which can attempt to acquire the GIL and drop all currently pending deferred drops? Moving the overheads off every function call may be a good enough solution until GIL-less Python can give us an alternative option.

@davidhewitt
Copy link
Member

(I know I already suggested this reference-counting-thread idea in #3827 (comment) and @adamreichold correctly pointed out that deferring the work is dangerous.

With what we are now seeing particularly about deferred Clone being incorrect, my thought is the reference-counting-thread idea may be viable when only used for Drop.)

@adamreichold
Copy link
Member Author

Maybe we can move the overhead away from all PyO3 functions, and instead if there is a deferred drop we signal up a worker thread to wake, which can attempt to acquire the GIL and drop all currently pending deferred drops? Moving the overheads off every function call may be a good enough solution until GIL-less Python can give us an alternative option.

But isn't this mainly gaming the benchmarks? Our call overhead will reduce, but the actual work to be done will increase (at least there is an additional acquisition of the GIL, and on another thread to which all these objects may be cache-cold).

This could improve throughput at the cost of increase CPU usage, but only if the other threads actually release the GIL for reasonable amounts of time. And if that worker thread never actually gets the GIL, we could see unbounded memory growth.

While deferred drop is certainly not as dangerous a deferred clone, I fear we are inventing a garbage collector here. And if do decide to do that, I think we should go for established solutions, e.g. epoch-based reclamation or hazard pointers which at least have some support crates available.

Personally, I don't think we should make this work. This is adding completely additional semantics that Python itself does not (yet) support which will always be difficult to get right as long as Python does not cooperate in these memory management tasks.

So while I could live without the Clone impl, I would like to avoid making the Drop do more than leak or panic/abort. If we are unused about this, then I think adding a feature flag to control whether Clone is unavailable and Drop leaks versus Clone and Drop both panic/abort makes more sense to avoid any surprises. Adding a panicking Clone also sounds like it fits the additive feature model.

@davidhewitt
Copy link
Member

But isn't this mainly gaming the benchmarks? Our call overhead will reduce, but the actual work to be done will increase (at least there is an additional acquisition of the GIL, and on another thread to which all these objects may be cache-cold).

This could improve throughput at the cost of increase CPU usage, but only if the other threads actually release the GIL for reasonable amounts of time. And if that worker thread never actually gets the GIL, we could see unbounded memory growth.

Very true. Ok, I hereby agree to drop (heh) the reference-counting-thread idea 👍

I think there's still some open question on whether we should be trying to keep the drop pool around. I feel like for sake of compatibility we can't migrate users immediately from the existing pool to leaking on drop without GIL, as silently adding memory leaks to their program may break confidence in PyO3. The abort-on-drop is similarly unappealing for migration but is at least a noisy failure mode.

Whether to leak / abort, some prior art is that pybind11 throws exceptions if objects are dropped without the GIL held - https://github.com/pybind/pybind11/blob/19a6b9f4efb569129c878b7f8db09132248fbaa1/include/pybind11/pytypes.h#L269

Given the concern about testing for leaks being hard, I'd be tempted to suggest that we add this as a disable-reference-pool feature which replaces the reference pool with aborts.

I think we are now on a course to release 0.22 ASAP, given that there are various behavioural changes afoot here.

So I propose the following for 0.22:

  • We get the next step of the gil-refs deprecation complete.
  • We gate a panicking Py::clone implementation behind a py-clone feature.
  • We add disable-reference-pool to abort on drop instead of pooling.
  • (Maybe) we also complete the changes to Python::allow_threads

@adamreichold
Copy link
Member Author

adamreichold commented Apr 21, 2024

We gate a panicking Py::clone implementation behind a py-clone feature.

I added a new commit here which removes delayed refcnt increments and adds said feature.

We add disable-reference-pool to abort on drop instead of pooling.

Added another commit which renames the feature proposed here.

(Maybe) we also complete the changes to Python::allow_threads

I would be glad if we did.

Note that the tests currently pass only with py-clone enabled and disable-reference-pool disabled. I will have to work on this, but I thought I push what I have so far to enable review of the more tricky parts of the code and especially the associated documentation.

@adamreichold adamreichold force-pushed the opt-ref-pool branch 2 times, most recently from f288495 to 29f89de Compare April 21, 2024 20:16
@davidhewitt
Copy link
Member

👍 I will do my best to find time to give this a full review tomorrow. Am pretty loaded with family responsibility at the moment so my opportunities to focus are a bit rare this month.

@adamreichold adamreichold force-pushed the opt-ref-pool branch 2 times, most recently from 992e295 to 6d46508 Compare April 21, 2024 20:31
Copy link
Contributor

@Icxolu Icxolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these features should be mentioned in the Features reference section of the guide, probably with a strong emphasis on both to only enable them in "leaf" crates (and not libraries)

guide/src/performance.md Outdated Show resolved Hide resolved
guide/src/faq.md Show resolved Hide resolved
src/pybacked.rs Outdated Show resolved Hide resolved
src/instance.rs Outdated Show resolved Hide resolved
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks solid to me, thanks! I ache a little bit for the necessary loss of the convenience of the Py::clone implementation.

I think the CloneRef trait and related question of how to rework #[pyo3(get)] is going to be worth exploring ASAP.

Cargo.toml Outdated Show resolved Hide resolved
guide/src/faq.md Show resolved Hide resolved
guide/src/migration.md Outdated Show resolved Hide resolved
guide/src/performance.md Outdated Show resolved Hide resolved
src/gil.rs Show resolved Hide resolved
@mejrs
Copy link
Member

mejrs commented Apr 23, 2024

Do we have metrics showing the performance differences between this and the current implementation? I'd be uncomfortable with this if there was not a noticeable performance benefit.

So if we really want to make this hard to enable accidentally as possible, we should probably go for a raw --cfg flag enabled by wrangling compiler flags to which I would also be amenable.

I would very much prefer this over feature flags.

@adamreichold
Copy link
Member Author

Do we have metrics showing the performance differences between this and the current implementation? I'd be uncomfortable with this if there was not a noticeable performance benefit.

Removing deferred clones/reference count increments is necessary for correctness and py-clone is additive hiding the now necessarily panicking Clone impl to avoid surprises when people update, i.e. they have to enable the feature if they want to continue to use Clone and at least get a chance to read about the implications.

The other feature to disable the reference pool is motivated by the CPU profiles from #4085 where it is the single largest contributor which we could do away with. The use case is a bit extreme, but especially for the extension use case the reference pool is mainly an additional convenience users can do without. But of course, it makes usage more difficult so it is opt-in.

@adamreichold
Copy link
Member Author

I have pushed a new version which should address the review comments so far.

I also added a separate commit which turns the disable-reference-pool feature into the disable_pyo3_reference_pool conditional compilation flag. I did opt to keep it documented in features.md though.

So assuming no further issues are found, before this can be merged we need to reach consensus on two open questions as far as I can see:

  • Do we want to use a feature or a conditional compilation flag. We did decide on inverted semantics in both cases, i.e. the default is safe and the dangerous but fast path must be explicitly enabled. The feature has significantly better usability, but can be abused in non-leaf crates. The conditional compilation flag is intentionally harder to use (which is the intention for using it at all) and set as a global property of the build by design.
  • Do we want another flag/feature/environment variable to turn aborting the process on dropping without the GIL into leaks? Aborts are better for testing but leaking is always safe even though performance can suffer. So maybe we want to couple this to cfg(debug_assertions) and abort in debug builds but leak in release builds?

@adamreichold
Copy link
Member Author

Any thoughts on two remaining design questions? Otherwise, this should be good to go after conflict resolution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants