RFC: Add feature controlling the global reference pool to enable avoiding its overhead. #4095
43d791e
to
fcc7253
Compare
CodSpeed Performance Report: merging #4095 will degrade performance by 17.12%.
Why print vs. panic in drop?
It is quite hard to avoid dropping without the GIL, especially in embedding situations. And in contrast to
That said, I would also be fine with panicking in
I quite like the idea of this feature. I wonder, maybe instead of leaking, without the pool enabled the drop & clone code can just unconditionally call That way the program is correct whether or not the feature is enabled, but users get to control this knob according to the performance characteristics of their program.
(and hopefully with nogil in the long term the reference pool feature might just then become deprecated for better options)
Isn't this much too prone to deadlocks?
Ungh, yes that's true. With nogil what I suggest would be viable. 😭
Could we provide the
The problem I see is that you might want to still clone
Yes, I think it's extremely handy to be able to `#[derive(Clone)]` in particular, which would require `Py<T>: Clone`.
Also the direction seems to be that panic-on-drop is not allowed (e.g. rust-lang/rfcs#3288) so +1 for not panicking. But I would be open to aborting instead of printing :)
Pending a decision on what to do with Drop I think the idea here gets a 👍 from me as a lever for users to pull when they know their application better than us (i.e. it never clones or drops Python values without the GIL held).
We should document this in the features and performance sections of the guide before merging, though.
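To make the trade-off concrete, the kind of reference pool being discussed can be modelled roughly as below. This is a std-only sketch under loud assumptions: `ReferencePool`, `Heap`, `release`, and `update_counts` are hypothetical names invented for illustration and are not PyO3's actual internals. Holding the GIL is modelled as having `&mut Heap`.

```rust
use std::sync::Mutex;

/// Toy interpreter state (hypothetical): one reference count per object id.
pub struct Heap {
    pub refcounts: Vec<isize>,
}

/// Hypothetical pool of decrements that could not be applied immediately
/// because the dropping thread did not hold the GIL.
#[derive(Default)]
pub struct ReferencePool {
    pending_decrefs: Mutex<Vec<usize>>,
}

impl ReferencePool {
    /// Called on drop. `Some(heap)` models "the GIL is held", so the count is
    /// decremented directly; `None` models "no GIL", so the work is deferred.
    pub fn release(&self, heap: Option<&mut Heap>, id: usize) {
        match heap {
            Some(heap) => heap.refcounts[id] -= 1,
            None => self.pending_decrefs.lock().unwrap().push(id),
        }
    }

    /// Called every time the GIL is acquired. Even when nothing is pending,
    /// the lock must be taken, which is the overhead under discussion.
    /// Returns how many deferred decrements were applied.
    pub fn update_counts(&self, heap: &mut Heap) -> usize {
        let pending = std::mem::take(&mut *self.pending_decrefs.lock().unwrap());
        for &id in &pending {
            heap.refcounts[id] -= 1;
        }
        pending.len()
    }
}
```

In this model, an application that never drops without the GIL pays for `update_counts` on every acquisition without ever benefiting from it, which is the case the proposed knob targets.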
POOL.register_incref(obj);
#[cfg(not(feature = "reference-pool"))]
panic!("Cannot clone pointer into Python heap without the GIL being held.");
In the spirit of #4098, should we add `#[track_caller]` here (and on the `Py::clone` impl, assuming that it's valid to add to trait methods)?
It appears possible, cf. https://rustc-dev-guide.rust-lang.org/backend/implicit-caller-location.html#traits, and hence I will add it.
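As a quick illustration of `#[track_caller]` on a trait method implementation, here is a minimal std-only sketch. `Handle`, `CloneChecked`, and `clone_checked` are hypothetical stand-ins, not PyO3 types; the method returns the caller's location instead of panicking so that the effect is easy to observe.

```rust
use std::panic::Location;

/// Hypothetical stand-in for a smart pointer into the Python heap.
pub struct Handle;

/// Hypothetical trait standing in for `Clone` in this sketch.
pub trait CloneChecked {
    fn clone_checked(&self) -> &'static Location<'static>;
}

impl CloneChecked for Handle {
    // The attribute is placed on the method in the impl block. For a
    // statically dispatched call, `Location::caller()` then reports the call
    // site rather than this line, and a `panic!` here would do the same.
    #[track_caller]
    fn clone_checked(&self) -> &'static Location<'static> {
        Location::caller()
    }
}
```

A panic raised inside such a method is attributed to the user's code, which is where a clone without the GIL would need to be fixed.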
I'll also add it to the `Drop` path because the backtrace reporting is just as relevant for the abort case.
To be clear I'm not entirely sure aborting is necessary in Drop, as we already have precedent to leak e.g. unsendable objects.
(I guess a further knob users may want a choice on is whether to leak or abort, but it's less clear that's a good fit for a feature.)
Yeah, I think leaking is better: It is easier to fix a memory leak than an abort and it is more consistent with the handling of unsendable types.
FWIW, my concern with printing + leaking is that it's very difficult to catch in testing. If I want to ensure my app never takes these paths, it's much easier if I get a clear test failure.
I agree with that but I think there is no right choice. Not even adding another knob seems just right, as the additional complexity seems over the top.
In the end, the consistency with how we already handle unsendable types was what tipped the odds towards leaking for me, because I think the problems are very much the same, even similarly niche, and if we go for aborting here, we should do that for unsendable types as well.
The case for testing is an interesting one, though.
What if we had an environment variable that upgraded leaking to abort (or vice versa) on debug builds for this case? I think the runtime overhead might be acceptable as part of a debugging setup.
I think for the unsendable case, aborting is not desirable at all because it's difficult to prevent the GC running on other threads.
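The environment-variable idea could look roughly like the following std-only sketch. Everything named here is an assumption for illustration: the variable `HYPOTHETICAL_DROP_WITHOUT_GIL`, its `abort` value, and the `DropWithoutGil` enum do not exist in PyO3.

```rust
/// What to do when a reference is dropped while the GIL is not held.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DropWithoutGil {
    /// Print a warning and leak the object (the default in this sketch).
    Leak,
    /// Abort the process so that a test run fails loudly.
    Abort,
}

/// Hypothetical variable name, chosen only for this sketch.
pub const POLICY_VAR: &str = "HYPOTHETICAL_DROP_WITHOUT_GIL";

/// Pure decision function: only debug builds honour the override.
pub fn drop_policy(debug_build: bool, env_value: Option<&str>) -> DropWithoutGil {
    match (debug_build, env_value) {
        (true, Some("abort")) => DropWithoutGil::Abort,
        _ => DropWithoutGil::Leak,
    }
}

/// Reads the real build mode and process environment.
pub fn current_policy() -> DropWithoutGil {
    let value = std::env::var(POLICY_VAR).ok();
    drop_policy(cfg!(debug_assertions), value.as_deref())
}

/// Applies the chosen policy at the point where a drop without the GIL is detected.
pub fn handle_drop_without_gil(policy: DropWithoutGil) {
    match policy {
        DropWithoutGil::Leak => {
            eprintln!("warning: leaking a reference dropped without the GIL")
        }
        DropWithoutGil::Abort => {
            eprintln!("error: reference dropped without the GIL, aborting");
            std::process::abort()
        }
    }
}
```

Restricting the override to debug builds keeps the environment lookup out of release binaries, which is the runtime overhead concern raised above.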
I would consider adding a `--cfg` which could be enabled via `RUSTFLAGS` etc. but personally, I find it too much complexity.
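For readers unfamiliar with raw `--cfg` flags, a minimal sketch of such a gate follows. The flag name `hypothetical_disable_reference_pool` and the `Strategy` enum are assumptions made up for this example; the thread does not name the flag at this point.

```rust
/// How a dropped reference is handled when the GIL is not held.
#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    /// Park the decrement in the reference pool (default).
    DeferToPool,
    /// No pool is compiled in; the drop path has to leak or abort instead.
    LeakOrAbort,
}

// The flag name below is hypothetical. The `allow` silences the
// `unexpected_cfgs` lint that recent compilers emit for unknown cfg names.
#[allow(unknown_lints, unexpected_cfgs)]
pub fn drop_strategy() -> Strategy {
    if cfg!(hypothetical_disable_reference_pool) {
        Strategy::LeakOrAbort
    } else {
        Strategy::DeferToPool
    }
}
```

Such a flag would be switched on for the whole build with something like `RUSTFLAGS="--cfg hypothetical_disable_reference_pool" cargo build`. Unlike a Cargo feature, it cannot be enabled from a dependency's manifest, so only whoever controls the final build can opt in.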
I mean, we all like performance, but for projects valuing correctness (e.g. cryptography), my recommendation would be to not keep doing what we have been doing until now. This is for projects which want to live on the razor's edge.
And in the long term, the problem will go away or rather move to a different project. A GILless CPython will carry the machinery to update reference counts from threads all over the place and we should be able to just unconditionally drop our reference pool for these builds.
fcc7253
to
67d8a8d
Compare
Added a section in the performance chapter of the guide. @inducer @matthiasdiener As this was originally reported by you, what do you think about the feature and its explanation in the guide?
d7dbf11
to
698b312
Compare
Hmm, I can definitely see that point, but in that case I feel like having this as opt-out of "safe" behavior, rather than opt-in to "i-know-what-im-doing" is not ideal, since this has "spooky" side effects. Starting with
I share that sentiment which is why I included
in the cover letter. Do note one downside of this though: Rust's features are additive so there is a different kind of spooky side effect. One crate in the dependency graph of an extension can enable affecting everything else. It would be bad style to enable it in a non-leaf crate, but it is possible. So if we really want to make this as hard to enable accidentally as possible, we should probably go for a raw But I think we should first hear from the original reporters whether they actually want this. Just dropping the PR because it does not really help them is also an option.
Following up from #4105 (comment) It seems that we may need to migrate to a strategy where For Maybe we can move the overhead away from all PyO3 functions, and instead, if there is a deferred drop, we signal a worker thread to wake up, which can attempt to acquire the GIL and drop all currently pending deferred drops? Moving the overheads off every function call may be a good enough solution until GIL-less Python can give us an alternative option.
(I know I already suggested this reference-counting-thread idea in #3827 (comment) and @adamreichold correctly pointed out that deferring the work is dangerous. With what we are now seeing particularly about deferred
But isn't this mainly gaming the benchmarks? Our call overhead will reduce, but the actual work to be done will increase (at least there is an additional acquisition of the GIL, and on another thread to which all these objects may be cache-cold). This could improve throughput at the cost of increased CPU usage, but only if the other threads actually release the GIL for reasonable amounts of time. And if that worker thread never actually gets the GIL, we could see unbounded memory growth.
While deferred drop is certainly not as dangerous as a deferred clone, I fear we are inventing a garbage collector here. And if we do decide to do that, I think we should go for established solutions, e.g. epoch-based reclamation or hazard pointers, which at least have some support crates available.
Personally, I don't think we should make this work. This adds entirely new semantics that Python itself does not (yet) support, which will always be difficult to get right as long as Python does not cooperate in these memory management tasks. So while I could live without the
Very true. Ok, I hereby agree to drop (heh) the reference-counting-thread idea 👍
I think there's still some open question on whether we should be trying to keep the drop pool around. I feel like for the sake of compatibility we can't migrate users immediately from the existing pool to leaking on drop without the GIL, as silently adding memory leaks to their program may break confidence in PyO3. The abort-on-drop is similarly unappealing for migration but is at least a noisy failure mode.
Whether to leak / abort, some prior art is that Given the concern about testing for leaks being hard, I'd be tempted to suggest that we add this as a I think we are now on a course to release 0.22 ASAP, given that there are various behavioural changes afoot here. So I propose the following for 0.22:
698b312
to
f9d8ff3
Compare
I added a new commit here which removes delayed refcnt increments and adds said feature.
Added another commit which renames the feature proposed here.
I would be glad if we did. Note that the tests currently pass only with
f288495
to
29f89de
Compare
👍 I will do my best to find time to give this a full review tomorrow. Am pretty loaded with family responsibility at the moment so my opportunities to focus are a bit rare this month.
992e295
to
6d46508
Compare
I think these features should be mentioned in the Features reference section of the guide, probably with a strong emphasis, for both, on only enabling them in "leaf" crates (and not libraries).
This looks solid to me, thanks! I ache a little bit for the necessary loss of the convenience of the `Py::clone` implementation.
I think the `CloneRef` trait and related question of how to rework `#[pyo3(get)]` is going to be worth exploring ASAP.
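One possible shape for such a `CloneRef` trait, sketched with std only, is shown below. The trait signature, `GilToken`, `with_gil`, `Handle`, and `Wrapper` are all assumptions made for illustration; the thread does not specify the design, and none of this is PyO3's API.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical zero-sized proof that the current thread holds the GIL.
#[derive(Clone, Copy)]
pub struct GilToken<'gil>(PhantomData<&'gil ()>);

/// Pretends to acquire the GIL and hands a token to `f`.
pub fn with_gil<R>(f: impl for<'gil> FnOnce(GilToken<'gil>) -> R) -> R {
    f(GilToken(PhantomData))
}

/// Like `Clone`, but cloning requires proof that the GIL is held.
pub trait CloneRef: Sized {
    fn clone_ref(&self, gil: GilToken<'_>) -> Self;
}

/// Toy reference-counted handle standing in for a pointer into the Python heap.
pub struct Handle {
    refcnt: Arc<AtomicUsize>,
}

impl Handle {
    pub fn new() -> Self {
        Handle { refcnt: Arc::new(AtomicUsize::new(1)) }
    }

    pub fn count(&self) -> usize {
        self.refcnt.load(Ordering::Relaxed)
    }
}

impl CloneRef for Handle {
    fn clone_ref(&self, _gil: GilToken<'_>) -> Self {
        // The increment happens immediately because the token proves the GIL is held.
        self.refcnt.fetch_add(1, Ordering::Relaxed);
        Handle { refcnt: Arc::clone(&self.refcnt) }
    }
}

/// A struct holding a `Handle` cannot `#[derive(Clone)]` here, so it
/// forwards `clone_ref` field by field instead.
pub struct Wrapper {
    pub name: String,
    pub inner: Handle,
}

impl CloneRef for Wrapper {
    fn clone_ref(&self, gil: GilToken<'_>) -> Self {
        Wrapper { name: self.name.clone(), inner: self.inner.clone_ref(gil) }
    }
}
```

The hand-written `impl CloneRef for Wrapper` is exactly the boilerplate that `#[derive(Clone)]` used to remove, which is the loss of convenience being lamented here.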
Do we have metrics showing the performance differences between this and the current implementation? I'd be uncomfortable with this if there was not a noticeable performance benefit.
I would very much prefer this over feature flags.
Removing deferred clones/reference count increments is necessary for correctness and The other feature to disable the reference pool is motivated by the CPU profiles from #4085 where it is the single largest contributor which we could do away with. The use case is a bit extreme, but especially for the extension use case the reference pool is mainly an additional convenience users can do without. But of course, it makes usage more difficult so it is opt-in.
I have pushed a new version which should address the review comments so far. I also added a separate commit which turns the So assuming no further issues are found, before this can be merged we need to reach consensus on two open questions as far as I can see:
c56282a
to
13b7b48
Compare
…viour becomes opt-in
…nce count errors as long as these are available
…ble-reference-pool features
… conditional compilation flag
Such a flag is harder to use and thereby also harder to abuse. This seems appropriate as this is purely a performance-oriented change which should only be enabled by leaf crates and brings with it additional highly implicit sources of process aborts.
13b7b48
to
822e4ae
Compare
Any thoughts on the two remaining design questions? Otherwise, this should be good to go after conflict resolution.
I don't think leaking references in `Drop` is nice because it is quite easy for this to be called without the GIL being held, especially in embedding scenarios. But it should be safe nevertheless. It might call for an inverted feature to explicitly opt out of the reference pool by enabling e.g. `disable-reference-pool`.