Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check name when getting PyCapsule pointer #3585

Closed
wants to merge 2 commits into from

Conversation

wjones127
Copy link
Contributor

Fixes #3584

Comment on lines -126 to +128
/// The `name` should match the path to the module attribute exactly in the form
/// of `"module.attribute"`, which should be the same as the name within the capsule.
/// If this capsule represents a module attribute, the `name` should match the path
/// to the module attribute exactly in the form of `"module.attribute"`, which should
/// be the same as the name within the capsule.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt this paragraph was oddly prescriptive, given there are other use cases for PyCapsules than module attributes. (For example, we use them in Arrow to transport Array in the Arrow PyCapsule Interface.) If you feel otherwise, I can revert this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It comes from the documentation for PyCapsule_Import

Based on your adjustment, you wanted to clarify that capsules can be used for things other than module attributes. This is true, though this function only succeeds for module attributes as far as I am aware?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you are saying. I was looking at this because it's the only API where you can pass a name and have that validated. So instead, I think we want to call GetPointer, right?

Copy link

codspeed-hq bot commented Nov 18, 2023

CodSpeed Performance Report

Merging #3585 will improve performances by 20.28%

Comparing wjones127:3584-capsule-import-mut (c28da26) with main (d89c388)

Summary

⚡ 1 improvements
✅ 77 untouched benchmarks

Benchmarks breakdown

Benchmark main wjones127:3584-capsule-import-mut Change
list_via_extract 329.4 ns 273.9 ns +20.28%

@wjones127 wjones127 marked this pull request as ready for review November 18, 2023 05:18
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Handing out &mut references to global data makes me a little uneasy. It feels like it would be extremely easy to accidentally break the safety invariant of no concurrent accesses, especially as the protections of the GIL are weakening in the near future.

Can you provide examples of your use cases for &mut in a bit more detail? My gut reaction is that it would be desirable to continue to only offer the shared reference API (which is itself unsafe) and instead encourage users to use interior mutability with appropriate synchronisation mechanisms.

Comment on lines -126 to +128
/// The `name` should match the path to the module attribute exactly in the form
/// of `"module.attribute"`, which should be the same as the name within the capsule.
/// If this capsule represents a module attribute, the `name` should match the path
/// to the module attribute exactly in the form of `"module.attribute"`, which should
/// be the same as the name within the capsule.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It comes from the documentation for PyCapsule_Import

Based on your adjustment, you wanted to clarify that capsules can be used for things other than module attributes. This is true, though this function only succeeds for module attributes as far as I am aware?

@wjones127
Copy link
Contributor Author

wjones127 commented Nov 18, 2023

Handing out &mut references to global data makes me a little uneasy. It feels like it would be extremely easy to accidentally break the safety invariant of no concurrent accesses, especially as the protections of the GIL are weakening in the near future.

It's in an unsafe unsafe method, that's for the responsibility of the user. My use case is short-lived local data, not global data. They are capsules created to export array data so they can be imported by another module.

Can you provide examples of your use cases for &mut in a bit more detail? My gut reaction is that it would be desirable to continue to only offer the shared reference API (which is itself unsafe) and instead encourage users to use interior mutability with appropriate synchronisation mechanisms.

My use case is implementing the Arrow PyCapsule Interface. Consumers of there PyCapsules need to move the contents of the capsule to take ownership, which requires mutable access. This was already implemented using PyCapsule::pointer(), but we had to implement our own capsule name validation: https://github.com/kylebarron/arrow-rs/blob/2e42926248bf0e1b558ca91936208c53f81a2a19/arrow/src/pyarrow.rs#L121-L138

I think I was confusing PyCapsule_Import with PyCapsule_GetPointer, and what I really want is a wrapper for the latter that does the name checking. If I rewrote the PR to implement that, does that sounds acceptable?

@davidhewitt
Copy link
Member

Right, I understand what you're looking for now.

I see the following code which I'm taking to be an example of the pattern you use at the moment:

validate_pycapsule(array_capsule, "arrow_array")?;
let array_ptr = array_capsule.pointer() as *mut FFI_ArrowArray;
let array = unsafe { std::ptr::replace(array_ptr, FFI_ArrowArray::empty()) };

If we just swapped .pointer() for some new mut_reference() method, then I guess it looks like this:

validate_pycapsule(array_capsule, "arrow_array")?;
let array_mut_ref = unsafe { array_capsule.mut_reference::<FFI_ArrowArray>() };
let array = std::mem::replace(array_mut_ref, FFI_ArrowArray::empty());

So the unsafe gets moved from consumption of the pointer to creation of the reference, and std::ptr::replace gets swapped for std::mem::replace. I'd argue this is potentially more dangerous because array_mut_ref holds stronger invariants for longer. While I can see that you're unlikely to misuse &mut FFI_ArrowArray in this pattern, I still mostly want to discourage users in general from doing this.


I think I was confusing PyCapsule_Import with PyCapsule_GetPointer, and what I really want is a wrapper for the latter that does the name checking. If I rewrote the PR to implement that, does that sounds acceptable?

If we pull back on the &mut T requirement and go literally for "hand out a pointer and check the name", I guess we can reduce your code to something like:

let array = unsafe { std::ptr::replace(array_capsule.pointer_checked("arrow_array").cast(), FFI_ArrowArray::empty()) };

Is that what you were thinking too? I don't really love the name "pointer_checked", so I'm open to better names. Could just be called get_pointer() to match the Python C API. (See #1980 (comment) which is where we decided not to use get_pointer() for our current API.)

Also the last time I decided to not check the name originates from here - #2485 (comment)

@adamreichold
Copy link
Member

I would like to add my concern that this API is just too unsafe for my taste, i.e. validating the global invariants required to produce a mutable reference is too error prone to encourage doing it via a dedicated method.

Concerning the name checking. Maybe it would be better to have method that turns PyAny (e.g. an attribute value) into &PyCapsule and checks the name along the way? Something like

PyCapsule::from_object(obj: &PyAny, name: &str) -> PyResult<&PyCapsule>;

@wjones127
Copy link
Contributor Author

wjones127 commented Nov 18, 2023

I hear you both on the safety concerns. Having something that returns a *mut void (or some pointer) is fine with me. It's the lack of name checking that bothers me at the moment. In my context, it's critical for safety that I check the it. 😄

(I'm really scratching my head as to why all the bindings libraries--PyO3, pybind11, and nanobind--just pass in the capsules name instead of taking a user-passed name for PyCapsule_GetPointer. The designers of the Python C API gave us a nice safety check and all the binding libraries just throw it out the window? What am I missing?)

Is that what you were thinking too?

Yeah, exactly. I wonder if this API would make sense:

/// Get the pointer or return an error 
///
/// Returns an error if the names do not match or the pointer is null.
unsafe fn pointer_checked<T>(&self, name: &str) -> PyResult<NonNull<T>> { ... }

I like it because:

  1. It puts the target type name and the capsule name on the same line: capsule.pointer_checked::<FFIArrowArray>("arrow_array"). That way it's really clear that we did indeed check the name before casting to the corresponding type.
  2. It also does the null check for you.

Then my use case becomes nice and short:

let array_ptr = unsafe { capsule.pointer_checked::<FFIArrowArray>("arrow_array")? } ;
let array = unsafe { std::ptr::replace(array_ptr, FFI_ArrowArray::empty()) };

What do you think of that?

@adamreichold
Copy link
Member

I'm really scratching my head as to why all the bindings libraries--PyO3, pybind11, and nanobind--just pass in the capsules name instead of taking a user-passed name for PyCapsule_GetPointer. The designers of the Python C API gave us a nice safety check and all the binding libraries just throw it out the window? What am I missing?

I think the idea with PyCapsule was that you would never create an instance not matching the intended name, i.e. it would bind the name to the current instance automatically so that you just cannot create an invalid PyCapsule in the first place.

Of course, this does not fully work out due to the FromPyObject impl (which you are also using in your code) because it allows you to manually fetch the attribute and turn it into a PyCapsule, the name of which has not been checked in contrast to using e.g. PyCapsule::import.

Hence, my suggestion to go for PyCapsule::from_object to perform the check as early as possible to try to turn this into a PyCapsule invariant again. I do we think we would actually have to remove the FromPyObject impl for this to really work though but this is impl is implied just becasue PyCapsule maps to a native CPython type. (But that native CPython does not share the I-always-had-my-name-checked-during-initialization part.)

So in summary, I think our API tried hard to imbue better invariants into the type but falls short where it must directly interact with the native types.

@wjones127
Copy link
Contributor Author

wjones127 commented Nov 18, 2023

I think the idea with PyCapsule was that you would never create an instance not matching the intended name, i.e. it would bind the name to the current instance automatically so that you just cannot create an invalid PyCapsule in the first place.

I see, so instead of thinking of PyCapsule as a wrapper around a void* pointer, they think of it as already having been checked to the be expected type. Makes me wonder if we want a strongly-typed version:

let capsule = TypedPyCapsule::<FFIArrowArray>::from_object(obj, "arrow_array")?;
let pointer: NonNull<FFIArrowArray> = capsule.get_pointer()?;

One thing I like about the existing PyCapsule::import() API is that the name and the output type are in the same call.

@davidhewitt
Copy link
Member

I have definitely wondered in the past about a TypedPyCapsule or ValidatedPyCapsule API. I still think it'd be unsafe to do the type cast, because you have no guarantee that the name really means that the data is of the right type. Aside from that, I'm open to exploring that design further.

@adamreichold
Copy link
Member

The typed version brings with it a necessarily unsafe type assertion as David points out. Hence, I think a plain PyCapsule::from_object which is like FromPyObject plus the name checking would be a good immediate fix for the issue from the OP while staying close to the initial aim for the untyped capsule API.

@wjones127
Copy link
Contributor Author

The typed version brings with it a necessarily unsafe type assertion as David points out.

Right, of course. That would make it:

let capsule = unsafe { TypedPyCapsule::<FFIArrowArray>::from_object(obj, "arrow_array")? };
let pointer: NonNull<FFIArrowArray> = capsule.get_pointer()?;

Hence, I think a plain PyCapsule::from_object which is like FromPyObject plus the name checking would be a good immediate fix for the issue from the OP while staying close to the initial aim for the untyped capsule API.

At this point, I feel like it just doesn't bring that much more safety if it's going to return an untyped PyCapsule object. I think what I'll do is implement the TypedPyCapsule approach as a wrapper utility in my downstream projects. Once I do that, if you like what you see there, we can consider upstreaming that into PyO3.

@wjones127 wjones127 closed this Nov 18, 2023
@wjones127 wjones127 changed the title feat: add PyCapsule::import_mut() feat: check name when getting PyCapsule pointer Nov 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add method to check PyCapsule name while getting pointer
3 participants