Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contribution opportunity: improved conversion traits #4041

Open
davidhewitt opened this issue Apr 3, 2024 · 10 comments
Open

Contribution opportunity: improved conversion traits #4041

davidhewitt opened this issue Apr 3, 2024 · 10 comments

Comments

@davidhewitt
Copy link
Member

Following the introduction of the Bound API in 0.21, we have converged on three primary traits which are used to convert Rust datatypes to Python.

  • one trait for from-Python conversions: FromPyObject
  • two traits for to-Python conversions: ToPyObject and IntoPy<PyObject> (and sub-forms IntoPy<Py<PyTuple>> and IntoPy<Py<PyString>>)

There are several opportunities to contribute to PyO3 with a goal of improving these traits. Given that these traits are so central to PyO3's API, they impact developer ergonomics and performance. The right designs could improve PyO3 a lot, though any new API would need a migration pathway which is feasible for users.

Improving FromPyObject

FromPyObject is already in a migration state as its input argument is changing from a GIL Ref to a Bound smart pointer. We also have FromPyObjectBound which allows a few additional optimisations and extractions like &str at cost of complexity. We think the two lifetimes of FromPyObjectBound will be the future design of FromPyObject, though we don't yet know how that will interact with #[derive(FromPyObject)].

Beyond resolving that question, I am aware of at least two other possibilities to refine FromPyObject:

  • Rather than returning PyErr as the error type, we could do something similar to std::str::FromStr and add a type Err associated type. This would allow implementations to use just downcasting errors, or other concrete Rust errors without having to go through the relatively heavy PyErr machinery.
  • In Review conversions and ensure consistency #3226 and similar issues we have discussed how there is a possibility to define either "strict" or "lax" conversions. Maybe FromPyObject could have an extract_exact method added, which defines strict conversions, so that the #[pyfunction] macros can have a #[pyo3(exact)] annotation on arguments to control this better.

Improving to-Python traits

I'm reasonably convinced that there doesn't need to be two (or four, depending how you see it) to-Python traits for PyO3.

In addition, we have a strong need to have a fallible conversion, because there are now several panics inside PyO3 which are caused by the limitation that none of these traits can fail.

I once proposed an IntoPyObject trait in #2316. If FromPyObject is staying around, I think IntoPyObject is the right name to pair with it (though maybe we just introduce a new trait for each direction, for easier migration). I think the design in #2316 is outdated and also it probably should be fallible, as FromPyObject is also fallible.

Maybe the following is closer to the right definition:

trait IntoPyObject {
    type Target;
    type Err;

    fn into_pyobject(self, py: Python<'py>) -> Result<Bound<'py, Self::Target>, Self::Err>;
}

... but that's just a sketch, and discussion / research / prototyping would yield valuable insight.

@Icxolu
Copy link
Contributor

Icxolu commented Apr 4, 2024

One point to consider is whether IntoPyObject should take Self by value or by reference. I believe typically in such a conversion one would take it by value to potentially reuse allocations, which I think we can't do because the conversion target lives on the Python heap, not the Rust one. So maybe taking &self is more sensible, such that Rust structures do not get needlessly consumed (and dropped). Or differently: Why should I not be allowed to continue to use a Vec<T> after I turned it into a Bound<PyList>?

I guess one downside would be the identity conversion of a Python type to itself (or another python type?), which would have to perform a (relatively cheap) reference count increase (which would most likely be happening in generic code).

@adamreichold
Copy link
Member

adamreichold commented Apr 5, 2024

though we don't yet know how that will interact with #[derive(FromPyObject)].

Note #[serde(borrow)] is prior art on the issue, i.e. we could implement FromPyObject<'a, 'py> for &'a T by default and use a lifetime generic parameter if #[pyo3(borrow)] is given.

@adamreichold
Copy link
Member

One point to consider is whether IntoPyObject should take Self by value or by reference.

As with e.g. IntoIterator, I think the trait should take self (by value) and we would implement it for for references &'a T where applicable.

@adamreichold
Copy link
Member

Maybe the following is closer to the right definition:

Any particular reason to fix the target via the associated type? Why not keep this as a generic IntoPyObject<T> to support "multiple overloads"?

@Icxolu
Copy link
Contributor

Icxolu commented Apr 5, 2024

As with e.g. IntoIterator, I think the trait should take self (by value) and we would implement it for for references &'a T where applicable.

That makes sense, I agree with that. (I always forget that you can implement traits on references as well 🤦 )

Any particular reason to fix the target via the associated type? Why not keep this as a generic IntoPyObject<T> to support "multiple overloads"?

I think it could be nice to support this. For example a HashMap could be turned into both a PyDict or a PyList of tuples. In any case I think the return type should be Bound<T> (and not just T, like IntoPy currently has), such that we actually know to convert into a Python type.

@davidhewitt
Copy link
Member Author

davidhewitt commented Apr 6, 2024

As with e.g. IntoIterator, I think the trait should take self (by value) and we would implement it for for references &'a T where applicable.

This is what I was thinking too. I think &T: SomeBound might be more generic than we want, but I imagine &Bound<T>, &[T] and maybe even &Vec<T> are all options to explore.

Any particular reason to fix the target via the associated type? Why not keep this as a generic IntoPyObject<T> to support "multiple overloads"?

I think it could be nice to support this. For example a HashMap could be turned into both a PyDict or a PyList of tuples. In any case I think the return type should be Bound<T> (and not just T, like IntoPy currently has), such that we actually know to convert into a Python type.

This is an interesting idea, and it would probably make migration easier than trying to refactor to solve existing IntoPy<Py<PyTuple>> and IntoPy<Py<PyString>> cases . I'm not entirely sure how I feel about it, worth exploring for sure though.

One case related to "generic overloads" worth mentioning is (). Its current implementation of IntoPy<PyObject> produces None on the Python side, but for IntoPy<Py<PyTuple>> it produces an empty Python tuple. This is done for because -> () function return values map to Python -> None, but my feeling is that it'd be better to try to specialize that case away inside the #[pyfunction] internals and have the trait implementations both produce an empty tuple.

@davidhewitt
Copy link
Member Author

A reason against the generic overloads is type inference. Assuming the function is into_pyobject(), then if we had an associated type .into_pyobject() always is unambiguous. A generic trait (like we currently have for IntoPy) means that anything which implements more than one IntoPyObject<T> will need type hints.

Examples to consider is e.g. Vec<T>. This becomes a Python list at the moment, so I imagined the associated type would be PyList. In the generic case, we might instead want to have IntoPyObject<PyAny> and IntoPyObject<PyList> implemented.

I'm assuming that all types usable for #[pyfunction] return values will need IntoPyObject<PyAny> implemented.

Currently for Vec<T>, it only implements IntoPy<PyObject>. To get to Bound<PyList> you then need to do a downcast (optionally unchecked & unsafe).

@Icxolu
Copy link
Contributor

Icxolu commented Apr 6, 2024

A reason against the generic overloads is type inference.

True, and type hints for generic traits are always a bit awkward, because one can't use a turbo fish on the method (and thus it often interrupts method chains).

One case related to "generic overloads" worth mentioning is (). Its current implementation of IntoPy<PyObject> produces None on the Python side, but for IntoPy<Py<PyTuple>> it produces an empty Python tuple.

Hmm, I can see how this can be surprising. I guess this is a general problem of the generic case. If there is more than one implementation of IntoPyObject<T>, what should IntoPyObject<PyAny> return?

I'm assuming that all types usable for #[pyfunction] return values will need IntoPyObject<PyAny> implemented.

This might not be necessarily needed, because into_any() is available unconditionally for all Bound<T>. I think this would make it possible to convert during macro expansion.

@davidhewitt
Copy link
Member Author

This might not be necessarily needed, because into_any() is available unconditionally for all Bound<T>. I think this would make it possible to convert during macro expansion.

Potentially, yes. My worry is that if a type has two implementations of IntoPyObject<T>, then the macros will also have type inference problems. Assuming that they always use IntoPyObject<PyAny> seems like the way we'd resolve this.

@adamreichold
Copy link
Member

True, and type hints for generic traits are always a bit awkward, because one can't use a turbo fish on the method (and thus it often interrupts method chains).

Couldn't we provide an extension trait to make the turbo fish available, e.g.

trait IntoPyObjectExt {
  fn into_pyobject<T, E>(&self) -> Result<Bound<'py, T>, E> where Self: IntoPyObject<Target=T, Err=E>;
}

impl<T> IntoPyObjectExt for T {
  fn into_pyobject<T, E>(&self) -> Result<Bound<'py, T>, E> where Self: IntoPyObject<Target=T, Err=E> {
    <self as &IntoPyObject>::into_pyobject_impl(self)
  }
}

That one probably does not even compile, but I think some way of shuffling the generic parameters around should be available?

Other than that, I don't think this particularly worse than e.g. FromIterator or From/Into from std. OTOH, I am highly sceptical that we will get by with a single trait if the resulting type is fixed by an associated type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants