Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmark highlighting the costs of failed calls to FromPyObject::extract. #2279

Merged
merged 2 commits into from Apr 10, 2022

Conversation

adamreichold
Copy link
Member

This was brought up in #2278 and probably affects quite a bit of code that does a case-by-case analysis of argument types:

enum_from_pyobject      time:   [644.87 ns 648.64 ns 653.16 ns]                                
list_via_cast_as        time:   [3.4474 ns 3.4668 ns 3.4808 ns]                              
list_via_extract        time:   [4.3715 ns 4.3785 ns 4.3896 ns]                              
not_a_list_via_cast_as  time:   [3.4952 ns 3.5034 ns 3.5094 ns]                                    
not_a_list_via_extract  time:   [284.42 ns 284.79 ns 285.12 ns]                                   
not_a_list_via_extract_enum                                                                            
                        time:   [289.56 ns 289.78 ns 289.98 ns]

The overhead itself mostly comes from this

/// Convert `PyDowncastError` to Python `TypeError`.
impl<'a> std::convert::From<PyDowncastError<'a>> for PyErr {
    fn from(err: PyDowncastError<'_>) -> PyErr {
        exceptions::PyTypeError::new_err(err.to_string())
    }
}

conversion.

I find it especially troubling that the rather nice way of using an enum is also affected.

@adamreichold
Copy link
Member Author

I pushed a commit adding PyDowncastErrorArguments to at least delay the actual call into the formatting machinery but I am not happy about duplicating the format string and it only improves thing to one order of magnitude slower instead of two:

enum_from_pyobject      time:   [669.27 ns 672.85 ns 676.48 ns]    
list_via_cast_as        time:   [3.4959 ns 3.4966 ns 3.4975 ns]                              
list_via_extract        time:   [3.0602 ns 3.0628 ns 3.0659 ns]                              
not_a_list_via_cast_as  time:   [3.2751 ns 3.4409 ns 3.5769 ns]                                    
not_a_list_via_extract  time:   [45.943 ns 46.069 ns 46.255 ns]                                    
not_a_list_via_extract_enum                                                                             
                        time:   [44.261 ns 44.301 ns 44.344 ns]

So IMHO, this still seems to be a performance foot gun, especially in derived code.

@adamreichold adamreichold marked this pull request as ready for review April 6, 2022 12:27
@mejrs
Copy link
Member

mejrs commented Apr 6, 2022

How about changing the FromPyObject trait itself?

pub trait FromPyObject<'source>: Sized {
    type Error;
    /// Extracts `Self` from the source `PyObject`.
    fn extract(ob: &'source PyAny) -> Result<Self, Self::Error>;
}

Where the Error type would be PyDowncastError for a Python type and PyErr otherwise.

(note: if we start changing conversion traits we should probably consider all of them)

@adamreichold
Copy link
Member Author

How about changing the FromPyObject trait itself?

I think we should consider this, but this could be in a maintenance release helping existing deployments without obstructing such a change to FromPyObject, couldn't it?

Regarding adding an associated Error type to FromPyObject, I think we would need to consider the problem of composition, i.e.

#[derive(FromPyObject)]
struct Foo {
  bar: Bar,
  qux: Qux,
}

must derive an Error type that can be construct from either <Bar as FromPyObject>::Error and <Qux as FromPyObject>::Error. Do we create ad-hoc error types for this? Or do we require a trait bound and box them up?

@davidhewitt
Copy link
Member

davidhewitt commented Apr 7, 2022

How about changing the FromPyObject trait itself?

There's a very old thread at #426 suggesting a rework. That thread is probably very out of date; it might be worth killing and starting a new discussion.

For what it's worth, the idea in #1308 would also change FromPyObject, by adding a separate lifetime (one becomes the lifetime of the data, the second the lifetime of the GIL).

Where the Error type would be PyDowncastError for a Python type and PyErr otherwise.

This kind of proposal gets a 👍 from me. Having error handling be as efficient as possible is a worthwhile goal.

@adamreichold
Copy link
Member Author

adamreichold commented Apr 7, 2022

Just wanted to say that my intention for this PR was limited to provide the benchmark and a performance bugfix for 0.16.x. I don't consider closing the mentioned issue of missing "performance docs" and I think that if we want to revamp the conversion traits, we should probably reboot #426 or #1308.

Having error handling be as efficient as possible is a worthwhile goal.

I think it is of specific importance for the conversion traits as making Python functions "polymorphic" by downcast cascades which by necessity produce a lot of failed conversions is a common pattern.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep makes sense. I think would be helpful to put out a 0.16.4 for the coverage fix as well as #2289, so let's merge this as-is and then consider further refactoring for 0.17 later.

Please add a Changed CHANGELOG entry documenting the performance tweak and then feel free to merge 👍

@adamreichold
Copy link
Member Author

Please add a Changed CHANGELOG entry documenting the performance tweak and then feel free to merge +1

Done. Will give it some time if anyone fancies a look and merge in the afternoon.

@adamreichold adamreichold merged commit 551db72 into main Apr 10, 2022
@adamreichold adamreichold deleted the extract-error-is-slow branch April 10, 2022 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants