This is a head-to-head comparison between pydantic and the pandas extension in hypothesis, where we are generating a single-row dataframe with hypothesis (since pydantic only generates one model instance at a time).
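The test script is not shown here. As a rough illustration of the kind of comparison described, the sketch below times a three-field pydantic model built with `st.builds()` against a one-row `data_frames()` strategy. The `Record` model, its fields, the column names, the bounded value ranges, and the `time_strategy` helper are all assumptions made for illustration, not the original benchmark.

```python
import time

import pydantic
from hypothesis import HealthCheck, given, settings
from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames, range_indexes


class Record(pydantic.BaseModel):
    """Hypothetical three-field model; the original model is not shown."""

    a: int
    b: float
    c: str


# One st.builds() call assembles a model instance from three simple strategies.
model_strategy = st.builds(
    Record,
    a=st.integers(min_value=-1000, max_value=1000),
    b=st.floats(allow_nan=False, allow_infinity=False),
    c=st.text(),
)

# A dataframe constrained to exactly one row, with comparable columns.
frame_strategy = data_frames(
    columns=[
        column("a", dtype=int),
        column("b", dtype=float),
        column("c", elements=st.text()),
    ],
    index=range_indexes(min_size=1, max_size=1),
)


def time_strategy(strategy, n_examples=50):
    """Run a Hypothesis test over `strategy`; return (seconds, drawn values)."""
    drawn = []

    @settings(
        max_examples=n_examples,
        deadline=None,  # avoid per-example deadline failures while timing
        database=None,  # do not read or write the example database
        suppress_health_check=[HealthCheck.too_slow],
    )
    @given(strategy)
    def run(value):
        drawn.append(value)

    start = time.perf_counter()
    run()
    return time.perf_counter() - start, drawn


if __name__ == "__main__":
    model_seconds, _ = time_strategy(model_strategy)
    frame_seconds, _ = time_strategy(frame_strategy)
    print(f"pydantic model: {model_seconds:.3f}s")
    print(f"one-row frame:  {frame_seconds:.3f}s")
```

The absolute numbers depend on the machine and on the installed versions, so the ratio between the two timings is the more meaningful figure.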
For background, this was originally raised in an issue in the pandera project. pandera (basically) has a thin wrapper around the data_frames() strategy from hypothesis.extra.pandas.
My questions:
Do maintainers have a theory on what the cause of this speed discrepancy might be?
Is there appetite for optimizing this logic to make it faster? I'm happy to help where I can btw.
Thank you!
There's just way way more logic involved in generating a dataframe, due to the possibility of having many rows, interacting constraints, etc. By contrast the pydantic model case is three built-in types, one function call, and no dtype conversions or validation or anything.
Yes, I'd be absolutely delighted to accept PRs for performance improvements - only caveat is if it makes future maintenance substantially harder.
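For anyone looking for a starting point, one way to see where the time goes is to profile a run of a one-row dataframe strategy. This is only a sketch: the column names, the `draw_frames` test, and the `profile_report` helper are hypothetical, and it uses the standard library's `cProfile` and `pstats` rather than any tooling specific to hypothesis.

```python
import cProfile
import io
import pstats

from hypothesis import HealthCheck, given, settings
from hypothesis.extra.pandas import column, data_frames, range_indexes

# Hypothetical one-row dataframe strategy to profile.
frames = data_frames(
    columns=[column("a", dtype=int), column("b", dtype=float)],
    index=range_indexes(min_size=1, max_size=1),
)


@settings(
    max_examples=25,
    deadline=None,
    database=None,
    suppress_health_check=[HealthCheck.too_slow],
)
@given(frames)
def draw_frames(df):
    # The body is trivial on purpose: generation cost is what gets measured.
    assert len(df) == 1


def profile_report(func, top=15):
    """Profile one call of `func` and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(draw_frames))
```

Sorting by cumulative time tends to surface the high-level calls first, which makes it easier to see whether the cost sits in index generation, column generation, or dataframe construction.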
I ran it with:
pytest test_speed.py --durations=0 -v
Takeaway: hypothesis is 9-10x slower than pydantic in the reported test durations.