New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: added Raises
section to a few functions in _stats_py.py
#20496
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Schefflera-Arboricola for the PR.
We are not very consistent with this and don't really have a policy. Would you mind posting a message on our forum to get others opinion? https://discuss.scientific-python.org/c/contributor/scipy that could help 😃 I am personally +1 as it's then more complete.
Nice additions! I find the ones that specify the minimum number of elements required particularly useful because they are not obvious. The ones about argument domains are also nice to add. Some of the additions about types or broadcast-ability may not be essential, because the dynamic typing nature of Python ensures you can never exhaust the valid inputs or possible exceptions. Therefore you can only do so much to remind a conscious programmer and wish the unconscious good luck! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Schefflera-Arboricola. In general I think adding information under Raises
that is specific to a particular algorithm can be helpful. What should be avoided is adding information that is generally expected from Python libraries (e.g., TypeError
if one passes in the wrong type) or from standard design patterns (e.g., axis
should be integer).
There is a bit of judgement needed when it's not clear cut. I'd suggest to only document things when there is a clear need. And not adding lots of docs for every last bit of argument validation in every function, because that is more distraction and churn than helpful.
scipy/stats/_stats_py.py
Outdated
Raises | ||
------ | ||
TypeError | ||
If `a` is not numeric. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is not desirable. It's too generic; a
is very clearly documented as a numerical array already.
scipy/stats/_stats_py.py
Outdated
Raises | ||
------ | ||
TypeError | ||
If `center` is not callable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not desirable. If it's an argument document as callable
, and you pass in the wrong type then TypeError
is the standard expectation in Python code.
scipy/stats/_stats_py.py
Outdated
Raises | ||
------ | ||
ValueError | ||
If `axis` is not an integer, `x` and `y` do not have the same length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not desirable to add this. axis
keywords are everywhere.
Raises | ||
------ | ||
ValueError | ||
If shape of `a` along the given `axis` is less than 8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of where it is helpful information I think, because this is specific to skewtest
and the error type isn't completely obvious based on the "less than 8" info in Notes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we document this, can we remove the error in the future? I can see the statement instead becoming "... is less than ___, a Monte Carlo version of the test is performed." when s method
option is added and the default becomes 'auto'"
. Also, when there is nan_policy='omit'
, it seems undesirable to raise an error when just one slice has fewer than the required number of elements. Many other stats functions just document these requirements and return NaN for the affected element when there is a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a problem with removing the error in the future.
And agreed to 'omit'
: returning nan
seems reasonable. Going from error to non-error (nan
or whatever) should always be fine, the reverse is typically not fine.
I was going through the docs of the
pearsonr
function which led me to add theRaises
sections to some of the functions in the file. Let me know if this LGT you.Thank you :)