Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance docstring for LinearRegression.fit #28741

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

miguelcsilva
Copy link
Contributor

Reference Issues/PRs

Fixes #28732

What does this implement/fix? Explain your changes.

Changes the docstring to include all types that can be used for the sample_weight parameter, as well as explaining the user what happens in each type.

Copy link

github-actions bot commented Apr 1, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 54c32de. Link to the linter CI: here

@@ -560,8 +560,15 @@ def fit(self, X, y, sample_weight=None):
y : array-like of shape (n_samples,) or (n_samples, n_targets)
Target values. Will be cast to X's dtype if necessary.

sample_weight : array-like of shape (n_samples,), default=None
Individual weights for each sample.
sample_weight : array-like of shape (n_samples,), int, float or None (default)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more consistent with how other docstrings are formatted

Suggested change
sample_weight : array-like of shape (n_samples,), int, float or None (default)
sample_weight : array-like of shape (n_samples,), int or float, default=None

I could only find None (default) in the actual text explaining a parameter, not in the "headline"

Comment on lines 569 to 571
- `int` or `float`: all samples have a weight equal to the value \
provided. Since there is no difference in the relative weight between samples, \
results in the same fitted model as when `sample_weight=None`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go for something concise like

Suggested change
- `int` or `float`: all samples have a weight equal to the value \
provided. Since there is no difference in the relative weight between samples, \
results in the same fitted model as when `sample_weight=None`.
- `int` or `float`: all samples have a weight equal to the value provided.

I wonder if we should explain to the user what exact effect (no effect) this has, we don't go into that for array-like either. So maybe we can skip it.

@betatim
Copy link
Member

betatim commented Apr 2, 2024

Thanks for the PR fixing this! I left two small comments about formatting and level of detail

@lorentzenchr
Copy link
Member

While this PR add technically correct statements, I'm not so sure to include it. Also, if we start here, do we go through all estimators to adapt the description?

@jeremiedbb
Copy link
Member

I would not add the detailed paragraph either. But I'd still fix the valid types. And I think we should in all places where it applies (floats are not always supported), in other PRs though.

@@ -560,8 +560,15 @@ def fit(self, X, y, sample_weight=None):
y : array-like of shape (n_samples,) or (n_samples, n_targets)
Target values. Will be cast to X's dtype if necessary.

sample_weight : array-like of shape (n_samples,), default=None
Individual weights for each sample.
sample_weight : array-like of shape (n_samples,), int, float or None (default)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to have float and int. float covers both. In the few places where we already mention float as valid type, it comes first. Let's keep the same formating for consistency.

Suggested change
sample_weight : array-like of shape (n_samples,), int, float or None (default)
sample_weight : float or array-like of shape (n_samples,), default=None

@betatim
Copy link
Member

betatim commented Apr 3, 2024

The discussion on whether or not to do this is happening in #28732

@lorentzenchr lorentzenchr added the Needs Decision Requires decision label Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docs say parameter sample_weight of LinearRegression.fit must be array but number is also valid
4 participants