Parameter Validation Documentation? #28903

Closed
AcylSilane opened this issue Apr 26, 2024 · 2 comments

@AcylSilane
Contributor

AcylSilane commented Apr 26, 2024

While implementing a custom estimator, I noticed that the BaseEstimator class brings in a _validate_params method. Judging by this repo's history, it was introduced in 2022 as part of PR #22722:

    def _validate_params(self):
        """Validate types and values of constructor parameters

        The expected type and values must be defined in the `_parameter_constraints`
        class attribute, which is a dictionary `param_name: list of constraints`. See
        the docstring of `validate_parameter_constraints` for a description of the
        accepted constraints.
        """
        validate_parameter_constraints(
            self._parameter_constraints,
            self.get_params(deep=False),
            caller_name=self.__class__.__name__,
        )

Beyond the PR itself and a docstring in utils._param_validation.py, there does not seem to be much information about this method. Searching the web documentation for "_validate_params" returns no results. The Developing Scikit-Learn Estimators documentation also does not mention this tooling, so the only way to learn how to use it is to poke through the source code.

Looking further, it seems that since that PR most estimators in the package have adopted the _fit_context decorator defined in base.py, which in turn calls the _validate_params method. That decorator also never appears in the web documentation.

For those of us who develop custom estimators that extend Sklearn's base classes, it is useful to re-use the tooling that already exists (especially when that tooling comes from Sklearn itself). I am curious about a few things I had trouble finding answers to:

  • Is there documentation on the canonical way to handle parameter validation?
  • This version of parameter validation relies heavily on definitions in utils._param_validation; because this is prefixed with an underscore, is it "safe" to import this as a dependency in downstream packages?
  • Is there any intended / planned relationship between utils._param_validation and utils.validation?

AcylSilane added the Documentation and Needs Triage labels Apr 26, 2024
@jeremiedbb
Member

Hi @AcylSilane,
The reason for the lack of public documentation is that all of this is considered private for now (see the leading underscore in _validate_params and _fit_context). I agree that it could be useful for developers of custom estimators, and we plan to document it better at some point. It's in fact part of a bigger project, which is having a well-defined developer API, but we haven't made much progress on this (although it might get a boost in the near future).

Is there documentation on the canonical way to handle parameter validation?

No, there is no public documentation yet.

This version of parameter validation relies heavily on definitions in utils._param_validation; because this is prefixed with an underscore, is it "safe" to import this as a dependency in downstream packages?

It is a lot more stable these days, but I can't say "safe". We still consider it private and won't guarantee backward compatibility between versions.
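Given that caveat, one possible way for a downstream package to hedge is to guard the private import and fall back to a hand-written check. This is only a sketch under stated assumptions: check_alpha, HAVE_PRIVATE_VALIDATION, and the alpha parameter are hypothetical names, and the fallback reproduces just the one constraint shown here.

```python
from numbers import Real

try:
    # Private API: may move or change without a deprecation cycle.
    from sklearn.utils._param_validation import (
        Interval,
        validate_parameter_constraints,
    )

    HAVE_PRIVATE_VALIDATION = True
except ImportError:
    HAVE_PRIVATE_VALIDATION = False


def check_alpha(alpha, caller_name="MyEstimator"):
    """Validate a hypothetical non-negative `alpha` hyperparameter."""
    if HAVE_PRIVATE_VALIDATION:
        # Reuse scikit-learn's machinery when the private module imports.
        validate_parameter_constraints(
            {"alpha": [Interval(Real, 0, None, closed="left")]},
            {"alpha": alpha},
            caller_name=caller_name,
        )
    elif not isinstance(alpha, Real) or alpha < 0:
        # Fallback: a plain check with a comparable error type.
        raise ValueError(
            f"The 'alpha' parameter of {caller_name} must be a real number "
            f">= 0. Got {alpha!r} instead."
        )
    return alpha
```

Both branches raise a ValueError (or a subclass of it) on invalid input, so callers can catch the same exception type either way.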

Is there any intended / planned relationship between utils._param_validation and utils.validation?

Not really, they serve different purposes. utils._param_validation is about validating the hyperparameters of estimators and the parameters of functions, while utils.validation is more about validating input data (X, y, sample_weight, ...).
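To make the data-validation side of that distinction concrete, here is a small sketch that uses only the public sklearn.utils.validation helpers. The wrapper name validate_training_data is hypothetical; check_X_y is called with its default settings.

```python
from sklearn.utils.validation import check_X_y


def validate_training_data(X, y):
    """Check input data (not hyperparameters) before fitting.

    With default settings, check_X_y converts X to a 2D array and y to a
    1D array, and raises ValueError for NaN or infinite values or for
    X and y of inconsistent length.
    """
    X, y = check_X_y(X, y)
    return X, y


X, y = validate_training_data([[1.0, 2.0], [3.0, 4.0]], [0, 1])
print(X.shape, y.shape)
```

Nothing here looks at constructor parameters; that part is left to the _parameter_constraints mechanism discussed earlier in the thread.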

jeremiedbb removed the Needs Triage label Apr 26, 2024
@AcylSilane
Contributor Author

Hi @jeremiedbb

Thank you for the quick response, and for answering my questions! Looking forward to the developer API you mentioned.
