New (experimental) type stubs for scikit-learn #25307

gramster · 2023-01-05T17:16:43Z


Hi all,

I built a tool last year to generate type stubs for scientific Python packages that use numpydoc docstrings. It combines my processing of the docstrings with monkeytype traces where good samples are available. I originally did this for matplotlib, but have just completed first versions for scikit-learn, scikit-image, networkx, and vispy, and am working on scipy. YMMV with these stubs: I try to parse the docstrings into something sensible and use a mapping file for those I can't make unambiguous sense of, but even the former is based on my interpretation of the docstring text, which may be wrong (and probably is, plenty of times). Nonetheless, it is not too difficult for me to make adjustments and regenerate the stubs. Andreas suggested I open an issue here so interested people can try them out and give feedback. You can open issues at the repo.

I did this with the medium-term aim of having good type annotations that pyright and pylance can leverage (I am the eng manager for Python runtime at Microsoft, and the pylance team falls under me). Good type annotations are not only good for finding errors but can also give a much better editing experience in Visual Studio/Visual Studio Code (and any other editor with language services that can leverage them, which is most editors these days once suitably configured). Our long-term aim is to have package authors use these (if they wish) to add type stub packages or inline types that are maintained by them, rather than by us (or really me, as my team is not that big and everyone else is busy with other things :-)). We did this successfully with the pandas team, thanks to a good partnership with Irv Lustig. I would love this to happen with other packages.

The stubs can be found here, which is also where issues can be filed: https://github.com/microsoft/python-type-stubs/tree/main/sklearn (stubs for the other packages I mentioned can be found there too). The tool I use is at https://github.com/gramster/docs2stubs.

You'll see a mix of old-style and newer-style type syntax: my tool favors the newer style, but the annotations added via monkeytype use the old style (e.g. Union, Tuple, ... vs. | or tuple).

An even better way to help improve these in the short term is by PRs against the map files in https://github.com/gramster/docs2stubs/tree/main/analysis. These are '#'-separated CSV files for parameter and return types that have docstrings and the types I map those docstrings to. Identifying issues there could fix large classes of bugs in these type stubs.
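As a rough sketch of how such a '#'-separated file could be read with the standard csv module: the two-column layout (docstring text, then mapped type) and the sample rows below are assumptions for illustration only, so check the actual files in the analysis folder for the real columns.

```python
import csv
import io

# Hypothetical sample rows; the real map files may have more or different columns.
SAMPLE = """\
array-like of shape (n_samples, n_features)#ArrayLike
int, float or None#int | float | None
{'auto', 'full'}, default='auto'#Literal['auto', 'full']
"""


def load_type_map(text):
    """Map docstring type text to a type annotation string.

    '#' works as a delimiter where ',' would not, because docstring
    type descriptions are full of commas.
    """
    reader = csv.reader(io.StringIO(text), delimiter="#", quoting=csv.QUOTE_NONE)
    # Skip blank or malformed rows rather than failing on them.
    return {row[0].strip(): row[1].strip() for row in reader if len(row) == 2}


type_map = load_type_map(SAMPLE)
```

With a lookup like this, a wrong mapping is a one-line fix in the map file that then applies to every parameter sharing that docstring text.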

thomasjpfan · 2023-01-05T23:08:14Z

Maintainer

Thank you for opening this discussion! For me, I like the model from https://github.com/pandas-dev/pandas-stubs, where the type stubs are maintained in another repo. In scikit-learn, we are in the process of adding "parameter constraints" such as:

scikit-learn/sklearn/decomposition/_pca.py

Lines 363 to 371 in ba1d23d

    
    _parameter_constraints: dict = {
        "n_components": [
            Interval(Integral, 0, None, closed="left"),
            Interval(Real, 0, 1, closed="neither"),
            StrOptions({"mle"}),
            None,
        ],
        "copy": ["boolean"],
        "whiten": ["boolean"],

which are machine-readable constraints on parameters. Currently, we use them for input validation, but I can see them being used to generate type stubs. The parameter constraints even have ranges, so we can make use of Annotated from PEP 593 to attach additional metadata to the type.
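To make the Annotated idea concrete, here is a minimal sketch of what a type for n_components could look like. The Range class, the NComponents alias, and the fit_pca function are hypothetical stand-ins invented for this example; they are not scikit-learn API, and a real generator might encode the bounds differently.

```python
from dataclasses import dataclass
from typing import Annotated, Literal, Optional, Union, get_args, get_type_hints


@dataclass(frozen=True)
class Range:
    """Hypothetical metadata carrying the bounds of an Interval constraint."""

    low: Optional[float]
    high: Optional[float]
    closed: str


# One Union member per entry in the "n_components" constraint list.
NComponents = Union[
    Annotated[int, Range(0, None, closed="left")],
    Annotated[float, Range(0, 1, closed="neither")],
    Literal["mle"],
    None,
]


def fit_pca(n_components: NComponents = None, copy: bool = True, whiten: bool = False) -> None:
    """Placeholder with the annotated signature; does nothing."""


def ranges_of(hint):
    """Collect Range metadata from each Annotated member of a Union."""
    return [
        meta
        for member in get_args(hint)
        for meta in getattr(member, "__metadata__", ())
        if isinstance(meta, Range)
    ]


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(fit_pca, include_extras=True)
n_components_ranges = ranges_of(hints["n_components"])
```

Type checkers ignore the Range metadata and only see int | float | Literal["mle"] | None, while other tools (validators, doc generators) can still read the bounds back out.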

TLDR: In the near future, it should be easier for us to maintain a scikit-learn-stubs package with parameter constraints.

6 replies

thomasjpfan Jan 7, 2023
Maintainer

The parameter constraints are an internal validation framework developed for scikit-learn. I do not think other packages would be using it.

From what I've seen, I do not think other scientific Python packages have something similar, but I could be missing one.

glemaitre Jan 12, 2023
Maintainer

I backported the annotations to imbalanced-learn to see how difficult it would be for third parties to use them. Actually, it went pretty well, so we could think about making it public in a couple of versions, once we have detected potential regressions.

Regarding the stubs, I think that the parameter constraints would be much more reliable than the NumPy-style documentation. Indeed, when introducing the constraints, we found out that the documentation can be wrong.

Up to now, we have disregarded introducing type annotations. In the end, I still dream of a tool that would translate the parameter constraints into partial numpydoc docstrings, and why not Python stubs?

ogrisel Jan 12, 2023
Maintainer

Or even if we do not go for fully automated translation we could have an automated tool to check the consistency between docstrings and various kinds of annotations.
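A rough sketch of what such a consistency check could look like, comparing a numpydoc "Parameters" section against the function signature. Everything here (the helper names, the regex-based parsing, and the crude "annotation name appears in the docstring type" heuristic) is an assumption for illustration; a real tool would more likely use the numpydoc parser and a proper type comparison.

```python
import inspect
import re


def documented_params(func):
    """Return {name: type text} parsed from a numpydoc 'Parameters' section."""
    doc = inspect.getdoc(func) or ""
    # Capture everything between the Parameters header and the next section header.
    section = re.search(
        r"^Parameters\n-+\n(.*?)(?=^\w[^\n]*\n-+$|\Z)", doc, flags=re.M | re.S
    )
    if section is None:
        return {}
    params = {}
    for line in section.group(1).splitlines():
        # Parameter lines are unindented "name : type"; descriptions are indented.
        entry = re.match(r"^(\w+) : (.+)$", line)
        if entry:
            params[entry.group(1)] = entry.group(2).strip()
    return params


def check_consistency(func):
    """List mismatches between the signature and the docstring of func."""
    signature = inspect.signature(func)
    documented = documented_params(func)
    problems = []
    for name, param in signature.parameters.items():
        if name not in documented:
            problems.append(f"{name}: missing from docstring")
        elif param.annotation is not inspect.Parameter.empty:
            annotation = getattr(param.annotation, "__name__", str(param.annotation))
            # Crude heuristic: the annotation's name should appear in the doc type.
            if annotation not in documented[name]:
                problems.append(
                    f"{name}: annotation {annotation!r} not found in "
                    f"docstring type {documented[name]!r}"
                )
    for name in sorted(documented.keys() - signature.parameters.keys()):
        problems.append(f"{name}: documented but not in signature")
    return problems


def scale(x: float, factor: float = 2):
    """Scale a value.

    Parameters
    ----------
    x : float
        Value to scale.
    factor : int, default=2
        Multiplier.
    offset : float
        Documented, but not an actual parameter.
    """
    return x * factor


problems = check_consistency(scale)
```

For the scale example this flags factor (annotated float, documented as int) and offset (documented but absent from the signature), which is the kind of drift a CI check could catch.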

thomas-haslwanter Feb 27, 2023

How can one use the stubs in https://github.com/microsoft/python-type-stubs/tree/main/sklearn?
With pandas, I could install them with pip install pandas-stubs, but here I don't know how to proceed.

chrisstuartparry Jul 10, 2023

Also interested in this!
