New (experimental) type stubs for scikit-learn #25307
-
Thank you for opening this discussion! I like the model from https://github.com/pandas-dev/pandas-stubs, where the type stubs are maintained in a separate repo. In scikit-learn, we are in the process of adding "parameter constraints" such as: scikit-learn/sklearn/decomposition/_pca.py Lines 363 to 371 in ba1d23d which are machine-readable constraints on parameters. Currently, we use these for input validation, but I can see them being used to generate type stubs. The parameter constraints even have ranges, so we can make use of TLDR: In the near future, it should be easier for us to maintain a |
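To make the idea of deriving type stubs from parameter constraints a bit more concrete, here is a rough sketch. The `Interval` and `StrOptions` classes below are simplified, hypothetical stand-ins written for this example, not scikit-learn's actual classes, and the mapping rules are an assumption about how such a generator might work. Note that the sketch drops the range bounds, since plain type annotations have no standard way to express them.

```python
from __future__ import annotations

from dataclasses import dataclass
from numbers import Integral, Real


# Hypothetical, simplified stand-ins for constraint objects (assumption:
# these only mimic the general shape of machine-readable constraints).
@dataclass(frozen=True)
class Interval:
    kind: type                 # Integral or Real
    left: float | None         # lower bound, None means unbounded
    right: float | None        # upper bound, None means unbounded


@dataclass(frozen=True)
class StrOptions:
    options: frozenset[str]


def constraint_to_annotation(constraint) -> str:
    """Map one constraint to a type annotation string (bounds are ignored)."""
    if constraint is None:
        return "None"
    if constraint == "boolean":
        return "bool"
    if isinstance(constraint, Interval):
        return "int" if constraint.kind is Integral else "float"
    if isinstance(constraint, StrOptions):
        opts = ", ".join(repr(o) for o in sorted(constraint.options))
        return f"Literal[{opts}]"
    return "Any"  # fall back when the constraint is not understood


def constraints_to_annotation(constraints) -> str:
    """Join the alternatives for one parameter into a union, without duplicates."""
    parts: list[str] = []
    for c in constraints:
        ann = constraint_to_annotation(c)
        if ann not in parts:
            parts.append(ann)
    return " | ".join(parts)


# Made-up constraint list for an illustrative parameter.
example = [
    Interval(Integral, 0, None),
    Interval(Real, 0, 1),
    StrOptions(frozenset({"mle"})),
    None,
]
print(constraints_to_annotation(example))
```

A real generator would also need to handle array-like inputs, callables, and estimator instances, which this sketch simply maps to `Any`.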
-
Hi all
I built a tool last year to generate type stubs for scientific Python packages that use numpydoc docstrings. I combine my processing of the docstrings with monkeytype traces if there are good samples. I originally did this for matplotlib, but have just completed first versions for scikit-learn, scikit-image, networkx, and vispy; I am working on scipy. YMMV with these stubs; I try to parse the docstrings to something sensible and use a mapping file for those I can't make unambiguous sense of, but even the former is based on my interpretation of the docstring text, which may be wrong (and probably is plenty of times). Nonetheless, it is not too difficult for me to make adjustments and regenerate these. Andreas suggested I open an issue here so interested people can try them out and give feedback. You can open issues at the repo.
I did this with the medium-term aim of having good type annotations that pyright and pylance can leverage (I am the eng manager for Python runtime at Microsoft, and the pylance team falls under me). Good type annotations are not only good for finding errors but can also give a much better editing experience in Visual Studio/Visual Studio Code (and any other editor that has language services that can leverage them, which is most editors these days once suitably configured). Our long-term aim is to have package authors use these (if they wish) to add type stub packages or inline types that are maintained by them, rather than us (or really me, as my team is not that big and everyone else is busy with other things :-)). We did this successfully with the pandas team, thanks to a good partnership with Irv Lustig. Would love this to happen with other packages.
The stubs can be found here, which is also where issues can be filed: https://github.com/microsoft/python-type-stubs/tree/main/sklearn (stubs for the other packages I mentioned can be found there too). The tool I use is at https://github.com/gramster/docs2stubs.
(You'll see a mix of old-style and newer-style type syntax; my tool favors the newer style, but the additional ones added via monkeytype use the old style, e.g. Union, Tuple, ... vs | or tuple.)
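For anyone unfamiliar with the two spellings, here is a small illustration. The function names are made up for this example and do not come from the stubs; both signatures describe the same types, just written in the older `typing` style versus the newer built-in style.

```python
from __future__ import annotations  # lets the newer syntax parse on older Pythons

from typing import Optional, Tuple, Union


# Old style: Union, Optional and Tuple imported from typing.
def scale_old(
    value: Union[int, float],
    shape: Optional[Tuple[int, int]] = None,
) -> Tuple[float, float]:
    rows, cols = shape if shape is not None else (1, 1)
    return (float(value * rows), float(value * cols))


# Newer style: the | operator and the built-in tuple generic.
def scale_new(
    value: int | float,
    shape: tuple[int, int] | None = None,
) -> tuple[float, float]:
    rows, cols = shape if shape is not None else (1, 1)
    return (float(value * rows), float(value * cols))
```

Type checkers such as pyright treat the two forms as equivalent, so the mix is a cosmetic inconsistency rather than a correctness problem.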
An even better way to help improve these in the short term is by PRs against the map files in https://github.com/gramster/docs2stubs/tree/main/analysis. These are '#'-separated CSV files for parameter and return types that have docstrings and the types I map those docstrings to. Identifying issues there could fix large classes of bugs in these type stubs.
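As a rough sketch of how one might inspect such a map file before proposing a change, the snippet below reads '#'-separated rows with Python's `csv` module. The two-column layout (docstring type text, then the mapped annotation) and the sample rows are assumptions made up for illustration; check the actual files in the repo for their real columns.

```python
from __future__ import annotations

import csv
import io

# Made-up sample rows; the real map files may have a different column layout.
SAMPLE = (
    "array-like of shape (n_samples, n_features)#ArrayLike\n"
    "int or None#int | None\n"
)


def load_type_map(stream) -> dict[str, str]:
    """Read '#'-separated rows into {docstring type text: annotation}."""
    mapping: dict[str, str] = {}
    for row in csv.reader(stream, delimiter="#"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        doc_type, annotation = row[0].strip(), row[1].strip()
        mapping[doc_type] = annotation
    return mapping


type_map = load_type_map(io.StringIO(SAMPLE))
for doc_type, annotation in type_map.items():
    print(f"{doc_type!r} -> {annotation}")
```

Using '#' as the delimiter means commas inside the docstring text need no quoting, which is presumably why the files are separated that way.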