Expand type hints across projects #98

Open
jaraco opened this issue Nov 15, 2023 · 11 comments

Comments

@jaraco
Owner

jaraco commented Nov 15, 2023

I received this inquiry in discord:

i want to put some effort into an initiative of filling jaraco.* and related projects with full-fledged type hints.
would you mind helping me by providing a short list of the most important projects that i should focus on in the first place?
thanks in advance!
my intention is to make those projects play nicely with mypy, because i find them very useful, but i also value fully typed code.

@jaraco
Owner Author

jaraco commented Nov 15, 2023

I'm happy to accept contributions! I tried to find a popularity ranking of all Python packages, but I didn't succeed. I did find that Snyk Advisor provides a quick and easy way to find packages by first letter (https://snyk.io/advisor/packages/python/j). You could go through the list of packages there and look at the popularity of each. It also sounds as if the PyPI download tables are available to be queried (https://pypistats.org/api/#etiquette), so you could probably construct a query to do that. You might also search GitHub for projects with the skeleton badge; that's a pretty good indicator that a project is following my best practices.

Tips

  • These projects do already run mypy checks, so are nominally typed (at least as far as mypy can infer types). Running tox will cause the type checks to be run via pytest-mypy. Running tox -- -k mypy will run mainly just the mypy tests.
  • Avoid changing the implementation (in theory, hints should reflect the current, desired behavior and not require restructuring of the implementation); change the implementation only if mypy is revealing a legitimate flaw (and even then, it's better to contribute the fix in a separate commit or PR).
  • I don't subscribe to email notifications on my github projects (because I'm too busy to get to them in a timely fashion), so you'll need to be patient. If you need my attention for something urgent (you're blocked or you've waited a while), it's okay to mention me to get my attention.
  • I'm not a fan of type hinting noise (e.g. -> None for two-line private methods with no return). Try to focus on adding type hints that add value - that communicate something additional and non-obvious. I won't reject a PR based on this condition, but it'll be more easily accepted if it's clearly adding value and not lint.

@bswck
Contributor

bswck commented Nov 15, 2023

I am used to type checking my code with mypy in strict mode. While some may see strict mode as too aggressive, I will have no problem adjusting a bigger codebase to pass strict type checking. This is what I had in mind when writing the message.

> I'm not a fan of type hinting noise (e.g. -> None for two-line private methods with no return). Try to focus on adding type hints that add value - that communicate something additional and non-obvious. I won't reject a PR based on this condition, but it'll be more easily accepted if it's clearly adding value and not lint.

mypy disallows untyped definitions in strict mode. A private function without a return statement is still assumed to return Any, which is untrue, since None is the desired return type:

class Foo:
    def _bar(self):  # seen by mypy as untyped
        pass

reveal_type(Foo()._bar())  # Revealed type is "Any"

class Spam:
    def _eggs(self, biz: int):  # seen as typed, return assumed to be Any
        pass

reveal_type(Spam()._eggs(5))  # Revealed type is "Any"

Note

The only exceptions to the behavior above are the __init__ and __init_subclass__ methods, where -> None can be omitted provided the function has at least one type hint in its signature.
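A small sketch of the exception described in the note, as I understand mypy's behavior. The class names `Point` and `Tracker` are hypothetical and only serve the illustration.

```python
class Point:
    # One annotated argument is enough: mypy treats this __init__ as
    # typed and infers the omitted return type as None.
    def __init__(self, x: int):
        self.x = x


class Tracker:
    # No annotations at all: mypy still considers this __init__ untyped,
    # so strict mode asks for an explicit "-> None" here.
    def __init__(self):
        self.count = 0


# At runtime the only visible difference is the recorded annotations.
print(Point.__init__.__annotations__)    # {'x': <class 'int'>}
print(Tracker.__init__.__annotations__)  # {}
```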

Relevant resources:

The absence of a return statement may be an obvious sign that a method's return type does not matter, which is a reasonable argument for leaving it unannotated. But even an explicit return None statement, which unambiguously implies a None return type, is still seen as Any by mypy:

class Example:
    def _com(self, port: int):
        return None

reveal_type(Example()._com(43))  # Revealed type is "Any"

In places where the value of such a method is used, the type hinting noise becomes necessary: it tells mypy the actual return type, which affects other parts of the codebase.
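To make the consequence concrete, here is a hedged sketch of how the implicit Any can hide a bug in a caller. The helper `describe` and the method `_com_typed` are hypothetical additions to the example above.

```python
class Example:
    def _com(self, port: int):  # return type is implicitly Any
        return None

    def _com_typed(self, port: int) -> None:  # explicit return type
        return None


def describe(example: Example) -> str:
    result = example._com(43)
    # mypy accepts this line because `result` is Any, yet it fails at
    # runtime: None has no attribute "upper".
    return result.upper()


try:
    describe(Example())
except AttributeError as error:
    print(f"runtime failure that the type check did not catch: {error}")
```

If `describe` called `_com_typed` instead, mypy would have enough information to report the misuse before the code runs.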

I firmly believe that the best option would be being as explicit as possible when annotating types, because, as it seems, there are too few obvious cases at the end of the day.

The Zen of Python, by Tim Peters:

Beautiful is better than ugly.
Explicit is better than implicit.

Note: The examples were tested on mypy 1.7.0.

@bswck
Contributor

bswck commented Nov 15, 2023

> I'm happy to accept contributions! I tried to find a popularity ranking of all Python packages, but I didn't succeed. I did find snyk Advisor provides a quick and easy way to find packages by first letter (snyk.io/advisor/packages/python/j). You could go through the list of packages there and look at the popularity of each. It also sounds as if the pypi downloads tables are available to be queried (pypistats.org/api/#etiquette), so you could probably construct a query to do that. You might also search Github for projects with the skeleton badge - that's a pretty good indicator that it's following my best practices.

Thank you for your time spent on the research. I will use these resources and compile a TO-DO list of the projects to work on in this issue. Every subsequent PR will reference this issue.

@bswck
Contributor

bswck commented Nov 15, 2023

> I firmly believe that the best option would be being as explicit as possible when annotating types, because, as it seems, there are too few obvious cases at the end of the day.

Of course if the new type hints would appear too intricate for the eye, which I can totally understand, there is always an option of creating stubs to isolate two worlds of the implementation and the type hints, like in jaraco/jaraco.functools#22. What are your thoughts on this @jaraco?
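As a rough illustration of the stub approach (not the contents of the linked PR), the implementation file stays free of annotations while a sibling `.pyi` file carries them. The module name `shapes` and the function `scale` are hypothetical.

```python
# shapes.py: the implementation, left unannotated.
def scale(values, factor=2):
    return [value * factor for value in values]


# shapes.pyi: the stub that sits next to shapes.py. Type checkers read
# it instead of the implementation; bodies and defaults are just "...".
#
#     def scale(values: list[float], factor: float = ...) -> list[float]: ...

print(scale([1.5, 2.0]))  # [3.0, 4.0]
```

One trade-off to keep in mind: the type checker does not check the implementation body against the stub, so the two can drift apart unless they are compared separately (mypy ships a stubtest tool for that purpose).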

@bswck
Contributor

bswck commented Dec 5, 2023

The roadmap is ready! 🎉

I wrote a script that extracted all jaraco projects from PyPI and sorted them by total downloads in the last month.
The last two projects (Distutils and backports) show 0 downloads only because querying PyPI Stats for them returns an HTTP 404 error. I don't think investigating that is necessary.
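The script itself is not shown in this thread. A minimal sketch of the ranking step could look like the following, assuming the pypistats.org "recent" endpoint and its `data.last_month` field; the function names `fetch_last_month` and `rank` are hypothetical, and a lookup that fails with 404 is counted as 0, as described above.

```python
import json
import urllib.error
import urllib.request
from typing import Callable

# Assumed endpoint of the pypistats.org JSON API (see its etiquette notes
# before sending many requests).
API_URL = "https://pypistats.org/api/packages/{name}/recent"


def fetch_last_month(name: str) -> int:
    """Return last month's download count, or 0 if the lookup gives a 404."""
    try:
        with urllib.request.urlopen(API_URL.format(name=name.lower())) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return 0
        raise
    return int(payload["data"]["last_month"])


def rank(
    names: list[str], fetch: Callable[[str], int] = fetch_last_month
) -> list[tuple[str, int]]:
    """Sort projects by download count, most downloaded first."""
    counts = [(name, fetch(name)) for name in names]
    return sorted(counts, key=lambda item: item[1], reverse=True)


# Offline usage with figures copied from the table, so no request is sent.
sample = {"path": 1583434, "Distutils": 0, "zipp": 162258510}
print(rank(list(sample), fetch=sample.__getitem__))
# [('zipp', 162258510), ('path', 1583434), ('Distutils', 0)]
```

Passing the fetch function as a parameter keeps the sorting logic testable without network access.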

Note

Skeleton badges are taken from each project's latest PyPI release, not from its repository.
As a result, each badge reflects the year of that project's latest PyPI release.

The script produced the following table:

| Project | Downloads last month | jaraco's role | Uses skeleton? |
| --- | --- | --- | --- |
| setuptools | 322655647 | Owner | skeleton |
| importlib-metadata | 180078405 | Owner | skeleton |
| zipp | 162258510 | Owner | skeleton |
| importlib-resources | 65910632 | Owner | skeleton |
| keyring | 40286684 | Owner | skeleton |
| jaraco.classes | 37033262 | Owner | skeleton |
| setuptools-scm | 25467595 | Owner | |
| pytest-runner | 16293153 | Owner | skeleton |
| configparser | 10187739 | Owner | skeleton |
| jsonpickle | 9339847 | Owner | |
| pep517 | 6590096 | Owner | |
| inflect | 4843645 | Maintainer | skeleton |
| twine | 3678588 | Owner | |
| backports.functools-lru-cache | 2285253 | Owner | skeleton |
| singledispatch | 2269372 | Owner | skeleton |
| comtypes | 1732642 | Owner | |
| cssutils | 1668764 | Owner | skeleton |
| jaraco.functools | 1623127 | Owner | skeleton |
| path | 1583434 | Owner | skeleton |
| cheroot | 1482151 | Owner | |
| tempora | 890592 | Owner | skeleton |
| portend | 829890 | Owner | skeleton |
| jaraco.collections | 795281 | Owner | skeleton |
| CherryPy | 790657 | Owner | |
| distribute | 741920 | Owner | |
| jaraco.text | 739296 | Owner | skeleton |
| jaraco.context | 733410 | Owner | skeleton |
| keyrings.alt | 614293 | Owner | skeleton |
| backports.entry-points-selectable | 564770 | Owner | skeleton |
| path.py | 333772 | Owner | |
| Tempita | 202581 | Maintainer | |
| Fuzzy | 120698 | Owner | |
| datadiff | 117694 | Owner | |
| wolframalpha | 51502 | Owner | |
| oathtool | 34623 | Owner | skeleton |
| jaraco.logging | 27595 | Owner | skeleton |
| irc | 22739 | Owner | skeleton |
| cherrypy-cors | 18930 | Owner | skeleton |
| jaraco.stream | 18120 | Owner | skeleton |
| pytest-checkdocs | 14617 | Owner | skeleton |
| pytest-enabler | 14499 | Owner | skeleton |
| pip-run | 10838 | Owner | skeleton |
| jaraco.env | 9083 | Owner | skeleton |
| jaraco.itertools | 8260 | Owner | skeleton |
| hgtools | 7996 | Owner | |
| pytest-ignore-flaky | 7441 | Maintainer | |
| jaraco.path | 7331 | Owner | skeleton |
| jaraco.envs | 6930 | Owner | skeleton |
| backports.datetime-timestamp | 6897 | Owner | skeleton |
| jaraco.versioning | 6230 | Owner | skeleton |
| mongo-connector | 5814 | Owner | |
| pytest-perf | 5729 | Owner | skeleton |
| jaraco.ui | 5272 | Owner | |
| jaraco.packaging | 4884 | Owner | skeleton |
| jaraco.develop | 4502 | Owner | skeleton |
| jaraco.structures | 4416 | Owner | skeleton |
| pytest-services | 4205 | Maintainer | |
| rst.linker | 3714 | Owner | skeleton |
| jaraco.windows | 3706 | Owner | skeleton |
| jaraco.vcs | 3633 | Owner | skeleton |
| backports.unittest-mock | 3263 | Owner | skeleton |
| jaraco.docker | 2477 | Owner | skeleton |
| suds-bis | 2094 | Owner | skeleton |
| jaraco.abode | 1941 | Owner | skeleton |
| jaraco.email | 1704 | Owner | |
| jaraco.net | 1697 | Owner | skeleton |
| MagicBus | 1478 | Owner | |
| svg.charts | 1416 | Owner | skeleton |
| jaraco.mongodb | 1336 | Owner | skeleton |
| calendra | 1242 | Owner | skeleton |
| jaraco.services | 768 | Owner | skeleton |
| pmxbot | 740 | Owner | skeleton |
| jaraco.tidelift | 718 | Owner | skeleton |
| event | 673 | Owner | |
| jaraco.clipboard | 560 | Owner | |
| openpack | 526 | Owner | skeleton |
| elastic2-doc-manager | 504 | Owner | |
| jaraco.test | 499 | Owner | skeleton |
| jaraco.pmxbot | 439 | Owner | skeleton |
| dynpool | 312 | Owner | |
| pytest-black-multipy | 307 | Owner | |
| cherrypy-dynpool | 307 | Owner | |
| jaraco.modb | 269 | Owner | skeleton |
| yg.lockfile | 239 | Owner | |
| chucknorris | 236 | Owner | skeleton |
| nspektr | 225 | Owner | skeleton |
| xlsxcessive | 203 | Owner | skeleton |
| cmdix | 189 | Owner | skeleton |
| googlevoice | 188 | Owner | skeleton |
| jaraco.financial | 175 | Owner | skeleton |
| jaraco.timing | 171 | Owner | |
| jaraco.postgres | 164 | Owner | |
| jaraco.compat | 163 | Owner | |
| paradocx | 162 | Owner | skeleton |
| http-okapi | 142 | Owner | |
| jaraco.geo | 137 | Owner | skeleton |
| elastic-doc-manager | 132 | Owner | |
| jaraco.util | 131 | Owner | skeleton |
| compilers | 129 | Owner | |
| jaraco.apt | 122 | Owner | |
| setuptools-svn | 103 | Owner | |
| jaraco.xonsh | 98 | Owner | skeleton |
| vr.cli | 96 | Owner | |
| vr.runners | 96 | Owner | |
| lpaste | 92 | Owner | skeleton |
| rwt | 92 | Owner | |
| jaraco.fabric | 83 | Owner | skeleton |
| NAT-PMP | 83 | Owner | |
| vr.common | 81 | Owner | |
| pytest-home | 74 | Owner | skeleton |
| fogbugz-bis | 68 | Owner | skeleton |
| backports.hook-compressed | 66 | Owner | skeleton |
| jaraco.nxt | 61 | Owner | skeleton |
| motivation | 60 | Owner | |
| jaraco.video | 57 | Owner | |
| eggmonster | 56 | Owner | |
| jaraco.site | 55 | Owner | skeleton |
| vr.server | 53 | Owner | |
| vr.imager | 51 | Owner | |
| backports.print_function | 49 | Owner | |
| librarypaste | 47 | Owner | skeleton |
| backports.method_request | 45 | Owner | |
| excuses | 44 | Owner | skeleton |
| jaraco.translate | 43 | Owner | |
| backports.pdb | 43 | Owner | skeleton |
| jaraco.media | 41 | Owner | skeleton |
| jaraco.crypto | 39 | Owner | skeleton |
| pmxbot.webhooks | 39 | Owner | |
| popquotes | 38 | Owner | |
| vr.builder | 37 | Owner | |
| solr-doc-manager | 33 | Owner | |
| jaraco.input | 32 | Owner | |
| jaraco.home | 31 | Owner | skeleton |
| jaraco.imaging | 30 | Owner | |
| vr.agent | 29 | Owner | |
| jaraco.keyring | 28 | Owner | |
| vr.events | 28 | Owner | |
| pmxbot.rss | 28 | Owner | |
| jaraco.xkcd | 27 | Owner | |
| jaraco.office | 26 | Owner | |
| freedompop | 26 | Owner | |
| treehouse | 26 | Owner | |
| dropbox-index | 26 | Owner | skeleton |
| jaraco.zstd | 24 | Owner | skeleton |
| recapturedocs | 24 | Owner | skeleton |
| setuptools-hacks.bypass-summary-newline | 21 | Owner | skeleton |
| pmxbot.nsfw | 20 | Owner | |
| vr.launch | 18 | Owner | |
| pmxbot.saysomething | 13 | Owner | |
| jaraco.parables | 11 | Owner | |
| yg.thumpy | 11 | Owner | |
| yg.eventful | 7 | Owner | |
| Distutils | 0 | Owner | |
| backports | 0 | Owner | |

In order to make tracking the progress significantly easier, I've generated the following checklist-form roadmap.

Roadmap

TODO for every project linked below:

  • ensure a py.typed marker is present,
  • modernize type hinting,
  • ensure that mypy --strict passes successfully.
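The first checklist item can be verified mechanically. Here is a minimal sketch, assuming the package directory is known; the helper name `has_py_typed` and the package `example_pkg` are hypothetical.

```python
import tempfile
from pathlib import Path


def has_py_typed(package_dir: Path) -> bool:
    """Report whether a package directory contains the py.typed marker file."""
    return (package_dir / "py.typed").is_file()


# Demonstration on a throwaway package directory.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "example_pkg"
    package.mkdir()
    (package / "__init__.py").touch()
    print(has_py_typed(package))  # False
    (package / "py.typed").touch()
    print(has_py_typed(package))  # True
```

The marker only helps users if it is also shipped in the built distribution, so the packaging configuration needs to include it as package data.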

@jaraco, could you please let me know if you want to apply some additional filtering/sorting to the roadmap above?

@bswck
Contributor

bswck commented Dec 5, 2023

An interesting approach would be to measure, for every project, how many PyPI projects depend on it, using libraries.io. I think that would more or less correlate with downloads per month anyway. I am assuming (without having done any research) that most of these downloads come from pipelines that install every dependency from scratch, and that several dependents of the same project rarely end up installed together. When two projects A and B share a dependency X and are installed into the same environment, X is installed only once, so it gets half the downloads it otherwise would, even though I would technically call X more popular in this case. It gets even more interesting if we distinguish between pip-like and pipx-like installation methods: with pipx, a shared dependency would be installed twice, once for each of the separate environments of A and B, assuming A and B are CLI applications. I could keep listing things that come to mind, and it gets very complicated and multilayered the deeper I go down the rabbit hole.

Yeah, so all in all I've learned that "package popularity" isn't trivial, because both the convergence in the web of dependencies as well as the statistics of total downloads in time play a role in the correct evaluation of how popular a package really is.
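The A, B and X scenario can be written down as a small counting exercise. This is only a toy model under simplifying assumptions (direct dependencies only, one download per install); the function name `count_installs` is hypothetical.

```python
def count_installs(
    environments: list[set[str]], dependencies: dict[str, set[str]]
) -> dict[str, int]:
    """Count how often each package is installed across environments.

    Each environment lists the projects requested in it. A dependency shared
    by several projects in the same environment is installed only once there.
    """
    installs: dict[str, int] = {}
    for requested in environments:
        installed = set(requested)
        for project in requested:
            installed |= dependencies.get(project, set())
        for package in installed:
            installs[package] = installs.get(package, 0) + 1
    return installs


dependencies = {"A": {"X"}, "B": {"X"}}

# pip-like: A and B share one environment, so X is installed once.
shared = count_installs([{"A", "B"}], dependencies)
# pipx-like: A and B each get their own environment, so X is installed twice.
isolated = count_installs([{"A"}, {"B"}], dependencies)

print(shared["X"], isolated["X"])  # 1 2
```

In both cases X has two dependents, yet its download count differs, which is the gap between "downloads" and "dependents" as popularity measures.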

Anyway, this is just an attempt to prioritize the tasks. Since some of the most popular jaraco projects depend on less popular jaraco projects (in terms of downloads per month!), I started with jaraco.functools and jaraco.classes.

@jaraco
Owner Author

jaraco commented Dec 5, 2023

Amazing analysis. Thanks!

You'll notice that libraries.io is a project by Tidelift. You may have noticed that I also work with them as they work to garner support for open source maintainers from the enterprise users. They've probably built other tooling and may even be interested in collaborating on tooling to support open source maintenance. You may want to consider signing up with them as a maintainer (there's no cost and could potentially pay) and engaging on the forums to see if there is interest in collaborating. If you need a referral or anything to get signed up, let me know.

@bswck
Contributor

bswck commented Dec 6, 2023

> You may want to consider signing up with them as a maintainer

Thank you! I've just applied to lift a few of my projects!


Due to a considerable number of projects that need similar work, I created a project that aims to automate the whole process as much as possible—autorefine.

MonkeyType will come in very handy for generating type hints. I will take care of making them as sophisticated as needed. I think all these projects have enough test coverage, so I will simply generate the types by running the tests.
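To illustrate the underlying idea of collecting types by running code, here is a toy sketch. It is not MonkeyType's API, only a stdlib illustration of the technique; the decorator `record_types` and the function `repeat` are hypothetical.

```python
import functools
from collections import defaultdict

# Maps "function.parameter" (or "function.return") to observed type names.
observed: dict[str, set[str]] = defaultdict(set)


def record_types(func):
    """Record the runtime types of arguments and of the return value."""
    names = func.__code__.co_varnames[: func.__code__.co_argcount]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for name, value in zip(names, args):
            observed[f"{func.__name__}.{name}"].add(type(value).__name__)
        for name, value in kwargs.items():
            observed[f"{func.__name__}.{name}"].add(type(value).__name__)
        result = func(*args, **kwargs)
        observed[f"{func.__name__}.return"].add(type(result).__name__)
        return result

    return wrapper


@record_types
def repeat(text, times):
    return text * times


# Calls like these would normally come from the test suite.
repeat("ab", 2)
repeat("ab", times=3)
print(dict(observed))
```

The recorded types are only as good as the calls that were observed, which is why generated hints still need manual review and refinement.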

I will leverage LibCST and create custom rules if needed for modernizing the projects (some had their last releases a few years ago)—but this is out of scope at the moment.

Contributions & suggestions very, very welcome—I am learning.

Hopefully the tool will speed up more_itertools.consume(map(functools.partial(refine, scope="typing"), jaraco_projects)).

@bswck
Contributor

bswck commented Feb 7, 2024

I've made a Coherent OSS project for the initiative: https://github.com/orgs/coherent-oss/projects/3/views/2

@jaraco
Owner Author

jaraco commented Mar 30, 2024

> I firmly believe that the best option would be being as explicit as possible when annotating types, because, as it seems, there are too few obvious cases at the end of the day.
>
> Of course if the new type hints would appear too intricate for the eye, which I can totally understand, there is always an option of creating stubs to isolate two worlds of the implementation and the type hints, like in jaraco/jaraco.functools#22. What are your thoughts on this @jaraco?

I guess that's fine. I should probably get used to Python being more verbose and less essential.

@bswck
Contributor

bswck commented Mar 30, 2024

> > I firmly believe that the best option would be being as explicit as possible when annotating types, because, as it seems, there are too few obvious cases at the end of the day.
> >
> > Of course if the new type hints would appear too intricate for the eye, which I can totally understand, there is always an option of creating stubs to isolate two worlds of the implementation and the type hints, like in jaraco/jaraco.functools#22. What are your thoughts on this @jaraco?
>
> I guess that's fine. I should probably get used to Python being more verbose and less essential.

Well, I guess it's just that Python wasn't built for being statically typed. 🤷‍♀️
The good news is that due to its continuous development, the unfortunate effect of verbosity over essentialness in typing slowly decreases: take PEP 695 as an example.
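For a sense of what PEP 695 changes, here is a small sketch. The function `first` is a hypothetical example, and the new syntax is kept in a comment so the snippet still runs on interpreters older than Python 3.12.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


# Pre-PEP 695 spelling: the type variable is declared separately.
def first(items: Sequence[T]) -> T:
    return items[0]


# PEP 695 spelling (Python 3.12+), with the type parameter declared inline:
#
#     def first[T](items: Sequence[T]) -> T:
#         return items[0]

print(first([3, 1, 2]))  # 3
```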
