Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Type Hints to Scrapy and its Modules #4041

Closed
royahsan opened this issue Sep 27, 2019 · 21 comments
Closed

Adding Type Hints to Scrapy and its Modules #4041

royahsan opened this issue Sep 27, 2019 · 21 comments

Comments

@royahsan
Copy link

royahsan commented Sep 27, 2019

Summary

We should add Variable Annotations/ Type hints as supported in PEP 526 , Python 3.6 to Scrapy to help out existing and new contributors and developers in understanding scrapy code.

Motivation

Intellisense enabled IDES like PyCharm need Type hints to provide better experience.
For new contributors to understand Scrapy comprehensively, type hints are vital.

Consider someone not that familiar with scrapy, stumbling upon scheduler's constructor.

    def __init__(self, dupefilter, jobdir=None, dqclass=None, mqclass=None,
                 logunser=False, stats=None, pqclass=None, crawler=None):
        self.df = dupefilter
        self.dqdir = self._dqdir(jobdir)
        self.pqclass = pqclass
        self.dqclass = dqclass
        self.mqclass = mqclass
        self.logunser = logunser
        self.stats = stats
        self.crawler = crawler
@Gallaecio
Copy link
Member

Related to #3618

@elacuesta
Copy link
Member

Agreed. This sounds great, we should just keep in mind that Scrapy won't drop support for Python 2.7 until 2.0

Also (not exactly what this issue is about, but related), variable annotations were introduced in Python 3.6 (PEP 526)

@elacuesta elacuesta added this to the v2.0 milestone Sep 27, 2019
@royahsan
Copy link
Author

royahsan commented Sep 29, 2019

@elacuesta yes you are right , variable annotation were introduced in PEP 526 , only function annotations were introduced in PEP 484 , i was in in little hurry finding the PEP version.

@laurentS
Copy link

laurentS commented Oct 8, 2019

Agreed. This sounds great, we should just keep in mind that Scrapy won't drop support for Python 2.7 until 2.0

Based on mypy docs it seems that it's possible to provide the type hints as separate stub files, which shouldn't interfere with older versions of python that don't support them. Depending on the timeline for scrapy 2.0, this might be a way to fix this issue earlier...

@kmike
Copy link
Member

kmike commented Oct 8, 2019

There is also comment-based syntax which works in 2.7. Anyways, I think that'd be great to test the waters on Scrapy dependencies first - w3lib, parsel. See e.g. scrapy/w3lib#123. Python 3 syntax is nicer though, so I'd prefer to go with Python3-only syntax first (3.5+, not 3.6+; for variable annotations one can use comment-based annotations).

@kmike
Copy link
Member

kmike commented Oct 8, 2019

@laurentS we want to make 1.8 release ASAP, after fixing #4036. Right after 1.8 we'll drop Python 2 support, and go into 2.0. It doesn't make sense to invest in Python 2-compatible code now.

@grammy-jiang
Copy link
Contributor

grammy-jiang commented Dec 8, 2019

Considering there are already hundreds and thousands of lines code in Scrapy, Monkeytype may be a good choice to automatically add typing hint for existing code:

MonkeyType collects runtime types of function arguments and return values, and can automatically generate stub files or even add draft type annotations directly to your Python code based on the types collected at runtime.

Before each data type is added manually, I think this can give some help for typing hint.

@grammy-jiang
Copy link
Contributor

grammy-jiang commented Dec 16, 2019

Considering there are already hundreds and thousands of lines code in Scrapy, Monkeytype may be a good choice to automatically add typing hint for existing code:

MonkeyType collects runtime types of function arguments and return values, and can automatically generate stub files or even add draft type annotations directly to your Python code based on the types collected at runtime.

Before each data type is added manually, I think this can give some help for typing hint.

I have tested MonkeyType, and get the following conclusions:

  • MonkeyType ONLY add typing hint to the input (arguments) and output (return) of the methods which are tested in the test cases - no variables inside methods can be typing hint
  • Because the input (arguments) may be mocked in the test cases, the typing hint may include this mocked objects' types, which is not accurate and the imports is also polluted, e.g.:
...
from unittest.mock import MagicMock
...
class RobotsTxtMiddleware(object):
    DOWNLOAD_PRIORITY = 1000

    def __init__(self, crawler: Union[Crawler, MagicMock]) -> None:
	...
  • The from_crawler classmethod will be added a typing hint for returning an instance of the class, Monkeytype can't add from __future__ import annotations automatically
  • Some imports will be added for typing hint, but this may cause some duplications - Monkeytype will only import from the very detail module path and ignore the imports already imported, e.g.:
...
from scrapy.http import Request  # original imports
...
from scrapy.http.request import Request  # added by MonkeyType
...

It seems Monkeytype can help for typing hint in some way, but manually checking the result is still necessary.

@laurentS
Copy link

@grammy-jiang it sounds similar to mypy's stubgen, except the later only does a static analysis of the code. In my experience, the stubs that it produces still require manual adjustments. I don't know how it compares to MonkeyType, in terms of quality of results. Have you checked?

@laurentS
Copy link

I just came across pyannotate which does essentially the same job as MonkeyType. It might be interesting to compare their outputs.

@kmike
Copy link
Member

kmike commented Dec 17, 2019

My 2c: I'd prefer high-quality type coverage of a small user-facing part of the Scrapy, over extensive automatically generated type hints, some of them incorrect.

@grammy-jiang
Copy link
Contributor

grammy-jiang commented Dec 17, 2019

@laurentS Yes, Monkeytype just create stub files first then you can choose to add typing hint to the code manually (monkeytype apply <module>). The output of Monkeytype is not high quality, not at all.

@kmike if you want to a high-quality typing hint, I am afraid manually adding typing hint is the only way at this moment. Considering a large number of codes we have already in this project, it may take a very long time to add typing hint to all modules. Maybe splitting the work into several stages and implementing one by one is a good idea.

grammy-jiang added a commit to grammy-jiang/scrapy that referenced this issue Dec 18, 2019
This commit adds typing hint to httpcache downloadermiddlewares as a
test for scrapy#4041
@kmike kmike removed this from the v2.0 milestone Jan 30, 2020
@laurentS
Copy link

laurentS commented Apr 7, 2020

This pytest plugin pytest-mypy-plugins might help write tests to check that new typing annotations are correct. Thisblog post explains how to use it. It was written to help annotate Django, so it should be relatively solid.

@synodriver
Copy link

Well I think pydantic would be useful if type hints could be added to scrapy in future versions. It could help validate types, also make it easier to read configuration files and environment variables. After adding these settings, the code will be easier to understand. For example, Item can be defined as subclasses of pydantic.BaseModel , making it more convenient to use, and settings can be defined as subclasses of pydantic.BaseSettings

@ChihweiLHBird
Copy link

@grammy-jiang Hey, is there any plan to add type hints to the rest of the mudules of scrapy? I am trying to understand some parts of the project and I think it would be very helpful if there are type hints. If possible, I would like to add some type hints and create a PR for them.

@Gallaecio
Copy link
Member

Yes, we plan to do it, specially to those parts of the API that users interact with. Feel free to create a pull request to add type hints to some parts of the code.

@grammy-jiang
Copy link
Contributor

Hi, @ChihweiLHBird ,

You can go through the previous posts in this discussion, and also this PR:

I haven't got time to work on this. Your work about adding typing hints will be very appreciated.

Something you may need to note:

  • the strength of typing hints support is different between python versions, you may need to check the python versions that current scrapy supports
  • powerful typing hints support from annotation of __future__

@half2me
Copy link

half2me commented Apr 13, 2021

Having used pydantic before, I'd be keen on seeing it in scrapy.

@slix
Copy link

slix commented Jan 7, 2022

Considering there are already hundreds and thousands of lines code in Scrapy, Monkeytype may be a good choice to automatically add typing hint for existing code:

It seems Monkeytype can help for typing hint in some way, but manually checking the result is still necessary.

The output of Monkeytype is not high quality, not at all.

Maybe splitting the work into several stages and implementing one by one is a good idea.

My 2c: I'd prefer high-quality type coverage of a small user-facing part of the Scrapy, over extensive automatically generated type hints, some of them incorrect.

I ran MonkeyType through the unit tests across the scrapy codebase and saved the changes of all .py files.

slix@2f22b84 <--

The result is a decent starting point for typing Scrapy. Contributors wouldn't have to run monkeytype and collect types from unit tests themselves.

MonkeyType's output is too messy to commit directly. So each .py file still needs a lot of human attention like cleaning up imports and deciding on reasonable types from what unit tests saw.

My output doesn't include Coroutine and AsyncGenerator due to a bug in MonkeyType (Instagram/MonkeyType#252). There's a lot of activity in this repo about async right now, so that bug may be a blocker.

An alternative to MonkeyType is pytype, which does static analysis. I now wonder whether that would produce better results.

Yes, we plan to do it, specially to those parts of the API that users interact with.

Should any typing efforts be focused on the APIs shown in the Scrapy documentation instead of on internal code? Should methods starting in underscores be left untyped?

@Gallaecio
Copy link
Member

Should any typing efforts be focused on the APIs shown in the Scrapy documentation instead of on internal code?

Yes.

Should methods starting in underscores be left untyped?

No, but they are indeed on the low end of type-hinting priorities.

@wRAR
Copy link
Member

wRAR commented Aug 31, 2023

This is work in progress with its own typing tag, I don't see a reason to have this open.

@wRAR wRAR closed this as completed Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests