Commit

avoid deprecating dont_filter in OffsiteMiddleware
BurnzZ committed Dec 13, 2023
1 parent eb5afd8 commit acba118
Showing 3 changed files with 11 additions and 20 deletions.
6 changes: 3 additions & 3 deletions docs/topics/request-response.rst
@@ -144,9 +144,9 @@ Request objects
 :type priority: int
 
 :param dont_filter: indicates that this request should not be filtered by
-the scheduler. This is used when you want to perform an identical
-request multiple times, to ignore the duplicates filter. Use it with
-care, or you will get into crawling loops. Default to ``False``.
+the scheduler or some middlewares. This is used when you want to perform
+an identical request multiple times, to ignore the duplicates filter.
+Use it with care, or you will get into crawling loops. Default to ``False``.
 :type dont_filter: bool
 
 :param errback: a function that will be called if any exception was
10 changes: 3 additions & 7 deletions docs/topics/spider-middleware.rst
@@ -345,15 +345,11 @@ OffsiteMiddleware
 :attr:`~scrapy.Spider.allowed_domains` attribute, or the
 attribute is empty, the offsite middleware will allow all requests.
 
-If ``allow_offsite`` is set to ``True`` in :attr:`Request.meta`, then the
-offsite middleware will allow the request even if its domain is not listed
+If the request has the :attr:`~scrapy.Request.dont_filter` attribute set to
+``True`` or :attr:`Request.meta` has ``allow_offsite`` set to ``True``, then
+the OffsiteMiddleware will allow the request even if its domain is not listed
 in allowed domains.
 
-.. caution:: Setting :attr:`~scrapy.Request.dont_filter` to ``True`` also
-causes the offsite middleware to allow the request. However,
-this is deprecated. Use ``allow_offsite`` instead in
-:attr:`Request.meta`.
-
 
 RefererMiddleware
 -----------------
15 changes: 5 additions & 10 deletions scrapy/spidermiddlewares/offsite.py
@@ -12,7 +12,6 @@
 
 from scrapy import Spider, signals
 from scrapy.crawler import Crawler
-from scrapy.exceptions import ScrapyDeprecationWarning
 from scrapy.http import Request, Response
 from scrapy.statscollectors import StatsCollector
 from scrapy.utils.httpobj import urlparse_cached
@@ -51,15 +50,11 @@ async def process_spider_output_async(
     def _filter(self, request: Any, spider: Spider) -> bool:
         if not isinstance(request, Request):
             return True
-        if request.dont_filter:
-            warnings.warn(
-                "The dont_filter filter flag is deprecated in OffsiteMiddleware. "
-                "Set 'allow_offsite' to True in Request.meta instead.",
-                ScrapyDeprecationWarning,
-                stacklevel=2,
-            )
-            return True
-        if request.meta.get("allow_offsite") or self.should_follow(request, spider):
+        if (
+            request.dont_filter
+            or request.meta.get("allow_offsite")
+            or self.should_follow(request, spider)
+        ):
             return True
         domain = urlparse_cached(request).hostname
         if domain and domain not in self.domains_seen:
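
The combined condition that `_filter` ends up with in this commit can be exercised in isolation. The following is a minimal standalone sketch, not Scrapy's code: `FakeRequest`, `host_is_allowed`, and `is_allowed` are hypothetical names introduced for illustration, and the host check is a simplified stand-in for `should_follow` (exact host or subdomain match, with an empty allow-list permitting everything, as the docs describe).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set
from urllib.parse import urlparse


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request; only the fields used below."""

    url: str
    dont_filter: bool = False
    meta: Dict[str, Any] = field(default_factory=dict)


def host_is_allowed(url: str, allowed_hosts: Set[str]) -> bool:
    """Simplified stand-in for should_follow (assumption, not Scrapy's regex)."""
    if not allowed_hosts:
        # An empty allow-list lets every request through.
        return True
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_hosts)


def is_allowed(request: FakeRequest, allowed_hosts: Set[str]) -> bool:
    """Mirror the three-way 'or' from the diff: any one condition suffices."""
    return bool(
        request.dont_filter
        or request.meta.get("allow_offsite")
        or host_is_allowed(request.url, allowed_hosts)
    )
```

In this sketch an offsite URL is rejected by default, while the same URL passes when either `dont_filter=True` or `meta={"allow_offsite": True}` is set, and neither path emits a warning.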
