Fix the offsite middleware missing some requests #6358

Merged: 6 commits, May 13, 2024

Gallaecio (Member) commented May 13, 2024

Scrapy was enforcing Spider.allowed_domains only for requests coming from spider callbacks, but not for:

This is now solved by replacing the offsite spider middleware with a downloader middleware.

Fixes #1042, closes #2241.
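To illustrate why a downloader middleware catches more requests than a spider middleware, here is a minimal sketch of a domain check applied at the point where every request passes on its way to the downloader, regardless of where the request originated. This is not Scrapy's actual implementation: `is_allowed`, `OffsiteDownloaderSketch`, and the local `IgnoreRequest` class are hypothetical stand-ins so the example runs without Scrapy installed, and the `process_request` method here takes a plain URL rather than Scrapy's request and spider objects.

```python
from urllib.parse import urlparse


class IgnoreRequest(Exception):
    """Local stand-in for scrapy.exceptions.IgnoreRequest (assumption: used so
    this sketch runs without Scrapy)."""


def is_allowed(url, allowed_domains):
    """Return True if the URL's host equals an allowed domain or is a subdomain of one."""
    if not allowed_domains:
        # Assumption: no allowed_domains means nothing is filtered.
        return True
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed_domains
    )


class OffsiteDownloaderSketch:
    """Hypothetical downloader-side filter: it sees every outgoing request."""

    def __init__(self, allowed_domains):
        self.allowed_domains = list(allowed_domains or [])

    def process_request(self, url):
        # Reject the request before it is downloaded if its host is offsite.
        if not is_allowed(url, self.allowed_domains):
            raise IgnoreRequest(f"Filtered offsite request: {url}")
        return None  # None means "continue processing this request"
```

Because the check runs on the download path rather than on spider callback output, a request produced by any other component would go through the same filter.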

Gallaecio changed the title from "Redirect allowed domains" to "Fix the offsite middleware missing some requests" on May 13, 2024
Gallaecio merged commit f149ea4 into scrapy:2.11 on May 13, 2024
24 checks passed
kmike (Member) commented May 14, 2024

Hey @Gallaecio! Sorry for the late comment. Not sure if that's very important, but I wonder if we should keep both middlewares, and have both enabled, because the spider middleware allows all these requests to skip the scheduler.
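For reference, a project that wanted both middlewares enabled could express that in its settings roughly as follows. This is a sketch under assumptions: the two class paths are the spider and downloader offsite middlewares as understood from this PR, the priority values (500 and 50) are assumed to mirror Scrapy's defaults and should be checked against the installed version, and whether the spider middleware stays available or enabled by default after this change is not confirmed here.

```python
# settings.py sketch (assumption: priorities mirror Scrapy's defaults; verify locally)

# Spider middleware: filters offsite requests yielded by spider callbacks,
# before they are handed to the scheduler.
SPIDER_MIDDLEWARES = {
    "scrapy.spidermiddlewares.offsite.OffsiteMiddleware": 500,
}

# Downloader middleware: filters offsite requests on the download path,
# which also covers requests that never pass through spider callbacks.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.offsite.OffsiteMiddleware": 50,
}
```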
