Releases: scrapy/scrapy

1.7.0

18 Jul 14:28

Highlights:

  • Improvements for crawls targeting multiple domains
  • A cleaner way to pass arguments to callbacks
  • A new class for JSON requests
  • Improvements for rule-based spiders
  • New features for feed exports

See the full changelog
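
Of the highlights above, the new class for JSON requests (scrapy.http.JsonRequest) is the most code-visible: it serializes a payload into the request body and sets JSON headers for you. A minimal stdlib sketch of that idea — an illustration only, not Scrapy's JsonRequest implementation, and build_json_request is a hypothetical name:

```python
import json

# Hypothetical helper (build_json_request is not a Scrapy API) showing
# the idea behind a JSON request class: serialize the payload into the
# request body and set the appropriate Content-Type header.
def build_json_request(url, data):
    return {
        "url": url,
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(data),
    }

req = build_json_request("https://example.com/api", {"query": "scrapy"})
```

In Scrapy itself you would instead yield a JsonRequest from a spider and let the framework handle scheduling and downloading.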

1.6.0

11 Feb 13:48

Highlights:

  • Better Windows support
  • Python 3.7 compatibility
  • Big documentation improvements, including a switch from the .extract_first() + .extract() API to the .get() + .getall() API
  • Feed exports, FilePipeline and MediaPipeline improvements
  • Better extensibility: item_error and request_reached_downloader signals; from_crawler support for feed exporters, feed storages and dupefilters.
  • scrapy.contracts fixes and new features
  • Telnet console security improvements, first released as a backport in Scrapy 1.5.2 (2019-01-22)
  • Clean-up of the deprecated code
  • Various bug fixes, small new features and usability improvements across the codebase.

Full changelog is in the docs.

1.5.0

30 Dec 15:35

This release brings small new features and improvements across the codebase.
Some highlights:

  • Google Cloud Storage is supported in FilesPipeline and ImagesPipeline.
  • Crawling with proxy servers becomes more efficient, as connections to proxies can now be reused.
  • Warning, exception and logging messages are improved to make debugging easier.
  • The scrapy parse command now allows setting custom request meta via the --meta argument.
  • Compatibility with Python 3.6, PyPy and PyPy3 is improved; PyPy and PyPy3 are now officially supported, with tests run on CI.
  • Better default handling of HTTP 308, 522 and 524 status codes.
  • Documentation is improved, as usual.

Full changelog is in the docs.

1.4.0

29 Dec 15:39

1.3.3

29 Dec 15:40

1.2.2

08 Dec 09:56

Bug fixes

  • Fix a cryptic traceback when a pipeline fails on open_spider() (#2011)
  • Fix embedded IPython shell variables (fixing #396, which re-appeared in 1.2.0; fixed in #2418)
  • A couple of patches when dealing with robots.txt:
    • handle (non-standard) relative sitemap URLs (#2390)
    • handle non-ASCII URLs and User-Agents in Python 2 (#2373)

Documentation

  • Document the "download_latency" key in Request's meta dict (#2033)
  • Remove page on (deprecated & unsupported) Ubuntu packages from ToC (#2335)
  • A few fixed typos (#2346, #2369, #2380) and clarifications (#2354, #2325, #2414)

Other changes

  • Advertise conda-forge as Scrapy’s official conda channel (#2387)
  • More helpful error messages when trying to use .css() or .xpath() on non-Text Responses (#2264)
  • startproject command now generates a sample middlewares.py file (#2335)
  • Add more dependencies’ version info in scrapy version verbose output (#2404)
  • Remove all *.pyc files from source distribution (#2386)

1.2.1

08 Dec 09:53

Bug fixes

  • Include OpenSSL’s more permissive default ciphers when establishing TLS/SSL connections (#2314).
  • Fix “Location” HTTP header decoding on non-ASCII URL redirects (#2321).
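
The "Location" fix addresses a common class of bug: HTTP header bytes are nominally Latin-1, so a Location header that actually carries UTF-8 bytes turns into mojibake when decoded naively. A minimal stdlib illustration of the general problem, not Scrapy's actual fix:

```python
# A redirect target containing non-ASCII, as raw UTF-8 bytes on the wire.
raw_location = "/caf\u00e9/menu".encode("utf-8")

# HTTP headers are nominally Latin-1, so a naive decode produces mojibake.
naive = raw_location.decode("latin-1")

# Recover the intended text: undo the Latin-1 decode, then decode as UTF-8.
recovered = naive.encode("latin-1").decode("utf-8")
```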

Documentation

  • Fix JsonWriterPipeline example (#2302).
  • Various notes: #2330 on spider names, #2329 on middleware methods processing order, #2327 on getting multi-valued HTTP headers as lists.

Other changes

  • Removed www. from start_urls in built-in spider templates (#2299).

1.2.0

03 Oct 13:25

New Features

  • New FEED_EXPORT_ENCODING setting to customize the encoding
    used when writing items to a file.
    This can be used to turn off \uXXXX escapes in JSON output.
    It is also useful for those wanting something other than UTF-8
    for XML or CSV output (#2034).
  • startproject command now supports an optional destination directory
    to override the default one based on the project name (#2005).
  • New SCHEDULER_DEBUG setting to log requests serialization
    failures (#1610).
  • JSON encoder now supports serialization of set instances (#2058).
  • Interpret application/json-amazonui-streaming as TextResponse (#1503).
  • scrapy is imported by default when using shell tools (shell,
    inspect_response) (#2248).
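
The \uXXXX escapes and set serialization mentioned above are easy to see with the standard json module. A hedged sketch in plain stdlib — SetFriendlyEncoder is an illustrative stand-in for the technique, not Scrapy's own encoder:

```python
import json

# \uXXXX escaping in JSON output comes from ensure_ascii, on by default.
item = {"title": "Caf\u00e9 con leche"}
escaped = json.dumps(item)                       # non-ASCII becomes \u00e9
readable = json.dumps(item, ensure_ascii=False)  # keeps the raw characters

# Serializing sets requires a custom encoder hook via `default`
# (illustration of the general technique, not Scrapy's encoder).
class SetFriendlyEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, set):
            return sorted(o)  # lists are JSON-serializable; sort for stable output
        return super().default(o)

tags = json.dumps({"tags": {"b", "a"}}, cls=SetFriendlyEncoder)
```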

Bug fixes

  • DefaultRequestHeaders middleware now runs before UserAgent middleware
    (#2088). Warning: this is technically backwards incompatible,
    though we consider it a bug fix.
  • HTTP cache extension and plugins that use the .scrapy data directory now
    work outside projects (#1581). Warning: this is technically
    backwards incompatible, though we consider it a bug fix.
  • Selector no longer allows passing both response and text (#2153).
  • Fixed logging of wrong callback name with scrapy parse (#2169).
  • Fix for an odd gzip decompression bug (#1606).
  • Fix for selected callbacks when using CrawlSpider with scrapy parse
    (#2225).
  • Fix for invalid JSON and XML files when spider yields no items (#872).
  • Implement flush() for StreamLogger, avoiding a warning in logs (#2125).

Tests & Requirements

Scrapy's new requirements baseline is Debian 8 "Jessie"; it was previously Ubuntu 12.04 Precise.
In practice this means we run continuous integration tests with at least these (main) package versions: Twisted 14.0, pyOpenSSL 0.14, lxml 3.4.

Scrapy may well work with older versions of these packages (the code base still has switches for older Twisted versions, for example), but this is not guaranteed, because it is no longer tested.

Documentation

  • Grammar fixes: #2128, #1566.
  • Download stats badge removed from README (#2160).
  • New scrapy architecture diagram (#2165).
  • Updated Response parameters documentation (#2197).
  • Reworded misleading RANDOMIZE_DOWNLOAD_DELAY description (#2190).
  • Add StackOverflow as a support channel (#2257).