Releases: scrapy/scrapy
1.7.0
Highlights:
- Improvements for crawls targeting multiple domains
- A cleaner way to pass arguments to callbacks
- A new class for JSON requests
- Improvements for rule-based spiders
- New features for feed exports
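The new JSON request class in 1.7.0 automates what a hand-rolled JSON POST would otherwise require. A rough stdlib-only sketch of that pattern (the URL and payload here are made up for illustration; this is not Scrapy's implementation):

```python
import json
import urllib.request

# What a JSON-request helper automates: serialize the payload
# and set the JSON Content-Type header on the outgoing request.
payload = {"query": "books"}  # hypothetical payload
req = urllib.request.Request(
    "https://example.com/api",  # placeholder URL
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

A dedicated request class bundles these steps so each call site only supplies the URL and the data.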
1.6.0
Highlights:
- Better Windows support
- Python 3.7 compatibility
- Big documentation improvements, including a switch from .extract_first() + .extract() API to .get() + .getall() API
- Feed exports, FilePipeline and MediaPipeline improvements
- Better extensibility: item_error and request_reached_downloader signals; from_crawler support for feed exporters, feed storages and dupefilters.
- scrapy.contracts fixes and new features
- Telnet console security improvements, first released as a backport in Scrapy 1.5.2 (2019-01-22)
- Clean-up of deprecated code
- Various bug fixes, small new features and usability improvements across the codebase.
1.5.0
This release brings small new features and improvements across the codebase.
Some highlights:
- Google Cloud Storage is supported in FilesPipeline and ImagesPipeline.
- Crawling with proxy servers is more efficient, as connections to proxies can now be reused.
- Warnings, exception and logging messages are improved to make debugging easier.
- scrapy parse command now allows setting custom request meta via the --meta argument.
- Compatibility with Python 3.6, PyPy and PyPy3 is improved; PyPy and PyPy3 are now officially supported, with tests run on CI.
- Better default handling of HTTP 308, 522 and 524 status codes.
- Documentation is improved, as usual.
1.4.0
Release notes at https://doc.scrapy.org/en/latest/news.html#scrapy-1-4-0-2017-05-18
1.3.3
Release notes at https://doc.scrapy.org/en/latest/news.html#scrapy-1-3-3-2017-03-10
1.2.2
Bug fixes
- Fix a cryptic traceback when a pipeline fails on open_spider() (#2011)
- Fix embedded IPython shell variables (fixing #396 that re-appeared in 1.2.0, fixed in #2418)
- A couple of patches when dealing with robots.txt:
Documentation
- Document "download_latency" key in Request's meta dict (#2033)
- Remove page on (deprecated & unsupported) Ubuntu packages from ToC (#2335)
- A few fixed typos (#2346, #2369, #2369, #2380) and clarifications (#2354, #2325, #2414)
Other changes
- Advertise conda-forge as Scrapy's official conda channel (#2387)
- More helpful error messages when trying to use .css() or .xpath() on non-Text Responses (#2264)
- startproject command now generates a sample middlewares.py file (#2335)
- Add more dependencies’ version info in scrapy version verbose output (#2404)
- Remove all *.pyc files from source distribution (#2386)
1.2.1
Bug fixes
- Include OpenSSL’s more permissive default ciphers when establishing TLS/SSL connections (#2314).
- Fix “Location” HTTP header decoding on non-ASCII URL redirects (#2321).
Documentation
- Fix JsonWriterPipeline example (#2302).
- Various notes: #2330 on spider names, #2329 on middleware methods processing order, #2327 on getting multi-valued HTTP headers as lists.
Other changes
1.2.0
New Features
- New FEED_EXPORT_ENCODING setting to customize the encoding used when writing items to a file. This can be used to turn off \uXXXX escapes in JSON output. It is also useful for those wanting something other than UTF-8 for XML or CSV output (#2034).
- startproject command now supports an optional destination directory to override the default one based on the project name (#2005).
- New SCHEDULER_DEBUG setting to log request serialization failures (#1610).
- JSON encoder now supports serialization of set instances (#2058).
- Interpret application/json-amazonui-streaming as TextResponse (#1503).
- scrapy is imported by default when using shell tools (shell, inspect_response) (#2248).
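The \uXXXX escapes that FEED_EXPORT_ENCODING can turn off come from JSON's default ASCII-safe serialization; the plain json module shows the difference (stdlib only, no Scrapy involved):

```python
import json

item = {"title": "café"}

# Default: non-ASCII characters are escaped as \uXXXX sequences.
escaped = json.dumps(item)
# With escaping off (the behavior an explicit UTF-8 feed encoding
# enables), the character is written as-is.
plain = json.dumps(item, ensure_ascii=False)
print(escaped)
print(plain)
```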
Bug fixes
- DefaultRequestHeaders middleware now runs before UserAgent middleware (#2088). Warning: this is technically backwards incompatible, though we consider this a bug fix.
- HTTP cache extension and plugins that use the .scrapy data directory now work outside projects (#1581). Warning: this is technically backwards incompatible, though we consider this a bug fix.
- Selector does not allow passing both response and text anymore (#2153).
- Fixed logging of wrong callback name with scrapy parse (#2169).
- Fix for an odd gzip decompression bug (#1606).
- Fix for selected callbacks when using CrawlSpider with scrapy parse (#2225).
- Fix for invalid JSON and XML files when spider yields no items (#872).
- Implement flush() for StreamLogger, avoiding a warning in logs (#2125).
Refactoring
Tests & Requirements
Scrapy's new requirements baseline is Debian 8 "Jessie"; it was previously Ubuntu 12.04 "Precise".
In practice this means we run continuous integration tests with at least these (main) package versions: Twisted 14.0, pyOpenSSL 0.14, lxml 3.4.
Scrapy may well work with older versions of these packages (the code base still has switches for older Twisted versions, for example), but this is no longer guaranteed because it is no longer tested.