Cover Scrapy 1.8.0 in the release notes #3952

Merged
merged 3 commits on Oct 29, 2019
Changes from all commits
226 changes: 225 additions & 1 deletion docs/news.rst
@@ -6,6 +6,209 @@ Release notes
.. note:: Scrapy 1.x will be the last series supporting Python 2. Scrapy 2.0,
planned for Q4 2019 or Q1 2020, will support **Python 3 only**.

.. _release-1.8.0:

Scrapy 1.8.0 (2019-10-28)
-------------------------

Highlights:

* Dropped Python 3.4 support and updated minimum requirements; made Python 3.8
support official
* New :meth:`Request.from_curl <scrapy.http.Request.from_curl>` class method
* New :setting:`ROBOTSTXT_PARSER` and :setting:`ROBOTSTXT_USER_AGENT` settings
* New :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` and
:setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` settings

Backward-incompatible changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Python 3.4 is no longer supported, and some of the minimum requirements of
Scrapy have also changed:

* cssselect_ 0.9.1
* cryptography_ 2.0
* lxml_ 3.5.0
* pyOpenSSL_ 16.2.0
* queuelib_ 1.4.2
* service_identity_ 16.0.0
* six_ 1.10.0
* Twisted_ 17.9.0 (16.0.0 with Python 2)
* zope.interface_ 4.1.3

(:issue:`3892`)

* ``JSONRequest`` is now called :class:`~scrapy.http.JsonRequest` for
consistency with similar classes (:issue:`3929`, :issue:`3982`)

* If you are using a custom context factory
(:setting:`DOWNLOADER_CLIENTCONTEXTFACTORY`), its ``__init__`` method must
accept two new parameters: ``tls_verbose_logging`` and ``tls_ciphers``
(:issue:`2111`, :issue:`3392`, :issue:`3442`, :issue:`3450`)
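
  For illustration, a hedged sketch of an updated factory; the module
  and class names are hypothetical, and the base-class keyword
  arguments are assumed from the Scrapy 1.8 code base::

      from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory

      class MyContextFactory(ScrapyClientContextFactory):
          def __init__(self, tls_verbose_logging=False, tls_ciphers=None,
                       **kwargs):
              # Accept and forward the two parameters that are new in 1.8
              super(MyContextFactory, self).__init__(
                  tls_verbose_logging=tls_verbose_logging,
                  tls_ciphers=tls_ciphers, **kwargs)

      # settings.py (hypothetical project path):
      # DOWNLOADER_CLIENTCONTEXTFACTORY = 'myproject.context.MyContextFactory'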

* :class:`~scrapy.loader.ItemLoader` now turns the values of its input item
into lists::

>>> item = MyItem()
>>> item['field'] = 'value1'
>>> loader = ItemLoader(item=item)
>>> item['field']
['value1']

This is needed to allow adding values to existing fields
(``loader.add_value('field', 'value2')``).

(:issue:`3804`, :issue:`3819`, :issue:`3897`, :issue:`3976`, :issue:`3998`,
:issue:`4036`)
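
  Continuing the example above, a hedged sketch of how values then
  accumulate, assuming the default input and output processors::

      >>> loader.add_value('field', 'value2')
      >>> loader.load_item()
      {'field': ['value1', 'value2']}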

See also :ref:`1.8-deprecation-removals` below.


New features
~~~~~~~~~~~~

* A new :meth:`Request.from_curl <scrapy.http.Request.from_curl>` class
method allows :ref:`creating a request from a cURL command
<requests-from-curl>` (:issue:`2985`, :issue:`3862`)
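
  For example, a minimal sketch (the cURL command and URL are
  illustrative)::

      from scrapy import Request

      request = Request.from_curl(
          "curl 'https://example.org/api' -H 'Accept: application/json'")
      request.url     # 'https://example.org/api'
      request.method  # 'GET'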

* A new :setting:`ROBOTSTXT_PARSER` setting allows choosing which robots.txt_
parser to use. It includes built-in support for
:ref:`RobotFileParser <python-robotfileparser>`,
:ref:`Protego <protego-parser>` (default), :ref:`Reppy <reppy-parser>`, and
:ref:`Robotexclusionrulesparser <rerp-parser>`, and allows you to
:ref:`implement support for additional parsers
<support-for-new-robots-parser>` (:issue:`754`, :issue:`2669`,
:issue:`3796`, :issue:`3935`, :issue:`3969`, :issue:`4006`)
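
  For example, a hedged ``settings.py`` sketch; the dotted path is
  assumed from the Scrapy 1.8 code base::

      ROBOTSTXT_OBEY = True
      # Protego is the default; RobotFileParser preserves the old behavior.
      ROBOTSTXT_PARSER = 'scrapy.robotstxt.ProtegoRobotParser'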

* A new :setting:`ROBOTSTXT_USER_AGENT` setting allows defining a separate
user agent string to use for robots.txt_ parsing (:issue:`3931`,
:issue:`3966`)
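
  For example (the user agent values are illustrative)::

      USER_AGENT = 'MyCrawler/1.0 (+https://example.org/bot)'
      # Match robots.txt rules against this token instead of USER_AGENT:
      ROBOTSTXT_USER_AGENT = 'MyCrawler'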

* :class:`~scrapy.spiders.Rule` no longer requires a :class:`LinkExtractor
<scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>` parameter
(:issue:`781`, :issue:`4016`)
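
  A minimal sketch (spider name and URL are illustrative); without an
  explicit link extractor, a default ``LinkExtractor()`` that extracts
  every link is assumed::

      from scrapy.spiders import CrawlSpider, Rule

      class ExampleSpider(CrawlSpider):
          name = 'example'
          start_urls = ['https://example.org']
          rules = (Rule(callback='parse_item', follow=True),)

          def parse_item(self, response):
              yield {'url': response.url}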

* Use the new :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` setting to customize
  the TLS/SSL ciphers used by the default HTTP/1.1 downloader, as in the
  combined sketch below (:issue:`3392`, :issue:`3442`)

* Set the new :setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` setting to
``True`` to enable debug-level messages about TLS connection parameters
after establishing HTTPS connections (:issue:`2111`, :issue:`3450`)
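
  A combined sketch for the two settings above; the cipher string is
  illustrative (any OpenSSL cipher-list string should work)::

      DOWNLOADER_CLIENT_TLS_CIPHERS = 'ECDHE-RSA-AES128-GCM-SHA256'
      DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING = True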

* Callbacks that receive keyword arguments
(see :attr:`Request.cb_kwargs <scrapy.http.Request.cb_kwargs>`) can now be
tested using the new :class:`@cb_kwargs
<scrapy.contracts.default.CallbackKeywordArgumentsContract>`
:ref:`spider contract <topics-contracts>` (:issue:`3985`, :issue:`3988`)
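
  A sketch of a callback tested with the new contract; the URL and the
  keyword argument are illustrative::

      def parse_page(self, response, page_number):
          """
          @url https://example.org/pages/1
          @cb_kwargs {"page_number": 1}
          @returns items 1 1
          """
          yield {'page': page_number, 'url': response.url}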

* When a :class:`@scrapes <scrapy.contracts.default.ScrapesContract>` spider
contract fails, all missing fields are now reported (:issue:`766`,
:issue:`3939`)
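
  For example, if the callback sketched below yielded an item with only
  ``name`` set, the failure message would now list both ``price`` and
  ``description`` (the URL and field names are illustrative)::

      def parse_product(self, response):
          """
          @url https://example.org/product/1
          @scrapes name price description
          """
          yield {'name': response.css('h1::text').get()}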

* :ref:`Custom log formats <custom-log-formats>` can now drop messages by
having the corresponding methods of the configured :setting:`LOG_FORMATTER`
return ``None`` (:issue:`3984`, :issue:`3987`)
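
  A hedged sketch that silences item-scraped messages; the settings
  module path in the comment is hypothetical::

      from scrapy.logformatter import LogFormatter

      class QuietLogFormatter(LogFormatter):
          def scraped(self, item, response, spider):
              # Returning None drops the message entirely (new in 1.8)
              return None

      # settings.py:
      # LOG_FORMATTER = 'myproject.logformatters.QuietLogFormatter'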

* A much improved completion definition is now available for Zsh_
(:issue:`4069`)


Bug fixes
~~~~~~~~~

* :meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` no
longer makes later calls to :meth:`ItemLoader.get_output_value()
<scrapy.loader.ItemLoader.get_output_value>` or
:meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` return
empty data (:issue:`3804`, :issue:`3819`, :issue:`3897`, :issue:`3976`,
:issue:`3998`, :issue:`4036`)

* Fixed :class:`~scrapy.statscollectors.DummyStatsCollector` raising a
:exc:`TypeError` exception (:issue:`4007`, :issue:`4052`)

* :meth:`FilesPipeline.file_path
<scrapy.pipelines.files.FilesPipeline.file_path>` and
:meth:`ImagesPipeline.file_path
<scrapy.pipelines.images.ImagesPipeline.file_path>` no longer choose
file extensions that are not `registered with IANA`_ (:issue:`1287`,
:issue:`3953`, :issue:`3954`)

* When using botocore_ to persist files in S3, all botocore-supported headers
  are now properly mapped (:issue:`3904`, :issue:`3905`)

* FTP passwords in :setting:`FEED_URI` containing percent-escaped characters
are now properly decoded (:issue:`3941`)

* A memory-handling and error-handling issue in
:func:`scrapy.utils.ssl.get_temp_key_info` has been fixed (:issue:`3920`)


Documentation
~~~~~~~~~~~~~

* The documentation now covers how to define and configure a :ref:`custom log
format <custom-log-formats>` (:issue:`3616`, :issue:`3660`)

* API documentation added for :class:`~scrapy.exporters.MarshalItemExporter`
and :class:`~scrapy.exporters.PythonItemExporter` (:issue:`3973`)

* API documentation added for :class:`~scrapy.item.BaseItem` and
:class:`~scrapy.item.ItemMeta` (:issue:`3999`)

* Minor documentation fixes (:issue:`2998`, :issue:`3398`, :issue:`3597`,
:issue:`3894`, :issue:`3934`, :issue:`3978`, :issue:`3993`, :issue:`4022`,
:issue:`4028`, :issue:`4033`, :issue:`4046`, :issue:`4050`, :issue:`4055`,
:issue:`4056`, :issue:`4061`, :issue:`4072`, :issue:`4071`, :issue:`4079`,
:issue:`4081`, :issue:`4089`, :issue:`4093`)


.. _1.8-deprecation-removals:

Deprecation removals
~~~~~~~~~~~~~~~~~~~~

* ``scrapy.xlib`` has been removed (:issue:`4015`)


Deprecations
~~~~~~~~~~~~

* The LevelDB_ storage backend
(``scrapy.extensions.httpcache.LeveldbCacheStorage``) of
:class:`~scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware` is
deprecated (:issue:`4085`, :issue:`4092`)

* Use of the undocumented ``SCRAPY_PICKLED_SETTINGS_TO_OVERRIDE`` environment
variable is deprecated (:issue:`3910`)

* ``scrapy.item.DictItem`` is deprecated, use :class:`~scrapy.item.Item`
instead (:issue:`3999`)


Other changes
~~~~~~~~~~~~~

* Minimum versions of optional Scrapy requirements that are covered by
continuous integration tests have been updated:

* botocore_ 1.3.23
* Pillow_ 3.4.2

Lower versions of these optional requirements may work, but it is not
guaranteed (:issue:`3892`)

* GitHub templates for bug reports and feature requests (:issue:`3126`,
:issue:`3471`, :issue:`3749`, :issue:`3754`)

* Continuous integration fixes (:issue:`3923`)

* Code cleanup (:issue:`3391`, :issue:`3907`, :issue:`3946`, :issue:`3950`,
:issue:`4023`, :issue:`4031`)


.. _release-1.7.4:

Scrapy 1.7.4 (2019-10-21)
-------------------------

@@ -18,22 +221,31 @@ makes later calls to :meth:`ItemLoader.get_output_value()
<scrapy.loader.ItemLoader.get_output_value>` or :meth:`ItemLoader.load_item()
<scrapy.loader.ItemLoader.load_item>` return empty data.


.. _release-1.7.3:

Scrapy 1.7.3 (2019-08-01)
-------------------------

Enforce lxml 4.3.5 or lower for Python 3.4 (:issue:`3912`, :issue:`3918`).


.. _release-1.7.2:

Scrapy 1.7.2 (2019-07-23)
-------------------------

Fix Python 2 support (:issue:`3889`, :issue:`3893`, :issue:`3896`).


.. _release-1.7.1:

Scrapy 1.7.1 (2019-07-18)
-------------------------

Re-packaging of Scrapy 1.7.0, which was missing some changes in PyPI.


.. _release-1.7.0:

Scrapy 1.7.0 (2019-07-18)
@@ -568,7 +780,7 @@ Scrapy 1.5.2 (2019-01-22)

See :ref:`telnet console <topics-telnetconsole>` documentation for more info

- * Backport CI build failure under GCE environemnt due to boto import error.
+ * Backport CI build failure under GCE environment due to boto import error.

.. _release-1.5.1:

@@ -2830,23 +3042,35 @@ First release of Scrapy.


.. _AJAX crawleable urls: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?csw=1
.. _botocore: https://github.com/boto/botocore
.. _chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
.. _ClientForm: http://wwwsearch.sourceforge.net/old/ClientForm/
.. _Creating a pull request: https://help.github.com/en/articles/creating-a-pull-request
.. _cryptography: https://cryptography.io/en/latest/
.. _cssselect: https://github.com/scrapy/cssselect/
.. _docstrings: https://docs.python.org/glossary.html#term-docstring
.. _KeyboardInterrupt: https://docs.python.org/library/exceptions.html#KeyboardInterrupt
.. _LevelDB: https://github.com/google/leveldb
.. _lxml: http://lxml.de/
.. _marshal: https://docs.python.org/2/library/marshal.html
.. _parsel.csstranslator.GenericTranslator: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.GenericTranslator
.. _parsel.csstranslator.HTMLTranslator: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.HTMLTranslator
.. _parsel.csstranslator.XPathExpr: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.XPathExpr
.. _PEP 257: https://www.python.org/dev/peps/pep-0257/
.. _Pillow: https://python-pillow.org/
.. _pyOpenSSL: https://www.pyopenssl.org/en/stable/
.. _queuelib: https://github.com/scrapy/queuelib
.. _registered with IANA: https://www.iana.org/assignments/media-types/media-types.xhtml
.. _resource: https://docs.python.org/2/library/resource.html
.. _robots.txt: http://www.robotstxt.org/
.. _scrapely: https://github.com/scrapy/scrapely
.. _service_identity: https://service-identity.readthedocs.io/en/stable/
.. _six: https://six.readthedocs.io/
.. _tox: https://pypi.python.org/pypi/tox
.. _Twisted: https://twistedmatrix.com/trac/
.. _Twisted - hello, asynchronous programming: http://jessenoller.com/blog/2009/02/11/twisted-hello-asynchronous-programming/
.. _w3lib: https://github.com/scrapy/w3lib
.. _w3lib.encoding: https://github.com/scrapy/w3lib/blob/master/w3lib/encoding.py
.. _What is cacheable: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
.. _zope.interface: https://zopeinterface.readthedocs.io/en/latest/
.. _Zsh: https://www.zsh.org/
5 changes: 3 additions & 2 deletions docs/topics/logging.rst
@@ -198,8 +198,9 @@ to override some of the Scrapy settings regarding logging.
Custom Log Formats
------------------

- A custom log format can be set for different actions by extending :class:`~scrapy.logformatter.LogFormatter` class
- and making :setting:`LOG_FORMATTER` point to your new class.
+ A custom log format can be set for different actions by extending
+ :class:`~scrapy.logformatter.LogFormatter` class and making
+ :setting:`LOG_FORMATTER` point to your new class.

.. autoclass:: scrapy.logformatter.LogFormatter
:members:
2 changes: 1 addition & 1 deletion scrapy/logformatter.py
@@ -29,7 +29,7 @@ class LogFormatter(object):
* ``args`` should be a tuple or dict with the formatting placeholders for ``msg``.
The final log message is computed as ``msg % args``.

- Users can define their own ``LogFormatter`` class if they want to customise how
+ Users can define their own ``LogFormatter`` class if they want to customize how
each action is logged or if they want to omit it entirely. In order to omit
logging an action the method must return ``None``.
