From 94f060fcc84853f28f3f91b6dde1d61c8e19251e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adri=C3=A1n=20Chaves?=
Date: Tue, 29 Oct 2019 12:53:46 +0100
Subject: [PATCH] Cover Scrapy 1.8.0 in the release notes (#3952)

---
 docs/news.rst           | 226 +++++++++++++++++++++++++++++++++++++++-
 docs/topics/logging.rst |   5 +-
 scrapy/logformatter.py  |   2 +-
 3 files changed, 229 insertions(+), 4 deletions(-)

diff --git a/docs/news.rst b/docs/news.rst
index 8dfe8693c7a..669844045e3 100644
--- a/docs/news.rst
+++ b/docs/news.rst
@@ -6,6 +6,209 @@ Release notes

 .. note:: Scrapy 1.x will be the last series supporting Python 2. Scrapy 2.0,
           planned for Q4 2019 or Q1 2020, will support **Python 3 only**.

.. _release-1.8.0:

Scrapy 1.8.0 (2019-10-28)
-------------------------

Highlights:

* Dropped Python 3.4 support and updated minimum requirements; made Python 3.8
  support official
* New :meth:`Request.from_curl <scrapy.Request.from_curl>` class method
* New :setting:`ROBOTSTXT_PARSER` and :setting:`ROBOTSTXT_USER_AGENT` settings
* New :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` and
  :setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` settings

Backward-incompatible changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Python 3.4 is no longer supported, and some of the minimum requirements of
  Scrapy have also changed:

  * cssselect_ 0.9.1
  * cryptography_ 2.0
  * lxml_ 3.5.0
  * pyOpenSSL_ 16.2.0
  * queuelib_ 1.4.2
  * service_identity_ 16.0.0
  * six_ 1.10.0
  * Twisted_ 17.9.0 (16.0.0 with Python 2)
  * zope.interface_ 4.1.3

  (:issue:`3892`)

* ``JSONRequest`` is now called :class:`~scrapy.http.JsonRequest` for
  consistency with similar classes (:issue:`3929`, :issue:`3982`)

* If you are using a custom context factory
  (:setting:`DOWNLOADER_CLIENTCONTEXTFACTORY`), its ``__init__`` method must
  accept two new parameters: ``tls_verbose_logging`` and ``tls_ciphers``
  (:issue:`2111`, :issue:`3392`, :issue:`3442`, :issue:`3450`)

* :class:`~scrapy.loader.ItemLoader` now turns the values of its input item
  into lists::

    >>> item = MyItem()
    >>> item['field'] = 'value1'
    >>> loader = ItemLoader(item=item)
    >>> item['field']
    ['value1']

  This is needed to allow adding values to existing fields
  (``loader.add_value('field', 'value2')``).

  (:issue:`3804`, :issue:`3819`, :issue:`3897`, :issue:`3976`, :issue:`3998`,
  :issue:`4036`)

See also :ref:`1.8-deprecation-removals` below.


New features
~~~~~~~~~~~~

* A new :meth:`Request.from_curl <scrapy.Request.from_curl>` class
  method allows :ref:`creating a request from a cURL command
  <requests-from-curl>` (:issue:`2985`, :issue:`3862`)
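
  For example, a minimal sketch (the cURL command shown here is
  illustrative)::

    from scrapy import Request

    # The returned Request can be yielded from a spider callback.
    request = Request.from_curl(
        "curl -H 'Accept: text/html' https://example.com/some-page")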

* A new :setting:`ROBOTSTXT_PARSER` setting allows choosing which robots.txt_
  parser to use. It includes built-in support for
  :ref:`RobotFileParser <python-robotfileparser>`,
  :ref:`Protego <protego-parser>` (default), :ref:`Reppy <reppy-parser>`, and
  :ref:`Robotexclusionrulesparser <rerp-parser>`, and allows you to
  :ref:`implement support for additional parsers
  <support-for-new-robots-parser>` (:issue:`754`, :issue:`2669`,
  :issue:`3796`, :issue:`3935`, :issue:`3969`, :issue:`4006`)

* A new :setting:`ROBOTSTXT_USER_AGENT` setting allows defining a separate
  user agent string to use for robots.txt_ parsing (:issue:`3931`,
  :issue:`3966`)

* :class:`~scrapy.spiders.Rule` no longer requires a :class:`LinkExtractor
  <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>` parameter
  (:issue:`781`, :issue:`4016`)

* Use the new :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` setting to customize
  the TLS/SSL ciphers used by the default HTTP/1.1 downloader (:issue:`3392`,
  :issue:`3442`)

* Set the new :setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` setting to
  ``True`` to enable debug-level messages about TLS connection parameters
  after establishing HTTPS connections (:issue:`2111`, :issue:`3450`)
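
  For example, some of the new settings above might be combined in a
  project's ``settings.py`` as follows (a sketch; the values shown are
  illustrative, not recommendations)::

    # User agent string matched against the rules in robots.txt files,
    # instead of the general USER_AGENT setting.
    ROBOTSTXT_USER_AGENT = 'mybot'

    # OpenSSL cipher list string for the default HTTP/1.1 downloader.
    DOWNLOADER_CLIENT_TLS_CIPHERS = 'ECDHE-RSA-AES128-GCM-SHA256'

    # Log negotiated TLS parameters after each HTTPS connection.
    DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING = True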

* Callbacks that receive keyword arguments
  (see :attr:`Request.cb_kwargs <scrapy.Request.cb_kwargs>`) can now be
  tested using the new :class:`@cb_kwargs
  <scrapy.contracts.default.CbKwargsContract>`
  :ref:`spider contract <topics-contracts>` (:issue:`3985`, :issue:`3988`)
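
  For example, a callback that takes a ``page_number`` keyword argument
  might be tested like this (a sketch; the URL, argument, and bounds are
  made up)::

    def parse_page(self, response, page_number):
        """ Parse a listing page.

        @url https://example.com/pages?page=1
        @cb_kwargs {"page_number": 1}
        @returns items 0 25
        """
        ...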

* When a :class:`@scrapes <scrapy.contracts.default.ScrapesContract>` spider
  contract fails, all missing fields are now reported (:issue:`766`,
  :issue:`3939`)

* :ref:`Custom log formats <custom-log-formats>` can now drop messages by
  having the corresponding methods of the configured :setting:`LOG_FORMATTER`
  return ``None`` (:issue:`3984`, :issue:`3987`)

* A much improved completion definition is now available for Zsh_
  (:issue:`4069`)


Bug fixes
~~~~~~~~~

* :meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` no
  longer makes later calls to :meth:`ItemLoader.get_output_value()
  <scrapy.loader.ItemLoader.get_output_value>` or
  :meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` return
  empty data (:issue:`3804`, :issue:`3819`, :issue:`3897`, :issue:`3976`,
  :issue:`3998`, :issue:`4036`)

* Fixed :class:`~scrapy.statscollectors.DummyStatsCollector` raising a
  :exc:`TypeError` exception (:issue:`4007`, :issue:`4052`)

* :meth:`FilesPipeline.file_path
  <scrapy.pipelines.files.FilesPipeline.file_path>` and
  :meth:`ImagesPipeline.file_path
  <scrapy.pipelines.images.ImagesPipeline.file_path>` no longer choose
  file extensions that are not `registered with IANA`_ (:issue:`1287`,
  :issue:`3953`, :issue:`3954`)

* When using botocore_ to persist files in S3, all botocore-supported headers
  are now properly mapped (:issue:`3904`, :issue:`3905`)

* FTP passwords in :setting:`FEED_URI` containing percent-escaped characters
  are now properly decoded (:issue:`3941`)

* A memory-handling and error-handling issue in
  :func:`scrapy.utils.ssl.get_temp_key_info` has been fixed (:issue:`3920`)


Documentation
~~~~~~~~~~~~~

* The documentation now covers how to define and configure a :ref:`custom log
  format <custom-log-formats>` (:issue:`3616`, :issue:`3660`)

* API documentation added for :class:`~scrapy.exporters.MarshalItemExporter`
  and :class:`~scrapy.exporters.PythonItemExporter` (:issue:`3973`)

* API documentation added for :class:`~scrapy.item.BaseItem` and
  :class:`~scrapy.item.ItemMeta` (:issue:`3999`)

* Minor documentation fixes (:issue:`2998`, :issue:`3398`, :issue:`3597`,
  :issue:`3894`, :issue:`3934`, :issue:`3978`, :issue:`3993`, :issue:`4022`,
  :issue:`4028`, :issue:`4033`, :issue:`4046`, :issue:`4050`, :issue:`4055`,
  :issue:`4056`, :issue:`4061`, :issue:`4072`, :issue:`4071`, :issue:`4079`,
  :issue:`4081`, :issue:`4089`, :issue:`4093`)


.. _1.8-deprecation-removals:

Deprecation removals
~~~~~~~~~~~~~~~~~~~~

* ``scrapy.xlib`` has been removed (:issue:`4015`)


Deprecations
~~~~~~~~~~~~

* The LevelDB_ storage backend
  (``scrapy.extensions.httpcache.LeveldbCacheStorage``) of
  :class:`~scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware` is
  deprecated (:issue:`4085`, :issue:`4092`)

* Use of the undocumented ``SCRAPY_PICKLED_SETTINGS_TO_OVERRIDE`` environment
  variable is deprecated (:issue:`3910`)

* ``scrapy.item.DictItem`` is deprecated, use :class:`~scrapy.item.Item`
  instead (:issue:`3999`)


Other changes
~~~~~~~~~~~~~

* Minimum versions of optional Scrapy requirements that are covered by
  continuous integration tests have been updated:

  * botocore_ 1.3.23
  * Pillow_ 3.4.2

  Lower versions of these optional requirements may work, but it is not
  guaranteed (:issue:`3892`)

* GitHub templates for bug reports and feature requests (:issue:`3126`,
  :issue:`3471`, :issue:`3749`, :issue:`3754`)

* Continuous integration fixes (:issue:`3923`)

* Code cleanup (:issue:`3391`, :issue:`3907`, :issue:`3946`, :issue:`3950`,
  :issue:`4023`, :issue:`4031`)

+
+.. _release-1.7.4:
+
 Scrapy 1.7.4 (2019-10-21)
 -------------------------
 
@@ -18,22 +221,31 @@ makes later calls to :meth:`ItemLoader.get_output_value()
 <scrapy.loader.ItemLoader.get_output_value>` or
 :meth:`ItemLoader.load_item() <scrapy.loader.ItemLoader.load_item>` return
 empty data.
 
+
+.. _release-1.7.3:
+
 Scrapy 1.7.3 (2019-08-01)
 -------------------------
 
 Enforce lxml 4.3.5 or lower for Python 3.4 (:issue:`3912`, :issue:`3918`).
 
+
+.. _release-1.7.2:
+
 Scrapy 1.7.2 (2019-07-23)
 -------------------------
 
 Fix Python 2 support (:issue:`3889`, :issue:`3893`, :issue:`3896`).
 
+.. _release-1.7.1:
+
 Scrapy 1.7.1 (2019-07-18)
 -------------------------
 
 Re-packaging of Scrapy 1.7.0, which was missing some changes in PyPI.
 
+
 .. _release-1.7.0:
 
 Scrapy 1.7.0 (2019-07-18)
@@ -568,7 +780,7 @@ Scrapy 1.5.2 (2019-01-22)
 
   See :ref:`telnet console <topics-telnetconsole>` documentation for more info
 
-* Backport CI build failure under GCE environemnt due to boto import error.
+* Backport CI build failure under GCE environment due to boto import error.
 
 .. _release-1.5.1:
@@ -2830,23 +3042,35 @@ First release of Scrapy.
 
 .. _AJAX crawleable urls: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?csw=1
+.. _botocore: https://github.com/boto/botocore
 .. _chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
 .. _ClientForm: http://wwwsearch.sourceforge.net/old/ClientForm/
 .. _Creating a pull request: https://help.github.com/en/articles/creating-a-pull-request
+.. _cryptography: https://cryptography.io/en/latest/
 .. _cssselect: https://github.com/scrapy/cssselect/
 .. _docstrings: https://docs.python.org/glossary.html#term-docstring
 .. _KeyboardInterrupt: https://docs.python.org/library/exceptions.html#KeyboardInterrupt
+.. _LevelDB: https://github.com/google/leveldb
 .. _lxml: http://lxml.de/
 .. _marshal: https://docs.python.org/2/library/marshal.html
 .. _parsel.csstranslator.GenericTranslator: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.GenericTranslator
 .. _parsel.csstranslator.HTMLTranslator: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.HTMLTranslator
 .. _parsel.csstranslator.XPathExpr: https://parsel.readthedocs.io/en/latest/parsel.html#parsel.csstranslator.XPathExpr
 .. _PEP 257: https://www.python.org/dev/peps/pep-0257/
+.. _Pillow: https://python-pillow.org/
+.. _pyOpenSSL: https://www.pyopenssl.org/en/stable/
 .. _queuelib: https://github.com/scrapy/queuelib
+.. _registered with IANA: https://www.iana.org/assignments/media-types/media-types.xhtml
 .. _resource: https://docs.python.org/2/library/resource.html
+.. _robots.txt: http://www.robotstxt.org/
 .. _scrapely: https://github.com/scrapy/scrapely
+.. _service_identity: https://service-identity.readthedocs.io/en/stable/
+.. _six: https://six.readthedocs.io/
 .. _tox: https://pypi.python.org/pypi/tox
+.. _Twisted: https://twistedmatrix.com/trac/
 .. _Twisted - hello, asynchronous programming: http://jessenoller.com/blog/2009/02/11/twisted-hello-asynchronous-programming/
 .. _w3lib: https://github.com/scrapy/w3lib
 .. _w3lib.encoding: https://github.com/scrapy/w3lib/blob/master/w3lib/encoding.py
 .. _What is cacheable: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
+.. _zope.interface: https://zopeinterface.readthedocs.io/en/latest/
+.. _Zsh: https://www.zsh.org/
diff --git a/docs/topics/logging.rst b/docs/topics/logging.rst
index 87ea43c7dd0..2db0ffddd3e 100644
--- a/docs/topics/logging.rst
+++ b/docs/topics/logging.rst
@@ -198,8 +198,9 @@ to override some of the Scrapy settings regarding logging.
 Custom Log Formats
 ------------------
 
-A custom log format can be set for different actions by extending :class:`~scrapy.logformatter.LogFormatter` class
-and making :setting:`LOG_FORMATTER` point to your new class.
+A custom log format can be set for different actions by extending
+:class:`~scrapy.logformatter.LogFormatter` class and making
+:setting:`LOG_FORMATTER` point to your new class.
 
 .. autoclass:: scrapy.logformatter.LogFormatter
    :members:
diff --git a/scrapy/logformatter.py b/scrapy/logformatter.py
index f15940ed116..3c61ed7e016 100644
--- a/scrapy/logformatter.py
+++ b/scrapy/logformatter.py
@@ -29,7 +29,7 @@ class LogFormatter(object):
     * ``args`` should be a tuple or dict with the formatting placeholders for
       ``msg``. The final log message is computed as ``msg % args``.
 
-    Users can define their own ``LogFormatter`` class if they want to customise how
+    Users can define their own ``LogFormatter`` class if they want to customize how
     each action is logged or if they want to omit it entirely. In order to omit
     logging an action the method must return ``None``.
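
To illustrate the docstring above: a custom log formatter can omit the
"dropped item" log action entirely by returning ``None`` from the
corresponding method. A minimal sketch (``QuietDropFormatter`` is a
hypothetical name; it would be enabled by pointing :setting:`LOG_FORMATTER`
to its import path)::

    from scrapy import logformatter

    class QuietDropFormatter(logformatter.LogFormatter):
        """Do not log dropped items at all."""

        def dropped(self, item, exception, response, spider):
            # Returning None makes Scrapy omit this log action entirely.
            return None

The same approach works for the other ``LogFormatter`` methods, such as
``crawled`` and ``scraped``.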