add support for custom exporter class #6273
base: master
Changes from all commits
d1a0bed
bb8ff74
202f4ab
eb7d6f6
3438432
13a5c3e
32750db
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -443,3 +443,66 @@ MarshalItemExporter | |
------------------- | ||
|
||
.. autoclass:: MarshalItemExporter | ||
|
||
Custom Item Exporters | ||
===================== | ||
|
||
You can also inherit from :class:`BaseItemExporter` and implement your own exporter. | ||
|
||
Usage: | ||
|
||
.. code-block:: python | ||
|
||
custom_settings = { | ||
"FEEDS": {"stdout://": {"format": "CustomAPI"}}, | ||
Review comment: Using Furthermore, there could be cases where you may not want to use |
||
"FEED_EXPORTERS": {"CustomAPI": "project.exporters.CustomExporter"}, | ||
} | ||
|
||
Here you can override the :meth:`~BaseItemExporter.export_item`, :meth:`~BaseItemExporter.start_exporting`, | ||
:meth:`~BaseItemExporter.finish_exporting`, :meth:`~BaseItemExporter.serialize_field` and :meth:`~BaseItemExporter.__init__` methods to customize the | ||
behavior of your exporter. | ||
|
||
.. tip:: To send requests to external services without blocking, use | ||
``twisted.internet.threads.deferToThread``. | ||
|
||
.. warning:: The storage **file object** is passed to the ``__init__`` method of the custom exporter and | ||
behaves as usual for the scheme in your FEEDS setting. It is closed | ||
when a batch is completed or the spider is closed. | ||
|
||
Example: | ||
|
||
.. tip:: :meth:`~BaseItemExporter.finish_exporting` can be an async method. | ||
|
||
.. code-block:: python | ||
|
||
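import logging | ||
from typing import List | ||
|
||
import requests | ||
from scrapy.exporters import BaseItemExporter | ||
from twisted.internet import defer, threads | ||
from twisted.internet.defer import DeferredList | ||
|
||
logger = logging.getLogger(__name__) | ||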
|
||
|
||
class CustomExporter(BaseItemExporter): | ||
Review comment: I would rename it to |
||
def __init__(self, file, *args, dont_fail=False, **kwargs): | ||
self._kwargs = kwargs | ||
self.file = file | ||
self._configure(kwargs, dont_fail=dont_fail) | ||
self._pending_deferreds: List[defer.Deferred] = [] | ||
|
||
def start_exporting(self): | ||
pass | ||
|
||
def send_request(self, item): | ||
return requests.post("https://httpbin.org/anything", json={"item": item}) | ||
|
||
def export_item(self, item): | ||
Review comment: Would this work too?
    @deferToThread
    def export_item(self, item):
        response = requests.post("https://httpbin.org/anything", json={"item": item})
Review comment: Alternatively, use
    def export_item(self, item):
        url = 'https://httpbin.org/post'
        data = {'item': item}
        headers = {'Content-Type': 'application/json'}
        return treq.post(url, json=data, headers=headers) |
||
dfd = threads.deferToThread(self.send_request, item) | ||
dfd.addCallbacks( | ||
callback=lambda result: logger.info(f"Successful response: {result}"), | ||
errback=lambda failure: logger.error( | ||
f"Error exporting item {item}: {failure.getTraceback()}" | ||
), | ||
) | ||
# optionally collect them to ensure that they are awaited | ||
dfd.addBoth(lambda _: self._pending_deferreds.remove(dfd)) | ||
self._pending_deferreds.append(dfd) | ||
return dfd | ||
|
||
async def finish_exporting(self): | ||
await DeferredList(self._pending_deferreds) |
Review comment: This section covers a generic concept, implementing custom item exporters, but focuses exclusively on one type of exporter. There could be other custom item exporters that are not meant to be used as streams, for example one that exports into ASN.1 encoding.
I recommend narrowing this section and explaining why you may want to do something like this: to export items to streamed services such as APIs or databases rather than dumping them into files.