Helpers for `bulk` method such as `async_bulk` sleep in blocking manner, preventing graceful shutdown #2484

artem-shelkovnikov · 2024-03-22T14:22:56Z

We tried using the `async_bulk` and `async_streaming_bulk` helpers to ingest data into Elasticsearch. They work great, but we found that they prevent our code from shutting down gracefully when CTRL+C is pressed.

Example code that sleeps: https://github.com/elastic/elasticsearch-py/blob/main/elasticsearch/_async/helpers.py#L241

It would be great to have a way to either:

- define how the sleep happens by passing a sleep function into the client, or
- make the Elasticsearch client internally cancel all sleeps when the client is closed.
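To make the first option concrete, here is a minimal sketch of a sleep function that can be interrupted on shutdown. `CancellableSleeps` and `send_with_retries` are hypothetical names used only for illustration; they are not part of elasticsearch-py, and the retry loop is a stand-in for the helper's internal backoff, not a copy of it.

```python
import asyncio


class CancellableSleeps:
    """Hypothetical helper: every pending sleep wakes up once cancel() is called."""

    def __init__(self) -> None:
        # Create this object inside a running event loop; older Python
        # versions bind asyncio primitives to a loop at construction time.
        self._closed = asyncio.Event()

    async def sleep(self, delay: float) -> bool:
        """Sleep for up to `delay` seconds.

        Returns True if the full delay elapsed, False if cancel() cut it short.
        """
        try:
            await asyncio.wait_for(self._closed.wait(), timeout=delay)
        except asyncio.TimeoutError:
            return True
        return False

    def cancel(self) -> None:
        """Wake all current and future sleeps immediately."""
        self._closed.set()


async def send_with_retries(send, sleeps, max_retries=3, initial_backoff=2.0):
    """Hypothetical retry loop that takes its sleep from the caller.

    `send` is any coroutine function that returns True on success.
    """
    for attempt in range(max_retries + 1):
        if await send():
            return True
        if attempt == max_retries:
            break
        # Exponential backoff, abandoned as soon as shutdown is requested.
        if not await sleeps.sleep(initial_backoff * 2 ** attempt):
            return False
    return False
```

With something like this, a CTRL+C handler would only need to call `cancel()` for in-flight backoff waits to return right away instead of running to completion.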


artem-shelkovnikov · 2024-03-22T14:27:32Z

Example code that our product calls when using the client: https://github.com/elastic/connectors/blob/main/connectors/es/sink.py (see comment on top of the file on how we collect and ingest data).

In short, we have a SyncOrchestrator class that internally creates two classes, connected by a MemQueue:

- Extractor. This class is responsible for providing a generator that returns documents from a third-party system and puts them into the MemQueue.
- Sink. This class is responsible for picking up the data from the MemQueue and sending it in batches to Elasticsearch. Right now it just sends it with a regular bulk request: https://github.com/elastic/connectors/blob/main/connectors/es/sink.py#L149, but ideally we'd love to switch to a helper from the Python client.
- MemQueue. It is there to provide backpressure by limiting both the number of items that can be in the queue and the total size of those items. This way we can, to some extent, control the memory usage of the framework.
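For readers unfamiliar with the pattern, the backpressure described above can be sketched roughly as follows. `BoundedMemQueue` is a hypothetical, simplified illustration and not the actual MemQueue from the connectors repository; in particular, it assumes the caller supplies each item's size.

```python
import asyncio
from collections import deque


class BoundedMemQueue:
    """Hypothetical queue that blocks producers on item count and total size."""

    def __init__(self, max_items: int, max_bytes: int) -> None:
        # Create this object inside a running event loop; older Python
        # versions bind asyncio primitives to a loop at construction time.
        self._max_items = max_items
        self._max_bytes = max_bytes
        self._items = deque()
        self._bytes = 0
        self._changed = asyncio.Condition()

    def _has_room(self, size: int) -> bool:
        # Always admit into an empty queue, so a single oversized
        # document cannot block the producer forever.
        if not self._items:
            return True
        return (
            len(self._items) < self._max_items
            and self._bytes + size <= self._max_bytes
        )

    async def put(self, item, size: int) -> None:
        """Wait until both limits allow the item, then enqueue it."""
        async with self._changed:
            await self._changed.wait_for(lambda: self._has_room(size))
            self._items.append((item, size))
            self._bytes += size
            self._changed.notify_all()

    async def get(self):
        """Wait for an item, dequeue it, and release its share of the budget."""
        async with self._changed:
            await self._changed.wait_for(lambda: bool(self._items))
            item, size = self._items.popleft()
            self._bytes -= size
            self._changed.notify_all()
            return item
```

The producer (Extractor) awaits `put()` and is paused whenever either limit is reached, while the consumer (Sink) awaits `get()` and frees capacity as it drains batches.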

pquentin · 2024-05-17T06:02:32Z

Sorry for the delay Artem. I would be happy to implement the first version, allowing the sleep function to be user-defined. Silently cancelling all sleeps/bulks isn't something we'd want in the general case.

artem-shelkovnikov mentioned this issue Mar 22, 2024

retry_on_status setting does not work as expected with requests that should not be retried immediately #2485

Open
