Improve convert retry handling #433

stchris · 2023-02-10T15:29:23Z

Our current retry logic for converting documents (shelling out to LibreOffice) is based on two constants: the number of retry attempts and the timeout.

ingest-file/ingestors/support/convert.py

Lines 16 to 17 in fca65fb

    
    TIMEOUT = 3600  # seconds
    CONVERT_RETRIES = 5

What would be more desirable is a shorter first timeout, so that a stuck conversion fails fast, with the timeout growing on each retry up to a maximum.

For instance: right now we retry up to 5 times and time out after 3600s (1 hour). We could potentially get much better throughput by having a first timeout of 600s (10 minutes) which gets progressively larger (with a potential max cap). To illustrate:

TIMEOUT_START=600
TIMEOUT_INCREASE=900
TIMEOUT_MAX=3600
CONVERT_RETRIES=5

This would result in up to 5 attempts with timeouts of 10, 25, 40, 55 and 60 minutes (the fifth would be 70 minutes, but is capped at TIMEOUT_MAX). Ideally "stuck" convert tasks would time out much sooner and get queued up for a retry faster.
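A minimal sketch of how that schedule could be computed, assuming the four settings above are plain integers in seconds. The function name `timeout_schedule` is hypothetical and is not part of ingest-file:

```python
# Proposed settings from this issue (seconds); not yet present in convert.py.
TIMEOUT_START = 600
TIMEOUT_INCREASE = 900
TIMEOUT_MAX = 3600
CONVERT_RETRIES = 5


def timeout_schedule(start=TIMEOUT_START, increase=TIMEOUT_INCREASE,
                     maximum=TIMEOUT_MAX, retries=CONVERT_RETRIES):
    """Return one timeout per attempt, growing linearly and capped at maximum."""
    return [min(start + attempt * increase, maximum) for attempt in range(retries)]


if __name__ == "__main__":
    # Print the schedule in minutes for a quick sanity check.
    print([seconds // 60 for seconds in timeout_schedule()])
```

Each attempt would then pass its own entry from this list as the timeout for the conversion call, instead of the single fixed TIMEOUT.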

TODO

try to get some data on the average (and maximum?) time it takes to convert a document
make the timeout and retry logic respect their respective settings.

ozhyrenkov · 2023-09-15T07:56:38Z

Hey, I have a thought regarding the START/INCREASE/MAX variables. I like the way this works in the retry and requests libraries: they take a backoff parameter, a float that controls how fast the interval between attempts grows.

I am not deeply familiar with how this works in Aleph, but the Retry lib implements it in a nice manner.
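To make the suggestion concrete, here is a rough sketch of a multiplicative backoff applied to the convert timeout. Note that in the libraries mentioned the backoff factor scales the delay between attempts; applying it to the timeout itself is an assumption made here, and `backoff_timeouts` is a hypothetical helper, not an API of either library:

```python
def backoff_timeouts(start, backoff, maximum, tries):
    """Return one timeout per attempt: start * backoff**attempt, capped at maximum."""
    return [min(start * backoff ** attempt, maximum) for attempt in range(tries)]


if __name__ == "__main__":
    # With a factor of 2 the timeout doubles on every attempt until it hits the cap.
    print(backoff_timeouts(start=600, backoff=2, maximum=3600, tries=5))
```

Compared with START/INCREASE/MAX, this replaces the additive TIMEOUT_INCREASE with a single growth factor; a factor of 1 keeps the timeout constant.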

Rosencrantz added the improvement label Sep 11, 2023
