Checklist

I have checked the issues list for similar or identical bug reports.
I have checked the pull requests list for existing proposed fixes.
I have checked the commit log to find out if the bug was already fixed in the main branch.
I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).
Related Issues and Possible Duplicates
Related Issues
Continuous memory leak #4843 - inappropriate usage of eta and countdown can lead to increasing worker RAM usage, which may be mistakenly recognized as a memory leak

Possible Duplicates

Description
eta and countdown parameters can be easily overused, especially when using Redis as the broker.
When a task with those parameters is sent to the queue, the worker grabs it immediately but doesn't actually execute it until the eta/countdown condition passes. Until then, the task is stored in memory. Obviously, one such task won't cause a problem, but in our production environment we ended up in a situation where hundreds of thousands of them accumulated in the worker, causing RAM usage to increase by 30GB over a period of a few days. This behavior is documented, but very deep in the documentation, so I guess not many people have reached that page.
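For illustration, a minimal sketch of the two call styles in question (the task, its argument, and the broker URL are made up):

```python
from datetime import datetime, timedelta, timezone

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def send_reminder(user_id):
    ...

# countdown: run roughly 5 minutes from now; the worker that receives
# the message holds it in memory for those 5 minutes.
send_reminder.apply_async(args=[42], countdown=300)

# eta: run at an absolute time; with a 3-day ETA the message sits in
# worker RAM (and, with Redis, keeps being redelivered) until then.
send_reminder.apply_async(
    args=[42],
    eta=datetime.now(timezone.utc) + timedelta(days=3),
)
```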
The situation can get even worse because of the visibility_timeout setting (1 hour by default), which causes those tasks to be redelivered. If the eta/countdown points to a distant future, tasks will be redelivered every visibility_timeout period.
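For context, this is the setting in question (continuing the snippet above; 3600 seconds is the documented default):

```python
# Redis transport option controlling redelivery: messages not
# acknowledged within this window are delivered again.
app.conf.broker_transport_options = {"visibility_timeout": 3600}
```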
The documentation suggests increasing the visibility_timeout to match the longest possible ETA in the application:
"So you have to increase the visibility timeout to match the time of the longest ETA you're planning to use"
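Concretely, that advice amounts to something like this (7 days as a hypothetical longest ETA in the application):

```python
# Match visibility_timeout to the longest ETA used anywhere in the app.
app.conf.broker_transport_options = {"visibility_timeout": 7 * 24 * 3600}
```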
IMO, that's not a good idea, as it can have a negative effect on reliability. If a worker fails, its dropped tasks won't get redelivered until visibility_timeout passes. If that's set to a high value, it may impact user experience (e.g. an email is sent after x days when its content is no longer relevant, data important to the user is processed with x days of delay, etc.).
Suggestions
I'd suggest strongly discouraging the use of the eta/countdown parameters. In my opinion, they should only be used with very low values (seconds, minutes). Instead, alternatives could be suggested, e.g. database-backed celery beat, as sketched below.
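A rough sketch of what I mean (fetch_due_rows and mark_dispatched are hypothetical stand-ins for your own storage layer): persist the due time, then have a frequent beat task enqueue whatever has become due, so nothing sits in worker RAM for days.

```python
from datetime import datetime, timezone

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# Run the dispatcher every minute instead of parking long-ETA
# messages on a worker.
app.conf.beat_schedule = {
    "dispatch-due-work": {
        "task": "tasks.dispatch_due_work",
        "schedule": 60.0,
    },
}

@app.task
def send_reminder(user_id):
    ...

@app.task
def dispatch_due_work():
    # Placeholder query, e.g. SELECT ... WHERE due_at <= now() AND NOT dispatched
    for row in fetch_due_rows(datetime.now(timezone.utc)):
        send_reminder.delay(row.user_id)  # enqueued with no eta
        mark_dispatched(row)
```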
In Caveats, I wouldn't present increasing visibility_timeout as a feasible solution; instead, if a task needs a longer ETA, I'd point to the alternatives mentioned above.
Optionally, it would be nice to add a warning section with all the implications I've mentioned above.
Hey @norbertcyran 👋,
Thank you for opening an issue. We will get back to you as soon as we can.
Also, check out our Open Collective and consider backing us - every little helps!
We also offer priority support for our sponsors.
If you require immediate assistance please consider sponsoring us.
Lots of credit to the author of this article, who explained the problem perfectly: https://engineering.instawork.com/celery-eta-tasks-demystified-424b836e4e94