Nested canvas - fetching result hangs #5060
Comments
Additional info: the observed hang of |
It seems to have to do with the total number of tasks in the canvas. I simplified the canvas to:

```python
res = group(
    tasks.generateTile.s(tileinfo) for tileinfo in tileinfos
).apply_async()
```

and I observe the same hang when I have more than 28 tasks in the group (i.e., more than 28 entries in tileinfos).

For the sake of completeness, this is the definition of the task:

```python
@app.task(name='tasks.generateTile', bind=True, max_retries=3, acks_late=True, track_started=True)
def generateTile(self, tileinfo):
    if 'error' in tileinfo:
        return tileinfo
    try:
        logger.info('>>> generateTile {}'.format(tileinfo['tilespec']))
        tilespec = tileinfo['tilespec']
        # Complete success!
        logger.info('<<< generateTile {}'.format(tileinfo['tilespec']))
        return tileinfo
    except subprocess.CalledProcessError as e:
        logger.error(e.output)
        try:
            self.retry(countdown=2)
        except MaxRetriesExceededError:
            return {'error': 'exception'}
    except Exception as e:
        logger.error(e)
        try:
            self.retry(countdown=2)
        except MaxRetriesExceededError:
            return {'error': 'exception'}
```

and this is how I set up the app:

```python
app = Celery('pipeline',
             broker=os.environ['CELERY_BROKER_URL'],
             backend=os.environ['CELERY_BACKEND_URL'],
             include=['pipeline.tasks'])
```
|
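The blocking fetch itself is not shown in the comment above; a minimal hedged sketch of the driver (the tileinfos contents and the pipeline.tasks import path are placeholders) might look like this:

```python
# Sketch only: tileinfo contents are placeholders. The hang is reported
# once the group contains more than ~28 tasks.
from celery import group
from pipeline import tasks

tileinfos = [{'tilespec': 'tile-{}'.format(i)} for i in range(29)]

res = group(
    tasks.generateTile.s(tileinfo) for tileinfo in tileinfos
).apply_async()
print(res.get(timeout=60))  # this blocking fetch is where the hang is observed
```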
I think I'm seeing the same issue. I have a toy problem where I'm trying to generate a bunch of random numbers. I have a trivial task that just returns a single random number. My aim is to use a group to collect the results.

My task is:

```python
@app.task
def random_weight():
    return str(random.random())
```

The script I'm using to generate the numbers is:

```python
import celery

from weighter.tasks import random_weight


def main():
    n = 11
    weights = celery.group(random_weight.s() for _ in range(n))
    print("The weights are: {}".format(weights.apply_async().get()))


if __name__ == "__main__":
    main()
```

Running the script, the final get() hangs. The logs show that all the tasks are succeeding (even for the runs that hang).

I am using RabbitMQ as my broker and Redis as my result backend. This problem does not occur when using RPC as the backend. I have not tried @greuff's workaround yet. Thanks for the detailed description, though!

Versions:
|
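For comparison of the two backends mentioned above, the only difference is the backend URL passed to the app; a hedged sketch (hostnames and ports are placeholders, not the actual deployment):

```python
from celery import Celery

# Placeholder URLs. The hang is reported with the Redis result backend,
# while the rpc:// backend reportedly does not show the problem.
app_redis = Celery('weighter',
                   broker='pyamqp://guest:guest@localhost:5672//',
                   backend='redis://localhost:6379/0')

app_rpc = Celery('weighter',
                 broker='pyamqp://guest:guest@localhost:5672//',
                 backend='rpc://')
```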
I tried @greuff's workaround (periodically re-trying the fetch), and it seems to work.

Additionally, it looks like the following variant also completes for me:

```python
def main():
    n = 100
    weights = celery.group(random_weight.s() for _ in range(n))
    res = weights.apply_async()
    print("Successful: {}".format(res.successful()))
    pipelineResult = res.get(timeout=600000)
    print("The weights are: {}".format(pipelineResult))
```
Also, if I drop the successful() check, it still seems to work.

Edit: Never mind about this. I was just getting lucky with the race condition. Because of the race condition, I'd highly recommend keeping the full workaround loop.
|
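The full workaround loop referred to here is not reproduced in this export; the pattern described amounts to polling the group result until it reports ready and only then fetching, roughly like this sketch (the interval and timeout values are arbitrary, not taken from the original comments):

```python
import time


def fetch_with_polling(async_result, poll_interval=1.0, max_wait=600.0):
    """Poll a (Group)AsyncResult until it reports ready, then fetch it.

    Sketch of the workaround discussed above; interval and timeout are
    arbitrary choices.
    """
    waited = 0.0
    while not async_result.ready():
        time.sleep(poll_interval)
        waited += poll_interval
        if waited >= max_wait:
            raise TimeoutError('gave up waiting for the canvas result')
    return async_result.get()
```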
I just switched to using memcached as my results backend. I haven't seen it hang yet. |
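For reference, switching the result backend to memcached is a one-line change; a sketch assuming a local memcached on the default port (and a memcached client library such as pylibmc installed):

```python
import os

from celery import Celery

# Sketch: local memcached on the default port; broker unchanged.
app = Celery('pipeline',
             broker=os.environ['CELERY_BROKER_URL'],
             backend='cache+memcached://127.0.0.1:11211/')
```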
@Helumpago thanks for chiming in. We don't have a real solution yet, but the workaround with the loop seems to be stable. We haven't had the chance to try another backend yet. |
I'm hitting the same issue: res.ready() hangs. |
The same, this hangs:
Repro:

```python
from celery import Celery

app = Celery(
    'tasks',
    broker='pyamqp://guest:guest@rabbitmq:5672',
    backend='rpc://',
)


@app.task
def add(first: int, second: int) -> int:
    print(first + second)
    return first + second
```

Setup:

```yaml
version: '3.7'
services:
  rabbitmq:
    image: 'rabbitmq:3.8-management-alpine'
    restart: unless-stopped

  worker:
    build: .
    command: celery --app=app:app worker
    restart: unless-stopped
```

And dockerfile:

```dockerfile
FROM python:3.8.6-slim-buster

LABEL maintainer="sobolevn@wemake.services"
LABEL vendor="wemake.services"

WORKDIR /code

RUN pip install 'celery==5.0.2'

# Copy source files:
COPY app.py /code/
```
|
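The snippet that actually hangs was not captured above; a hypothetical client call against this setup (not the original code) would look roughly like:

```python
# Hypothetical snippet, not the original one: call the task defined in
# app.py and block on the result via the rpc:// backend.
from app import add

async_result = add.delay(1, 2)
print(async_result.get(timeout=10))  # waiting on the result is what reportedly hangs
```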
Brief reading makes me think that the original issue might have been related to the inner group being upgraded to a chord, and then perhaps an issue with chord size counting which we fixed in #6354. But seeing it not stall for smaller sizes of the iterable driving the comprehension is a bit suspicious. I'd have to try some of these MRTCs on top of master to get an idea.

I'm also a bit blown away by @sobolevn's comment - the only thing I could think of there is the promise confusion we fixed in #6411, but that landed in 5.0.1, so for behaviour like that to present in such a trivial task is surprising. I'd be pretty confident in saying they're probably caused by different things.

Edit: Weird, it looks like maybe linked tasks don't actually save their result to the RPC backend. With a redis broker/backend this works fine:

```python
import celery

app = celery.Celery(name="foo", broker="redis://", backend="redis://")


@app.task
def add(a, b):
    return a + b
```

```
>>> import foo
>>> s = foo.add.s(2, 2)
>>> s.link(foo.add.s(8))
foo.add(8)
>>> r = s.apply_async()
>>> r.get()
4
>>> r.children[0].get()
12
>>> r.children[0].id
'31111d85-0626-49ee-9574-f3aa978c0f06'
```

But the same thing with the RPC backend stalls on the child result object, as @sobolevn describes.
|
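For contrast, the stalling configuration is the same link test with only the backend swapped; a sketch (the broker URL is a placeholder):

```python
import celery

# Same link test as above but with the rpc:// result backend; in this
# configuration it is r.children[0].get() that reportedly never returns.
app = celery.Celery(name="foo", broker="pyamqp://guest:guest@localhost//", backend="rpc://")


@app.task
def add(a, b):
    return a + b
```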
I've not had any time to dig further into this. From memory and reading my previous comment, it does seem like this is still currently broken and should be looked into further. I'm going to unassign myself since I certainly won't have the time to do so over the next couple of months. Pinging @thedrow and @auvipy for redistribution if possible. |
No worries, we will revisit this after the 5.1 release. |
Hello, I'm not quite sure if this is a bug or if I'm missing something essential. We're trying to drive a non-trivial canvas and experience lockups when fetching the result now and then, and I'm trying to hunt down the cause. We're using Celery 4.2.1 and a current Redis (from the redis Docker image) as both backend and broker.

I stripped out a whole lot of code and came up with this little test script. The tasks are stripped down to do nothing; they just return the input given to them (all of the tasks return results).

Now, when I fetch the result in the next instruction, the script hangs, although all Celery tasks completed successfully, as confirmed with flower, the celery event monitor, etc.

The curious part is that when I do the following instead (repeatedly checking the result state before fetching), the result is fetched and the script succeeds. Note that sometimes more than one loop iteration is required until res.get() eventually returns the result, even though all tasks are already completed! As if it were driving some kind of state machine and lagging behind.

Now, even funnier, when I comment out the print statements that call waiting() etc. on the GroupResult object, the script again loops forever and never finishes.

I looked at the redis-cli monitor command, and there is a lot of subscribing/unsubscribing going on. I put the working version of this script in a loop and let it run 1000 times in a row - it completed successfully every time. I also confirmed with redis-cli pubsub channels that the number of channels stays stable (at around 150).

Also, when the tileinfos array contains only one item, the script doesn't hang at all.

Now, I'm not sure if I did anything wrong, especially in constructing the canvas, or if there is a bug. I understand that we do a lot of nesting of chains into groups into chains into groups, and I'm not sure if that's supported at all. I don't, however, understand why res.get() blocks forever, or why it seems to be important to call res.waiting etc. before calling res.get.
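The stripped-down test script itself did not survive in this thread export; based on the description above (chains nested into a group nested into a chain, with no-op tasks that return their input), the canvas presumably looked roughly like this sketch (task and variable names are illustrative only):

```python
from celery import Celery, chain, group

app = Celery('pipeline', broker='redis://', backend='redis://')


@app.task
def noop(x):
    # Stripped-down task: just return the input, as described above.
    return x


tileinfos = [{'tilespec': i} for i in range(30)]

# A chain containing a group of chains; the trailing task turns the group
# into a chord, mirroring the kind of nesting described in the report.
workflow = chain(
    group(chain(noop.s(t), noop.s()) for t in tileinfos),
    noop.s(),
)
res = workflow.apply_async()
# Per the report, res.get() here can block even after all tasks succeed.
```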