Wooey and while True #198

Open
toert opened this issue Dec 14, 2017 · 15 comments


toert commented Dec 14, 2017

If I start a script that contains "while True" and later try to stop it with the Stop button, it does not stop and keeps running indefinitely.
My current workaround is to kill all the Celery worker processes.
Do you have any ideas on how I can stop such never-ending scripts from the web interface rather than the CLI?

Member

Chris7 commented Dec 19, 2017

Does your script have a blind try/except clause? If so, it may be swallowing the exception celery sends to stop a task.
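
To illustrate the concern, here is a minimal sketch (not Wooey or Celery code). A bare `except:` catches `BaseException`, which includes `SystemExit` and `KeyboardInterrupt`, so a loop wrapped this way can keep going after something inside the interpreter asks it to stop. `StopRequested`, `run_loop`, and `fake_work` are hypothetical names used only for this demonstration.

```python
class StopRequested(BaseException):
    """Hypothetical stand-in for a stop request raised inside the script."""


def run_loop(work, blind, max_iterations=5):
    """Call work() up to max_iterations times; return how many calls were made."""
    done = 0
    for _ in range(max_iterations):
        done += 1
        if blind:
            try:
                work()
            except:  # noqa: E722 - bare except also swallows BaseException
                continue
        else:
            try:
                work()
            except Exception:  # ordinary errors only; stop requests propagate
                continue
    return done


def fake_work():
    raise StopRequested()
```

With `blind=True` the loop runs all five iterations despite the stop request; with `blind=False` the `StopRequested` exception escapes `run_loop` on the first iteration.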

Author

toert commented Jan 1, 2018

It actually doesn't have any blind try/except clauses

Member

Chris7 commented Jan 1, 2018

Can you provide the script? I just tested and celery successfully terminated:

[2018-01-01 23:24:57,945: INFO/MainProcess] Terminating 90cbb3e6-7026-45c9-b800-8e2dad93f2f1 (Signals.SIGKILL)
[2018-01-01 23:24:57,976: ERROR/MainProcess] Task wooey.tasks.submit_script[90cbb3e6-7026-45c9-b800-8e2dad93f2f1] raised unexpected: Terminated(9,)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/billiard/pool.py", line 1678, in _set_terminated
    raise Terminated(-(signum or 0))
billiard.exceptions.Terminated: 9

This is the script I tested with:

import argparse
import sys

parser = argparse.ArgumentParser(description="run forever!")

def main():
    while True:
        import time
        time.sleep(5)

if __name__ == "__main__":
    parser.parse_args()
    sys.exit(main())

Author

toert commented Jan 2, 2018

import sys
import time
import argparse
import logging

import requests  # pip install requests


logging.basicConfig(level=logging.INFO)

parser = argparse.ArgumentParser()
parser.add_argument('--m', type=int)

args = parser.parse_args()


def main():
    print('Hello')
    logging.info('start')

    while True:
        print('Hello True')
        logging.info('i am still running')
        requests.get('http://127.0.0.1:8000')
        time.sleep(10)


if __name__ == "__main__":
    sys.exit(main())

It keeps sending requests even after I click the Stop button.

How I run celery:

#!/bin/sh
cd /opt
source venv/bin/activate
cd DO_wooey
python manage.py celery worker -c 5 --beat -l info

Member

Chris7 commented Jan 22, 2018

Sorry it's taken me a bit to get to this. I just tried your script and the stop button worked.

celery_1  | [2018-01-22 15:29:53,680: INFO/MainProcess] Received task: wooey.tasks.submit_script[df427903-e036-44b9-84bd-192c4a670ed0]
celery_1  | [2018-01-22 15:30:07,053: INFO/MainProcess] Terminating df427903-e036-44b9-84bd-192c4a670ed0 (Signals.SIGKILL)
celery_1  | [2018-01-22 15:30:07,086: ERROR/MainProcess] Task wooey.tasks.submit_script[df427903-e036-44b9-84bd-192c4a670ed0] raised unexpected: Terminated(9,)
celery_1  | Traceback (most recent call last):
celery_1  |   File "/usr/local/lib/python3.6/site-packages/billiard/pool.py", line 1678, in _set_terminated
celery_1  |     raise Terminated(-(signum or 0))
celery_1  | billiard.exceptions.Terminated: 9

I would look at your installed dependencies.

Member

Chris7 commented Jan 22, 2018

What version of python are you using and can you provide the output of pip freeze?

Author

toert commented Jan 27, 2018

Member

Chris7 commented Feb 18, 2018

I think I know the reason: the default broker is SQL, which is useful for development/testing. However, this broker does not support the broadcast/control commands. When you run python manage.py celery inspect active, do you receive: Error: Broadcast not supported by SQL broker transport?

To fix this, you need to define a "real" broker such as RabbitMQ in your user_settings (BROKER_URL).
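
As a rough sketch of what that setting might look like, assuming a RabbitMQ server on the same machine with its default guest account (the credentials, host, port, and vhost below are placeholder assumptions, not values from this thread):

```python
# user_settings.py (sketch)
# Placeholder values: adjust user, password, host, port, and vhost
# to match your own RabbitMQ installation.
BROKER_URL = "amqp://guest:guest@localhost:5672//"
```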

Author

toert commented Feb 26, 2018

No, I use a real broker.
(venv) toerting@ubuntu-1gb-lon1-01:/opt/DO_wooey$ python manage.py celery inspect active
-> celery@ubuntu-1gb-lon1-01: OK
- empty -

Member

Chris7 commented Feb 26, 2018

Ok, to debug this I'll need a step by step to reproduce on my end from a clean setup.

Author

toert commented May 16, 2018

I inspected Wooey's source code and found that job processes are killed with SIGKILL (signal 9). A process cannot block or catch that signal.
Also, after SIGKILL is sent, scripts stop immediately and then rerun. The cause is the messages still queued in RabbitMQ.
Wooey's Stop button does nothing about queued messages. However, if a process stops on its own (an exception, exit code 0, etc.) and the job status becomes 'Completed', its message disappears from the RabbitMQ queue. Setting the status to 'Completed' through the Django admin doesn't delete the message. Is there any way to purge the queue after clicking the Stop button?
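
The point about SIGKILL can be checked outside Wooey. Below is a minimal POSIX-only sketch (the `CHILD` script and the `demo` function are made up for this illustration): a child process that ignores SIGTERM survives it, but it has no way to survive SIGKILL.

```python
import signal
import subprocess
import sys

# Child script: ignore SIGTERM, announce readiness, then loop forever.
CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "while True:\n"
    "    time.sleep(0.1)\n"
)


def demo():
    """Return (survived_sigterm, exit_code_after_sigkill)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # block until the child has installed its handler
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=1)
        survived_sigterm = False
    except subprocess.TimeoutExpired:
        survived_sigterm = True
    proc.kill()  # sends SIGKILL on POSIX systems
    exit_code = proc.wait(timeout=5)
    proc.stdout.close()
    return survived_sigterm, exit_code
```

On Linux, `demo()` should report that the child survived SIGTERM and then exited with a negative return code equal to `-signal.SIGKILL`, which is how `subprocess` reports death by signal.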

Member

Chris7 commented May 16, 2018

This is tricky. The stop behavior in celery is this:

> When a worker receives a revoke request it will skip executing the task, but it won’t terminate an already executing task unless the terminate option is set.
> If terminate is set the worker child process processing the task will be terminated. The default signal sent is TERM, but you can specify this using the signal argument. Signal can be the uppercase name of any signal defined in the signal module in the Python Standard Library.

There seems to be no good way to stop a stuck process that doesn't either nuke the entire worker or risk not actually working (a process can ignore SIGHUPs). I think a better solution would be to have a STOPPING state after a SIGHUP is sent, and then if a task is in STOPPING, have the Stop button change to Kill which will terminate the task/process.
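
A rough sketch of that two-step idea, not how Wooey currently works. It assumes `control` is a Celery app's `control` object, whose `revoke(task_id, terminate=True, signal=...)` call is the one described in the quoted documentation. The function name, the `already_stopping` flag, the `KILLED` label, and the choice of SIGTERM as the gentle first signal are all assumptions made for illustration.

```python
def request_stop(control, task_id, already_stopping):
    """First click asks politely; a second click on a STOPPING task kills it.

    Returns the state label the job would move to.
    """
    signal_name = "SIGKILL" if already_stopping else "SIGTERM"
    control.revoke(task_id, terminate=True, signal=signal_name)
    return "KILLED" if already_stopping else "STOPPING"
```

The caller would store the returned label on the job and relabel the Stop button as Kill while the job is in STOPPING.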

> Also after sending SIGKILL scripts stop immediately and then rerun.

I think the reason it is rerunning is because you have ACKS_LATE set to True. This means that a task is only taken off the queue after it is successful. One option is to disable ACKS_LATE and use the rerun command instead to selectively requeue work.
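
For reference, a sketch of disabling it explicitly in the settings file. This assumes the old-style uppercase setting name `CELERY_ACKS_LATE` used by Celery 3.x; newer Celery releases call the same option `task_acks_late`, so check which form your installed version expects.

```python
# user_settings.py (sketch)
# Acknowledge a message before the task runs rather than after it,
# so an interrupted task is not handed out again by the broker.
CELERY_ACKS_LATE = False
```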

> Is that any way to purge queue after clicking stop job button?

You can purge messages through celery (look at celery purge) or through RabbitMQ's management page.
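
If you wanted to trigger the purge from code instead of the command line, a minimal sketch could look like the following. It assumes `control` is a Celery app's `control` object, whose `purge()` method discards waiting messages and returns how many were dropped; the wrapper function name is hypothetical. Note that a purge discards every waiting task, not only the job that was stopped.

```python
def purge_waiting_tasks(control):
    """Discard all messages still waiting in the queues; return the count."""
    discarded = control.purge()
    return discarded or 0
```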

Member

Chris7 commented May 20, 2018

@toert I take it this means you are able to stop scripts now?

Author

toert commented May 28, 2018

Actually not. Take a look at https://github.com/toert/DO_wooey .
As you can see, I haven't defined ACKS_LATE, and its default value is False.
And celery purge looks good, but I don't want to purge the queue manually every time 😀

Member

Chris7 commented Jun 23, 2018

What OS are you using? I set up a Wooey server using that repository on Python 3.6.5, and halting scripts worked as expected.

Also, you might want to upgrade the version of Wooey you are using to at least the latest in 0.9.x (if not 0.10.x, though 0.10.x has a few changes wrt celery that will require updating some of your settings).
