Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PubSubWorkerThread will sometimes not stop #1194

Closed
trulede opened this issue Jul 18, 2019 · 10 comments
Closed

PubSubWorkerThread will sometimes not stop #1194

trulede opened this issue Jul 18, 2019 · 10 comments

Comments

@trulede
Copy link
Contributor

trulede commented Jul 18, 2019

Version:
5.0 and latest (3.2) version of redis-py

Platform:
Linux

Description:
I've noticed that at times that a PubSubWorkerThread cannot be stopped. No matter how many times I call the stop() method, the _running variable is never set to False. Its not even possible to set that variable directly. I believe this the be caused as a side-effect of the Python Threading implementation.

The solution (which I will open a PR for) is to use an Event(), or some other mechanism which works correctly with Python Threads.

@andymccurdy
Copy link
Contributor

What version of Python are you experiencing this behavior with? I'd really like to better understand what's causing stop() to not work. I've used this pattern in threaded Python apps for a long time and have never encountered an issue with it. Can you consistently reproduce this issue?

I'm also surprised that you can't manipulate the _running variable directly. I have no issues with that as seen here:

import redis
r = redis.Redis()
p = r.pubsub()
p.subscribe(foo=lambda m: print(m))
t = p.run_in_thread()

t._running
>>> True
t.is_alive()
>>> True
t._running = False
t._running
>>> False
t.is_alive()
>>> False

@trulede
Copy link
Contributor Author

trulede commented Jul 18, 2019

Python 3.5 and 3.7

It happens about 1 in 20, however, because I use a lot of processes (each using a PubSub object), its more noticeable to me. There may be other factors too.

According to the documentation, and from my own experience, you need to use suitable methods to get reliable operation of Python Threads. Its not that setting _running to False does not work, most of the time, but rather it does not work all of the time. Its quite bizarre to explicitly set _running to False and it remains True!

https://docs.python.org/3/library/threading.html

If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.

That's what I do for both Threads and Processes, an Event(), its a method which works.

I'm fairly certain you could make _running as an Event() object, and then have stop() call set() on it, and that would work. If you want I can try it tomorrow. There is however some benefit to using an external Event() object, if you need to stop a lot of stuff in one go.

@andymccurdy
Copy link
Contributor

Looking at the actual implementation of threading.Event, it's simply managing a boolean flag, exactly what we're already doing with _running. The only difference is that the set() method is protected with a lock, which makes sense for Event objects that need to allow other threads to wait() on that flag being tripped.

We don't have that requirement in the PubSubWorkerThread and I see no reason why adding a lock around stop() would change anything, which is the only thing that swapping _running = False to self.stop_event.set() would do.

I'm not against making this change if it's required, I'm just failing to see why it would make any difference in this case.

If we do make this change, I would prefer an implementation similar to your last paragraph -- simply replacing the boolean _running with an Event object and have stop() call the event's set() method. I'd be really interested to see if you still experience this issue in your setup with this change.

@trulede
Copy link
Contributor Author

trulede commented Jul 18, 2019

I've refined the PR, and will test it tomorrow ... but I expect it to work. Probably this revision is better, at least for most users. I think Event() will work for older versions of Python.

As to what causes it .... I would guess somewhere in Threading module around _after_fork() ... but the answer in any case would be to use an Event() (or Queue or Barrier etc). Those things are existing for a reason, it is Python after all, they never do anything unless they have too!

Thanks for reviewing that so far.

@andymccurdy
Copy link
Contributor

Great thanks. Looking forward to seeing the results from your testing.

@andymccurdy
Copy link
Contributor

Have you tested the latest patch that subs out the boolean for an Event in your system yet? Did it eliminate the errors you were seeing?

@trulede
Copy link
Contributor Author

trulede commented Jul 24, 2019

Hi,

Yes ... I made some revisions and tested them. Now the code is close to the original, with only the Bool swapped for an Event. I think its better that way - rather than adding more options to the API as I had originally suggested.

The revisions are in the PR #1195

@andymccurdy
Copy link
Contributor

Have you tested these changes in the environment where you originally saw these errors? Was the issue resolved? And if so, did you also make any changes to your application regarding how it interfaces with redis-py?

Given that I've never seen this error happen and I don't have any way of reproducing the error, I want to be absolutely certain that this patch actually fixes the problem.

@trulede
Copy link
Contributor Author

trulede commented Jul 24, 2019

I'm absolutely certain that using the Event() improves the situation, and I did not observe the problem after making the changes. There were no other changes made in my code with regards to how it interfaces with redis-py (what I had been doing was killing the PubSub thread if it did not join after 5 seconds from setting the bool).

Furthermore, I've been unfortunate to have these kinds of problems before with Python Threads. Python makes it too easy ... but eventually you find yourself in a funny situation where once in a while things just don't work and threads don't stop. The idea that you can just set variables between threads is wrong, but since it works most of the time, I guess its OK most of the time too :-)

@andymccurdy
Copy link
Contributor

Thanks! Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants