Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 20.12.x possible broken threading.Lock #1735

Closed
theyosh opened this issue Dec 27, 2020 · 9 comments · Fixed by #1746
Closed

Release 20.12.x possible broken threading.Lock #1735

theyosh opened this issue Dec 27, 2020 · 9 comments · Fixed by #1746
Labels
Type: Question User support and/or waiting for responses

Comments

@theyosh
Copy link

theyosh commented Dec 27, 2020

  • gevent version: Pypi version 20.12.x
  • Python version: Python 3.7.3 installed via deb packages.
  • Operating System: Raspberry Pi OS (latest version) Linux 5.4.79-v7l+ Build and test with PyPy 7.1 #1373 SMP Mon Nov 23 13:27:40 GMT 2020 armv7l GNU/Linux
  • Device: Raspberry PI version 4

Description:

Upgrading from gevent 20.9.0 to 20.12.x with pip install -U command will break libraries that are using threading.Lock

I found this issue after #1733. And there is not an error in my code. But an external library that is using threading.Lock will not work anymore.

For example: https://github.com/waveform80/picamera/blob/release-1.13/picamera/camera.py#L1421 error is triggered when I use gevent 20.12.x and not when using 20.9.0 And I am not sure, but in the release of 20.12.x there was changes in the threading.lock code. And at this line, it is being used: https://github.com/waveform80/picamera/blob/release-1.13/picamera/camera.py#L362

So, I do not know how to describe the bug. As it is harder to debug external libraries. But a gevent upgrade will break the picamera library.

How to proceed?

@jamadden
Copy link
Member

I'm going to need more to go on than "it doesn't work anymore". How doesn't it work? What behaviour changed?

I won't say that there are no issues, but all of the standard library tests for locks pass with the current version, as do all of gevent's own tests. So we're going to need much more information about what behoviour you're seeing versus what you expect to see.

@jamadden jamadden added the Type: Question User support and/or waiting for responses label Dec 28, 2020
@theyosh
Copy link
Author

theyosh commented Dec 28, 2020

@jamadden I totally agree, but I do not know how to give more information. So, maybe this will help?

I made a small example program:

from gevent import monkey
monkey.patch_all()

from time import sleep
from picamera import PiCamera

def main():
  # Example from: https://picamera.readthedocs.io/en/release-1.13/recipes1.html#capturing-to-a-file
  camera = PiCamera()
  print('Camera object loaded')
  camera.resolution = (1024, 768)
  print('Resolution set')
  camera.start_preview()
  print('Warm up for 2 sec')
  # Camera warm-up time
  sleep(2)
  print('Capture image to disk')
  camera.capture('test.jpg')
  print('DONE!')

if __name__ == '__main__':
    main()

This example is with monkey.patch_all(). If I comment that line out, so no monkey patching it get the following output:

time python3 broken_picamera.py 
Camera object loaded
Resolution set
Warm up for 2 sec
Capture image to disk
DONE!

real	0m2.888s
user	0m0.290s
sys	0m0.079s

And I have a picture called test.jpg with a frame/image from the camera.

When I then enable monkey patching, I get this:

time python3 broken_picamera.py 
Camera object loaded
Resolution set
Warm up for 2 sec
Capture image to disk
Traceback (most recent call last):
  File "broken_picamera.py", line 22, in <module>
    main()
  File "broken_picamera.py", line 18, in main
    camera.capture('test.jpg')
  File "/home/pi/venv/lib/python3.7/site-packages/picamera/camera.py", line 1423, in capture
    'Timed out waiting for capture to end')
picamera.exc.PiCameraRuntimeError: Timed out waiting for capture to end

real	1m3.387s
user	0m0.991s
sys	0m0.159s

It stops after 60 seconds, which is the max capture time of the picamera library. The line DONE! is never shown, and therefore the capture times out.

So after some searching in the code, I found that it more looks like an threading.event issue. Because as far I can see the code that here the error is triggered https://github.com/waveform80/picamera/blob/release-1.13/picamera/camera.py#L1421 . And that is done on the encoder object during a wait command: https://github.com/waveform80/picamera/blob/release-1.13/picamera/encoders.py#L382 . And there I found that it uses the threading.event object https://github.com/waveform80/picamera/blob/release-1.13/picamera/encoders.py#L181

This is how far I get. When I run the testing camera code with Gevent 20.9.0 it all works. When I upgrade it to 20.12.1 it breaks. This was also the case I think in 20.12.0. I am trying to find as much as possible to get this fixed. But the problem is that it is not my code that is having issues. It is an external library. Should we open an issue over there?

@jamadden
Copy link
Member

Is there any difference if you add from gevent import get_hub; get_hub() just after monkey.patch_all()?

@theyosh
Copy link
Author

theyosh commented Dec 30, 2020

Sorry for the slow responding. But this does not change a thing. Still the same behaviour. Timeout error after 60 seconds and no image from the camera.

@jamadden
Copy link
Member

I purchased a Raspberry pi camera to verify this, and I understand what's going on.

Under the covers, the MMAL library (implemented in C), uses low-level APIs that gevent cannot patch to create a new native thread. It's this native thread that tries to set the Event, meaning that gevent's Event object is being used from multiple threads at once.

That's never been supported. The only gevent synchronization object that can be used across threads is a Semaphore (and that's fairly recent).

Under gevent 20.9, it kind-of works, entirely by accident. The thread that sets the Event tells the main thread's hub to run the callback that will wake up the sleeping Event waiters. Because this is coming from another thread, the hub doesn't actually do anything with this; it just sits there. Eventually, the timer that is being used to bound how long the wait is expires, and that wakes the hub up; but because of internal details of the hub implementation, it actually first runs the code that wakes the event up (and then discards the expired timer). The timer is 60s, and the camera.capture() call winds up taking that full 60s (or more). This can be sped up by making that main hub wake up sooner, e.g., by using gevent.spawn_later(10, sleep).

Under gevent 20.12, the Event object actually detects that it's being used in an invalid fashion, across threads. As part of the implementation shared with Semaphore, it attempts to compensate for this, but, because it was never intended to work, well, it doesn't.

After #1739 (#1743), it once again behaves like it did in 20.9: it accidentally works, but it takes the full 60s (and can be sped up by forcing the main hub to wake up sooner).

@jgehrcke
Copy link
Contributor

jgehrcke commented Jan 15, 2021

I purchased a Raspberry pi camera to verify this, and I understand what's going on.

@jamadden your maintenance efforts for gevent continue to amaze me.

Thanks for another great write-up, too!

jamadden added a commit that referenced this issue Jan 15, 2021
This makes cross-thread wakeups of Event and AsyncResult objects much faster and doesn't rely on the target loop spinning as this wakes it up.

Add tests for this.

Fixes #1735
jamadden added a commit that referenced this issue Jan 15, 2021
This makes cross-thread wakeups of Event and AsyncResult objects much faster and doesn't rely on the target loop spinning as this wakes it up.

Add tests for this.

Fixes #1735
@jamadden
Copy link
Member

What I hope is a fix for this has been released in gevent 21.1.0.

@theyosh
Copy link
Author

theyosh commented Jan 17, 2021

Damm, that is a nice service :). So where can I drop some money for your Pi camera. That is the least I can do I think.

And the camera is indeed working again. Thanks a lot!

@jamadden
Copy link
Member

No worries, I'd been wanting one to play around with for awhile, so this was a convenient excuse :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Question User support and/or waiting for responses
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants