Fix deadlock in vdf_client #97

xearl4 · 2021-12-10T23:55:44Z

Add missing logic to handle stopped signal in one-weso proofs.

For one-weso proofs, vdf_client launches two threads to do its work. One
repeatedly squares until a given number of iterations, the second waits
until the first is done squaring and then computes a proof.

The squaring thread properly handles the "stopped" signal and aborts
early if signaled. In that case, it never reaches the target iterations.
The 1weso thread waits until the target iterations are reached, but does
not handle the "stopped" signal. Thus, when squaring is stopped early, it
waits forever.

This situation regularly occurs when vdf_client is running for a bluebox
timelord. Whenever the timelord (Python) process restarts, network
communication errors out, killing the squaring thread. But instead of
the whole vdf_client exiting (which would lead to the timelord launcher
cleanly restarting a fresh one), the 1weso thread keeps the vdf_client
alive indefinitely.

The fix follows what is already done for 2weso proofs and elsewhere:
check the "stopped" signal while waiting. The 1weso caller already
handles stopped correctly, so 1weso can just return a
default-constructed proof.
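
For readers unfamiliar with the code, here is a minimal, self-contained sketch of the pattern described above. It is not the actual vdf_client code: `Proof`, `SquaringLoop`, `ProveOneWesolowski`, `RunOnce`, and the two global atomics are hypothetical names chosen for illustration, and the "squaring" is just a counter. It only shows why the waiting thread needs its own stopped check.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the real proof type.
// A default-constructed Proof means "no result".
struct Proof {
    bool valid = false;
    uint64_t iterations = 0;
};

std::atomic<bool> stopped{false};          // set when the client is told to stop
std::atomic<uint64_t> iterations_done{0};  // progress published by the squaring thread

// Squaring thread: aborts early when stopped, so it may never reach the target.
void SquaringLoop(uint64_t target) {
    while (iterations_done.load() < target) {
        if (stopped.load()) return;
        iterations_done.fetch_add(1);  // placeholder for one real squaring step
    }
}

// Proof thread: waits for the target, but also gives up once stopped is set.
Proof ProveOneWesolowski(uint64_t target) {
    while (iterations_done.load() < target) {
        // Without this check, the loop never ends after an early stop.
        if (stopped.load()) return Proof();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    Proof p;
    p.valid = true;
    p.iterations = target;
    return p;
}

// Runs both threads once; if stop_early is true, the stop signal is
// already set when the threads start.
Proof RunOnce(uint64_t target, bool stop_early) {
    iterations_done = 0;
    stopped = stop_early;
    Proof result;
    std::thread squarer(SquaringLoop, target);
    std::thread prover([&] { result = ProveOneWesolowski(target); });
    squarer.join();
    prover.join();  // returns only because the prover checks `stopped`
    return result;
}
```

Calling `RunOnce(1000, true)` in this sketch returns a default-constructed `Proof`, and both threads join. With the `stopped` check removed from `ProveOneWesolowski`, the same call would block in `prover.join()` forever, which is the behavior described in this PR.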

wjblanke · 2021-12-10T23:56:47Z

thanks! florin can u take a look

fchirica

thanks!

github-actions · 2022-02-10T11:03:58Z

'This PR has been flagged as stale due to no activity for over 60
days. It will not be automatically closed, but it has been given
a stale-pr label and should be manually reviewed.'

xearl4 · 2022-02-10T22:28:08Z

Any chance at merging this so that it gets in the upcoming release?

hoffmang9 · 2022-02-10T22:43:41Z

It is not likely to get into today's beta. It may get into 1.3 actual. @wjblanke ?

wjblanke · 2022-03-28T19:19:14Z

Thanks again

wjblanke requested a review from fchirica December 10, 2021 23:56

xearl4 force-pushed the fix-vdf_client-deadlock branch from 572d8f0 to a96fcc8 Compare December 10, 2021 23:57

xearl4 force-pushed the fix-vdf_client-deadlock branch from a96fcc8 to 9d1bc13 Compare December 10, 2021 23:58

fchirica approved these changes Dec 11, 2021

github-actions bot added the stale-pr label Feb 10, 2022

github-actions bot removed the stale-pr label Feb 11, 2022

wjblanke merged commit cb47a9b into Chia-Network:main Mar 28, 2022
