Fix deadlock in vdf_client #97

Merged
merged 1 commit on Mar 28, 2022

xearl4
Contributor

@xearl4 xearl4 commented Dec 10, 2021

Add missing logic to handle stopped signal in one-weso proofs.

For one-weso proofs, vdf_client launches two threads to do its work. One
repeatedly squares until a given number of iterations, the second waits
until the first is done squaring and then computes a proof.

The squaring thread properly handles the "stopped" signal and aborts
early if signaled, so it never reaches the target iterations. The
1weso thread waits until the target iterations are reached, but does
not handle the "stopped" signal. Thus, once the squaring thread has
been stopped, the 1weso thread waits forever.

This situation occurs regularly when vdf_client is running for a bluebox
timelord. Whenever the timelord (Python) process restarts, network
communication errors out, killing the squaring thread. But instead of
the whole vdf_client exiting (which would lead to the timelord launcher
cleanly restarting a fresh one), the 1weso thread keeps the vdf_client
alive indefinitely.

The fix mirrors how the stopped signal is already handled for 2weso
proofs and elsewhere. The 1weso caller already handles stopped
correctly, so 1weso can just return a default-constructed proof.
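
To illustrate the pattern, here is a minimal sketch of the two threads and
the added check. It is not the actual vdf_client code: `Proof`,
`SharedState`, `SquaringLoop`, `ProveOneWeso`, and `RunOneWeso` are
hypothetical names, a counter increment stands in for one squaring, and
the polling wait is an assumption made for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical stand-in for the real proof type; a default-constructed
// Proof plays the role of the "empty" result returned on stop.
struct Proof {
    bool valid = false;
    uint64_t iterations = 0;
};

// Hypothetical state shared by the squaring thread and the proof thread.
struct SharedState {
    std::atomic<uint64_t> iterations_done{0};
    std::atomic<bool> stopped{false};
};

// Squaring thread: advances toward the target, but aborts early when
// stopped, in which case the target is never reached.
void SquaringLoop(SharedState& state, uint64_t target) {
    while (state.iterations_done.load() < target) {
        if (state.stopped.load()) return;
        state.iterations_done.fetch_add(1);  // stands in for one squaring
    }
}

// Proof thread: waits for the target, but also watches the stopped flag.
// Without the stopped check, this loop would never exit after an abort.
Proof ProveOneWeso(SharedState& state, uint64_t target) {
    while (state.iterations_done.load() < target) {
        if (state.stopped.load()) return Proof{};  // caller handles the empty proof
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    Proof proof;
    proof.valid = true;  // the real proof computation would happen here
    proof.iterations = target;
    return proof;
}

// Runs both threads to completion and returns the proof thread's result.
Proof RunOneWeso(uint64_t target, bool stop_before_start) {
    SharedState state;
    state.stopped.store(stop_before_start);
    Proof result;
    std::thread squarer(SquaringLoop, std::ref(state), target);
    std::thread prover([&] { result = ProveOneWeso(state, target); });
    squarer.join();
    prover.join();  // returns in both cases thanks to the stopped check
    assert(state.iterations_done.load() <= target);
    return result;
}
```

With the flag set, `RunOneWeso(1000, true)` returns a default-constructed
`Proof` and both threads join, so the process can exit. Removing the
`stopped` check from `ProveOneWeso` would make the same call hang, which
is the behavior this PR addresses.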

@wjblanke
Contributor

thanks! florin can u take a look

Contributor

@fchirica fchirica left a comment

thanks!

@github-actions

'This PR has been flagged as stale due to no activity for over 60
days. It will not be automatically closed, but it has been given
a stale-pr label and should be manually reviewed.'

@xearl4
Contributor Author

xearl4 commented Feb 10, 2022

Any chance at merging this so that it gets in the upcoming release?

@hoffmang9
Member

It is not likely to get into today's beta. It may get into 1.3 actual. @wjblanke ?

@github-actions github-actions bot removed the stale-pr label Feb 11, 2022
@wjblanke
Contributor

Thanks again

@wjblanke wjblanke merged commit cb47a9b into Chia-Network:main Mar 28, 2022