Stuck at learning_process.initialize() in DP Tutorial #3756

SamuelGong · 2023-03-21T09:53:35Z

Describe the bug
In the Colab notebook for DP, everything went well until I reached a code block where the program kept running for hours without printing any log output. Further debugging showed that the program never finished executing the line state = learning_process.initialize().

Environment:
The experiment was conducted from scratch using today's TFF release (0.51.0). No modifications were made to any part of the notebook.

Expected behavior
The line above should finish executing within an acceptable time, at most a few minutes.


ZacharyGarrett · 2023-03-21T14:31:47Z

Is this a duplicate of #3742?

Please try the new 0.52.0 release (https://github.com/tensorflow/federated/releases/tag/v0.52.0), which was released to PyPI yesterday (https://pypi.org/project/tensorflow-federated/0.52.0/).

SamuelGong · 2023-03-21T16:48:01Z

Sorry, 0.51.0 was a typo; I was actually using 0.52.0 (which is why I emphasized that it was the version released today). Could you please look into this again? I have just gotten past #3742 but am now stuck on a new issue.

zcharles8 · 2023-03-21T17:56:42Z

To clarify: you can execute code, but learning_process.initialize() hangs indefinitely? Do you have any estimate of how long it ran?

SamuelGong · 2023-03-22T03:05:46Z

Yes, that is the case. It ran for at least three to four hours before I lost patience. I tried three times; each run hung at the same place and no error message was printed, so I could not provide more information.
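When a call hangs without printing anything, Python's standard-library faulthandler module can dump the stack of every Python thread after a timeout, which gives at least some clue about where execution is stuck. The following is a minimal sketch that was not part of the original discussion; the helper name run_with_hang_report is hypothetical. Note that faulthandler only shows Python frames, so a hang inside TFF's native runtime would presumably appear as a Python frame blocked in a native call.

```python
import faulthandler
import sys
from typing import IO, Any, Callable, Optional


def run_with_hang_report(
    fn: Callable[[], Any],
    timeout_s: float,
    report_file: Optional[IO] = None,
) -> Any:
    """Runs fn(); if it is still running after timeout_s seconds, dumps the
    Python stack of every thread to report_file (stderr by default).

    The call is NOT interrupted: the dump is purely diagnostic.
    """
    out = report_file if report_file is not None else sys.stderr
    # Schedule a one-shot traceback dump; exit=False keeps the process alive.
    faulthandler.dump_traceback_later(
        timeout_s, repeat=False, file=out, exit=False
    )
    try:
        return fn()
    finally:
        # Cancel the pending dump if fn() returned before the timeout.
        faulthandler.cancel_dump_traceback_later()
```

As a hypothetical usage in the tutorial, state = run_with_hang_report(learning_process.initialize, timeout_s=600) would print a stack dump after ten minutes if initialization is still running. In a notebook, sys.stderr may not expose a real file descriptor, so passing a regular file opened for writing as report_file is likely safer.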

zcharles8 · 2023-03-22T23:03:31Z

@SamuelGong I think that if you remove the call to tff.backends.native.set_sync_local_cpp_execution_context it should run. This call should now be mainly unnecessary (as it is the default execution context) though it isn't clear why re-setting causes the hang. Can you see if that changes things?

SamuelGong · 2023-03-23T04:11:06Z

It works for me! Thank you very much.

SamuelGong · 2023-03-29T16:58:21Z

> @SamuelGong I think that if you remove the call to tff.backends.native.set_sync_local_cpp_execution_context it should run. This call should now be mainly unnecessary (as it is the default execution context) though it isn't clear why re-setting causes the hang. Can you see if that changes things?

Since I can now run the tutorial notebook on my local machine, I have access to the Jupyter notebook's log. Inspecting the log, I found that calling tff.backends.native.set_sync_local_cpp_execution_context() prints errors such as: ERROR: Illegal value '3383.0' specified for flag 'max_concurrent_computation_calls'. It seems that max_concurrent_computation_calls is expected to be an integer, but the code in the tutorial does not ensure this. I am reopening this issue in case this bug has not been caught yet.
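The error message suggests that a floating-point count (3383.0) was converted to text and handed to a flag that only accepts integers. The sketch below reproduces that failure mode with Python's own int() parser and shows the straightforward fix of converting to int before the value is passed on. It is an illustration under assumptions: the helper name as_int_flag_value is hypothetical, and TFF's runtime presumably parses the flag with its own native flag parser rather than Python's int().

```python
def as_int_flag_value(value: float) -> int:
    """Converts a whole-number float (e.g. 3383.0) to int for an integer flag.

    Raises ValueError for non-whole values instead of silently truncating.
    """
    as_float = float(value)
    if not as_float.is_integer():
        raise ValueError(f"expected a whole number, got {value!r}")
    return int(as_float)


num_clients = 3383.0  # a count that ended up as a float, e.g. after division

# Stringifying the float gives '3383.0', which an integer parser rejects.
flag_text = str(num_clients)
try:
    int(flag_text)
except ValueError as err:
    print(f"rejected {flag_text!r}: {err}")

# Converting first yields a value whose text form parses cleanly.
fixed_text = str(as_int_flag_value(num_clients))
print(fixed_text, int(fixed_text))
```

Raising on non-whole values, rather than calling int() directly, avoids silently truncating something like 3383.7 into a different limit.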

zcharles8 · 2023-03-29T21:49:09Z

I think that tff.backends.native.set_sync_local_cpp_execution_context shouldn't be invoked in the tutorial at all, since it's now the default. As for the illegal value, this might be due to using Jupyter; I don't think we know whether Jupyter works with TFF or not (and we would generally recommend Colab instead).

deepquantum88 · 2023-04-08T19:45:01Z

@SamuelGong I am stuck with the same hang when I execute state = learning_process.initialize(). There is no error message, but execution hangs.

Even after I removed tff.backends.native.set_sync_local_cpp_execution_context, it still did not work. I am using TFF 0.52.0 and TF 2.11.0 on my local system.

Can you please help? How can this be solved?

SamuelGong · 2023-04-11T02:50:50Z

> @SamuelGong I am stuck with the same hang when I execute state = learning_process.initialize(). There is no error message, but execution hangs.
>
> Even after I removed tff.backends.native.set_sync_local_cpp_execution_context, it still did not work. I am using TFF 0.52.0 and TF 2.11.0 on my local system.
>
> Can you please help? How can this be solved?

Hi. For me, removing that line solved it. However, since TFF is changing rapidly between versions, that fix may no longer work. If it doesn't, you may need to ask the TFF team.

niharikagupta2021 · 2024-03-27T00:37:28Z

I'm facing the same issue when I use TensorFlow Federated in Google Colab. When I try to run tff.federated_computation(lambda: 'Hello, World!')(), this command also hangs. The same happens with the .initialize() function when I try to start training my model using tff.learning.algorithms.build_weighted_fed_avg. Has anyone faced this issue recently?

zcharles8 · 2024-03-27T01:32:29Z

@niharikagupta2021 I would encourage you to open a separate GitHub issue for this. Please make sure to include the suggested details; things like the version and operating system are critical to debugging this kind of problem.
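For collecting the kind of details requested above, the following minimal sketch gathers the Python version, operating system, and installed TensorFlow and TFF versions using only the standard library. It is not an official TFF tool: the function name environment_report is hypothetical, and the package names are assumed to be the PyPI distribution names.

```python
import platform
from importlib import metadata


def environment_report(packages=("tensorflow", "tensorflow-federated")) -> dict:
    """Collects version details commonly requested in bug reports."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output into a new issue covers the version and OS details in one step; it does not capture hardware or notebook-runtime details, which may also matter.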

SamuelGong added the bug label Mar 21, 2023

ZacharyGarrett self-assigned this Mar 21, 2023

ZacharyGarrett assigned zcharles8 and unassigned ZacharyGarrett Mar 22, 2023

SamuelGong closed this as completed Mar 23, 2023

SamuelGong reopened this Mar 29, 2023

zcharles8 mentioned this issue Apr 6, 2023 in: tff differential privacy model stucks in learning process #3822 (closed)

zcharles8 closed this as completed Mar 27, 2024

zcharles8 reopened this Mar 27, 2024
