
RayElastic scale-up test fails #3197

Closed
tgaddair opened this issue Oct 5, 2021 · 0 comments · Fixed by #3205

tgaddair (Collaborator) commented Oct 5, 2021

Follow-up from #2813: the Ray elastic scale-up test is also failing in Buildkite. We should investigate this as part of #3190.

```
/usr/local/lib/python3.8/dist-packages/horovod/ray/elastic.py:454: RuntimeError
------------------------------ Captured log call -------------------------------
ERROR    root:registration.py:179 failed to activate new hosts -> stop running
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/horovod/runner/elastic/registration.py", line 177, in _on_workers_recorded
    self._driver.resume()
  File "/usr/local/lib/python3.8/dist-packages/horovod/runner/elastic/driver.py", line 99, in resume
    self._activate_workers(self._min_np)
  File "/usr/local/lib/python3.8/dist-packages/horovod/runner/elastic/driver.py", line 177, in _activate_workers
    pending_slots = self._update_host_assignments(current_hosts)
  File "/usr/local/lib/python3.8/dist-packages/horovod/runner/elastic/driver.py", line 248, in _update_host_assignments
    raise RuntimeError('No hosts from previous set remaining, unable to broadcast state.')
RuntimeError: No hosts from previous set remaining, unable to broadcast state.
```
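For context, the final error states the invariant that was violated: when the driver updates host assignments, at least one host from the previous set must survive so that training state can be broadcast from it. The sketch below illustrates only that invariant. It is a hypothetical helper (`check_state_can_be_broadcast` is not a Horovod API and is not the code in `driver.py`), and it assumes hosts are compared by name and that an empty previous set means a first start with nothing to restore.

```python
from typing import Set


def check_state_can_be_broadcast(previous_hosts: Set[str], current_hosts: Set[str]) -> Set[str]:
    """Hypothetical illustration, NOT Horovod's implementation.

    Returns the hosts present in both the previous and the current assignment.
    Assumes state is restored by broadcasting from a surviving worker, so if
    there was a previous assignment, at least one of its hosts must remain.
    """
    surviving = previous_hosts & current_hosts
    # An empty previous set is treated as a first start: no state to broadcast yet.
    if previous_hosts and not surviving:
        raise RuntimeError('No hosts from previous set remaining, unable to broadcast state.')
    return surviving


if __name__ == "__main__":
    # Scale-up that keeps the old host: the invariant holds.
    print(check_state_can_be_broadcast({"host-a"}, {"host-a", "host-b"}))
    # Complete host replacement: raises the same message seen in the log.
    try:
        check_state_can_be_broadcast({"host-a"}, {"host-b"})
    except RuntimeError as err:
        print(err)
```

In a scale-up, the previous hosts would normally still be present in the new set, so this check should pass.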


https://buildkite.com/horovod/horovod/builds/6525#f16bee64-0b80-4cf8-9ba7-89b2d6aebde7/6-9500
