I followed the command-line instructions here; the results are below.
# Python script run on the scheduler container
from dask.distributed import Client, SSHCluster

def task(name):
    print(f'task-{name}')
    return f'task-{name}'

if __name__ == '__main__':
    print("- Distributed scheduler:", end='')
    with SSHCluster(['localhost', '123.456.78.911'],
                    connect_options=[{'known_hosts': None, 'password': 'vfroot'},
                                     {'known_hosts': None, 'password': 'vfroot', 'port': 20022}],
                    ) as cluster, Client(cluster) as client:
        print(client)
- Distributed scheduler:distributed.deploy.ssh - INFO - distributed.scheduler - INFO - -----------------------------------------------
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - -----------------------------------------------
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - Clear task state
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - Scheduler at: tcp://172.17.0.2:8786
distributed.deploy.ssh - INFO - Usage: dask_worker.py [OPTIONS] [SCHEDULER] [PRELOAD_ARGV]...
Task exception was never retrieved
future: <Task finished name='Task-16' coro=<_wrap_awaitable() done, defined at /opt/conda/envs/rapids/lib/python3.8/asyncio/tasks.py:688> exception=Exception('Worker failed to start')>
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
return (yield from awaitable.__await__())
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/spec.py", line 67, in _
await self.start()
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/ssh.py", line 130, in start
raise Exception("Worker failed to start")
Exception: Worker failed to start
distributed.deploy.ssh - INFO - Usage: dask_worker.py [OPTIONS] [SCHEDULER] [PRELOAD_ARGV]...
Traceback (most recent call last):
File "/root/project/parallel_example/SSHClient.py", line 14, in <module>
with SSHCluster(['localhost', '123.456.78.911'],
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/ssh.py", line 368, in SSHCluster
return SpecCluster(workers, scheduler, name="SSHCluster", **kwargs)
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/spec.py", line 284, in __init__
self.sync(self._correct_state)
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 214, in sync
return sync(self.loop, func, *args, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/utils.py", line 326, in sync
raise exc.with_traceback(tb)
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/utils.py", line 309, in f
result[0] = yield future
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/spec.py", line 371, in _correct_state_internal
await w # for tornado gen.coroutine support
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/spec.py", line 67, in _
await self.start()
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/ssh.py", line 130, in start
raise Exception("Worker failed to start")
Exception: Worker failed to start
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/spec.py", line 671, in close_clusters
cluster.close(timeout=10)
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 135, in close
return self.sync(self._close, callback_timeout=timeout)
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 214, in sync
return sync(self.loop, func, *args, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/utils.py", line 326, in sync
raise exc.with_traceback(tb)
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/utils.py", line 309, in f
result[0] = yield future
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/opt/conda/envs/rapids/lib/python3.8/site-packages/distributed/deploy/spec.py", line 437, in _close
assert w.status == Status.closed, w.status
AssertionError: Status.created
Process finished with exit code 1
What's the right way to establish the connection between scheduler and worker?
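One detail worth noting in the log above: the scheduler reports it is listening at tcp://172.17.0.2:8786, which looks like a Docker bridge address, and a worker container in a different network typically cannot route to it. As a quick sanity check (a sketch I am adding for illustration, not part of the original run), the worker host's TCP reachability to the scheduler endpoint can be probed with the standard library; the address and port below are the ones from the log and are assumptions to be adjusted:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run this on the worker container against the address the scheduler
# advertised (tcp://172.17.0.2:8786 in the log). If it returns False,
# the worker cannot reach the scheduler and a routable address or a
# forwarded port is needed:
#   can_connect("172.17.0.2", 8786)
```

If the probe fails, the fix is usually to make the scheduler listen on (or advertise) an address the worker can actually route to, rather than the container-internal one.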
I want to cluster two containers on nodes in different networks (the IP addresses below are placeholders).
Container 1 (scheduler)
  ssh: 123.456.78.910:20022
  scheduler port: 123.456.78.910:28786
Container 2 (worker)
  ssh: 123.456.78.911:20022
  worker port: 123.456.78.911:28786
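Given that layout, one way to spell out the per-host SSH settings is to build `connect_options` as an explicit list, one dict per host in the same order as the host list (each dict is forwarded to the SSH connection). This is a sketch under the assumptions in the post (placeholder IPs, password `vfroot`, SSH on port 20022 for both hosts, scheduler port 28786); it is not a confirmed fix:

```python
# One options dict per host, in the same order as the hosts list:
# first the scheduler host, then the worker host.
connect_options = [
    {"known_hosts": None, "password": "vfroot", "port": 20022},  # 123.456.78.910
    {"known_hosts": None, "password": "vfroot", "port": 20022},  # 123.456.78.911
]

# Hypothetical usage (requires dask.distributed and working SSH access):
# from dask.distributed import SSHCluster
# cluster = SSHCluster(
#     ["123.456.78.910", "123.456.78.911"],
#     connect_options=connect_options,
#     scheduler_options={"port": 28786},  # pin the scheduler to the forwarded port
# )
```

Pinning the scheduler port via `scheduler_options` keeps it on the port that is actually forwarded between the two networks, instead of the default 8786.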