Slurm bridge mode requires starting idmtools-slurm-bridge on each login node (4 in total) at NU #1955

Open
shchen-idmod opened this issue Nov 19, 2022 · 1 comment

Comments

@shchen-idmod
Collaborator

Currently, NU users are randomly assigned to one of 4 login nodes when they log in. For bridge mode to work, the idmtools-slurm-bridge utility must be started on each of these nodes, since processes are not shared between nodes.
This creates a few issues:

  1. Users do not know that they need to run the idmtools-slurm-bridge utility on each login node.
  2. Even if they start idmtools-slurm-bridge on each node, the slurm-bridge.pid file is stored in the same location for all nodes (default is ~/.idmtools/singularity-bridge/). As a result, running idmtools-slurm-bridge on a second node prompts the user to delete the existing slurm-bridge.pid, even though no bridge process has ever run on that node, which is confusing.

My workaround for the second issue was to delete the PID file anyway on each new node. I ended up starting idmtools-slurm-bridge on all 4 nodes, but slurm-bridge.pid holds only the PID of the last one started.
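
One possible way to avoid the PID-file collision described above, as a rough sketch only: include the hostname in the PID file name so each login node gets its own file, and treat a PID file as stale when no such process exists on the current node. The function names and the `slurm-bridge.<host>.pid` naming scheme are hypothetical and are not the current idmtools-slurm-bridge behavior; only the default directory comes from this issue.

```python
import os
import socket
from pathlib import Path
from typing import Optional

# Default directory quoted in this issue; the per-host file name is an assumption.
BRIDGE_DIR = Path.home() / ".idmtools" / "singularity-bridge"


def pid_file_for_host(base_dir: Path = BRIDGE_DIR, hostname: Optional[str] = None) -> Path:
    """Return a hypothetical per-node PID file path, e.g. slurm-bridge.<host>.pid."""
    host = hostname or socket.gethostname()
    return base_dir / f"slurm-bridge.{host}.pid"


def pid_is_running(pid: int) -> bool:
    """Check whether a process with this PID exists on the current node (POSIX)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def can_start_bridge(pid_file: Path) -> bool:
    """Return True if no live bridge is recorded for this node.

    A missing, unreadable, or stale PID file is not an obstacle; stale or
    unparsable files are removed so the user is never asked to do it by hand.
    """
    if not pid_file.exists():
        return True
    try:
        pid = int(pid_file.read_text().strip())
    except ValueError:
        pid_file.unlink()
        return True
    if pid_is_running(pid):
        return False
    pid_file.unlink()
    return True
```

With a scheme like this, a PID written on one login node would never be checked against the process table of another, so the prompt to delete slurm-bridge.pid would only appear when a bridge is actually running on the current node.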

@devclinton
Member

We should add documentation stating that users need to connect to the same node. At NU, users currently cannot control this, but in the future we may be able to document, or work with the sysadmins to find, a way to guarantee SSH access to a specific node.
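
Until node selection can be guaranteed, a small check could at least tell users when they have landed on a different login node than the one where the bridge was started. The following is a minimal sketch, assuming a hypothetical marker file that records the hostname at bridge start; the function names and the marker file are illustrative and nothing here is part of idmtools today.

```python
import socket
from pathlib import Path
from typing import Optional


def record_bridge_host(marker: Path, hostname: Optional[str] = None) -> str:
    """Write the name of the node where the bridge was started to a marker file."""
    host = hostname or socket.gethostname()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(host)
    return host


def bridge_host_mismatch(marker: Path, hostname: Optional[str] = None) -> Optional[str]:
    """Return a warning message if this node differs from the recorded one.

    Returns None when no marker exists or when the hostnames match.
    """
    if not marker.exists():
        return None
    recorded = marker.read_text().strip()
    current = hostname or socket.gethostname()
    if recorded == current:
        return None
    return (
        f"idmtools-slurm-bridge was started on {recorded}, "
        f"but this session is on {current}."
    )
```

A check like this does not fix the underlying problem, but it would turn a silent failure into an explicit message that points the user at the node they need.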
