Odd issues can occur when running a simulation where num_cores > num_nodes. It looks like there was an unhandled exception on the rank 3 process because it did not have any nodes assigned:
```
00:00:01 [3] [I] [Simulation] Rank 3 contributes 0 nodes...
00:00:01 [3] [I] [Simulation] Rank map contents not displayed until NodeRankMap::ToString() (re)implemented.
00:00:01 [3] [W] [Simulation] Rank 3 wasn't assigned any nodes! (# of procs is too big for simulation?)
00:00:01 [3] [I] [Eradication] Controller execution failed, exiting.
```
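To illustrate why this happens, here is a minimal sketch of a round-robin node-to-rank assignment. This is a hypothetical illustration, not EMOD's actual `NodeRankMap` logic: whenever the number of MPI ranks exceeds the number of simulation nodes, at least one rank necessarily ends up with zero nodes.

```python
def assign_nodes(num_nodes: int, num_ranks: int) -> dict[int, list[int]]:
    """Assign node IDs to MPI ranks round-robin; returns rank -> list of node IDs."""
    mapping = {rank: [] for rank in range(num_ranks)}
    for node_id in range(num_nodes):
        mapping[node_id % num_ranks].append(node_id)
    return mapping

# 3 simulation nodes spread over 4 MPI ranks (as with `mpirun -n 4`):
mapping = assign_nodes(num_nodes=3, num_ranks=4)
for rank, nodes in mapping.items():
    print(f"Rank {rank} contributes {len(nodes)} nodes")
# Rank 3 receives an empty list, matching the warning in the log above.
```

Under this assumption, the fix is to run with `-n` no larger than the number of nodes in the demographics, or to add more nodes to the simulation.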
For each simulation on SLURM, the command we run is `_run.sh`, which contains the actual command:
```
singularity exec /home/zdf1921/shared/rocky_dtk_runner_py39.sif Assets/Eradication --config config.json --dll-path ./Assets --input-path ./Assets\;.
```
This works fine in my test example and generates the expected results in the output folder (as seen on COMPS).
Based on an NYU user's e-mail suggestion, I changed the command to the following (adding `mpirun -n 4`):
```
singularity exec /home/zdf1921/shared/rocky_dtk_runner_py39.sif mpirun -n 4 Assets/Eradication --config config.json --dll-path ./Assets --input-path ./Assets\;.
```
The stdout.txt file suggests that multiple cores are indeed being used; however, every simulation execution failed.
Please refer to the attached stdout.txt for details.