
Dynamic runtime resource fails with SLURM #62

Open
nikostr opened this issue Apr 5, 2024 · 3 comments

nikostr commented Apr 5, 2024

I've created a minimal workflow and set runtime: f"{2 + attempt}h" in my workflow profile. Snakemake parses it correctly, in the sense that it prints runtime=180 as part of the resources, but submission fails with SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option. I improvised this runtime specification since I couldn't find a documented way of doing it. Is there a recommended, working way to specify dynamic runtimes in the profile?

cmeesters (Collaborator) commented

Ouch. Thanks for the report!

Can you please attach your minimal example? And perhaps a log created with snakemake --verbose ..., too? That would be extremely helpful.

nikostr (Author) commented Apr 5, 2024

Sure!
workflow/Snakefile:

rule all:
    output:
        'results/a'
    shell:
        'touch {output}'

workflow/profiles/default/config.yaml:

executor: slurm
jobs: 1
retries: 2
default-resources:
  slurm_account: <account>
  runtime: f"{2 + attempt}h"
  slurm_partition: core

and a slightly redacted version of the verbose log:

Using workflow specific profile workflow/profiles/default for setting default command line arguments.
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: False
SLURM run ID: d71a0ae6-210a-4886-b197-508e567eb099
Using shell: /usr/bin/bash
Provided remote nodes: 1
Job stats:
job      count
-----  -------
all          1
total        1

Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 1}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Inferred runtime value of 180 minutes from 3h
Selected jobs (1)
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 0}
Execute 1 jobs...

[Fri Apr  5 11:29:29 2024]
rule all:
    output: results/a
    jobid: 0
    reason: Missing output files: results/a
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=$SLURM_ACCOUNT, runtime=180, slurm_partition=core

sbatch call: sbatch --job-name d71a0ae6-210a-4886-b197-508e567eb099 --output $DIR/snakemake-runtime-bug/.snakemake/slurm_logs/rule_all/%j.log --export=ALL --comment all -A $SLURM_ACCOUNT -p core -t 180 --mem 1000 --cpus-per-task=1 -D $DIR/snakemake-runtime-bug --wrap="$HOME/.conda/envs/snakemake/bin/python3.12 -m snakemake --snakefile $DIR/snakemake-runtime-bug/workflow/Snakefile --target-jobs all: --allowed-rules all --cores all --attempt 1 --force-use-threads  --resources mem_mb=1000 mem_mib=954 disk_mb=1000 disk_mib=954 --wait-for-files $DIR/snakemake-runtime-bug/.snakemake/tmp._isktcvq --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose  --rerun-triggers mtime software-env code params input --conda-frontend mamba --shared-fs-usage input-output storage-local-copies software-deployment source-cache persistence sources --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --latency-wait 5 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path $HOME/.conda/envs/snakemake/bin --default-resources 'mem_mb=min(max(2*input.size_mb, 1000), 8000)' 'disk_mb=max(2*input.size_mb, 1000)' tmpdir=system_tmpdir slurm_account=$SLURM_ACCOUNT 'runtime=f"{2 + attempt}h"' slurm_partition=core --executor slurm-jobstep --jobs 1 --mode remote"
unlocking
removing lock
removing lock
removed all locks
Full Traceback (most recent call last):
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake_executor_plugin_slurm/__init__.py", line 138, in run_job
    out = subprocess.check_output(
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "$HOME/.conda/envs/snakemake/lib/python3.12/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "$HOME/.conda/envs/snakemake/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'sbatch --job-name d71a0ae6-210a-4886-b197-508e567eb099 --output $DIR/snakemake-runtime-bug/.snakemake/slurm_logs/rule_all/%j.log --export=ALL --comment all -A $SLURM_ACCOUNT -p core -t 180 --mem 1000 --cpus-per-task=1 -D $DIR/snakemake-runtime-bug --wrap="$HOME/.conda/envs/snakemake/bin/python3.12 -m snakemake --snakefile $DIR/snakemake-runtime-bug/workflow/Snakefile --target-jobs all: --allowed-rules all --cores all --attempt 1 --force-use-threads  --resources mem_mb=1000 mem_mib=954 disk_mb=1000 disk_mib=954 --wait-for-files $DIR/snakemake-runtime-bug/.snakemake/tmp._isktcvq --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose  --rerun-triggers mtime software-env code params input --conda-frontend mamba --shared-fs-usage input-output storage-local-copies software-deployment source-cache persistence sources --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --latency-wait 5 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path $HOME/.conda/envs/snakemake/bin --default-resources 'mem_mb=min(max(2*input.size_mb, 1000), 8000)' 'disk_mb=max(2*input.size_mb, 1000)' tmpdir=system_tmpdir slurm_account=$SLURM_ACCOUNT 'runtime=f"{2 + attempt}h"' slurm_partition=core --executor slurm-jobstep --jobs 1 --mode remote"' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake/cli.py", line 2052, in args_to_api
    dag_api.execute_workflow(
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1247, in execute
    raise e
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1243, in execute
    success = self.scheduler.schedule()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake/scheduler.py", line 306, in schedule
    self.run(runjobs)
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake/scheduler.py", line 394, in run
    executor.run_jobs(jobs)
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake_interface_executor_plugins/executors/base.py", line 72, in run_jobs
    self.run_job(job)
  File "$HOME/.conda/envs/snakemake/lib/python3.12/site-packages/snakemake_executor_plugin_slurm/__init__.py", line 142, in run_job
    raise WorkflowError(
snakemake_interface_common.exceptions.WorkflowError: SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option


WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option
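
My guess at what goes wrong (not confirmed against the plugin source): the executor hands the whole sbatch call above to a shell as a single string, and the value 'runtime=f"{2 + attempt}h"' contains double quotes that close the --wrap="..." argument early. The space inside {2 + attempt} then splits the remainder into extra command-line words, which sbatch rejects as script arguments. A minimal sketch of the same failure mode, independent of snakemake:

# the inner double quotes close --wrap="..." early, so the words after the
# first space become positional arguments and sbatch aborts with:
# sbatch: error: Script arguments not permitted with --wrap option
sbatch --wrap="echo 'runtime=f"{2 + attempt}h"'"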

nikostr (Author) commented Apr 5, 2024

I just tried replacing the runtime with str(2 + attempt) + "h" and it seems to work! Would this be the recommended way to do this? Would it make sense to add this to the documentation?

EDIT: I tried this again, and this time it failed. Additional verification needed.
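
For reference, here is the workaround as a profile entry. Treat it as a sketch, not a confirmed fix, given the failed retry mentioned in the EDIT above:

default-resources:
  slurm_account: <account>
  # avoids the f-string's double quotes; worked once, then failed on a retry
  runtime: str(2 + attempt) + "h"
  slurm_partition: core

A quote-free alternative, assuming default-resources values are evaluated as Python expressions with attempt in scope (the mem_mb default visible in the log uses the same mechanism), would be to give the runtime directly in minutes and skip the string suffix entirely:

default-resources:
  # 180 minutes on the first attempt, 240 on the second, and so on
  runtime: 60 * (2 + attempt)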
