
sbatch call does not escape quotes in exec_job #29

Open
ebete opened this issue Jan 30, 2024 · 15 comments

@ebete

ebete commented Jan 30, 2024

The sbatch command constructed in snakemake_executor_plugin_slurm.Executor().run_job() does not seem to escape quotes in the exec_job variable, causing issues at the line that appends --wrap="{exec_job}" to the call.

When exec_job contains a double quote ("), it will generate a not-so-helpful error message:

sbatch: error: script arguments not permitted with --wrap option
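For illustration, here is a minimal sketch of the failure mode (made-up exec_job and paths, not the plugin's actual code), showing how the shell tokenizes the resulting command:

import shlex

# Hypothetical job command containing a double-quoted argument:
exec_job = 'python -m snakemake --apptainer-args "--bind /a:/a:rw"'
# Wrapped the way run_job() does it:
call = f'sbatch --wrap="{exec_job}"'

print(shlex.split(call))
# ['sbatch', '--wrap=python -m snakemake --apptainer-args --bind', '/a:/a:rw']
# The inner quote closes --wrap's value early; '/a:/a:rw' is left over as a
# positional script argument, which sbatch rejects with the error above.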

Extra info:

  • snakemake 8.4.0
  • snakemake-executor-plugin-slurm 0.2.1
  • python 3.11.7
  • slurm 20.11.9
  • CentOS 7.9.2009
@cmeesters cmeesters self-assigned this Feb 1, 2024
@cmeesters
Collaborator

Looking into it. Can you provide a minimal example?

@gipert

gipert commented Feb 1, 2024

Just stumbled upon this too. Here's an MWE:

# Snakefile
rule all:
    input: "a.dat", "b.dat"

rule a:
    output: "{sample}.dat"
    resources: runtime=50
    group: "foo"
    shell: "touch {output} && sleep 200"
# default/config.yaml
# works on the NERSC supercomputer
executor: slurm
default-resources:
  - slurm_account=m2676
  - constraint=cpu
  - slurm_extra="--qos regular"

jobs: unlimited
cores: all
group-components:
  - foo=2

Execute with snakemake --profile default. Output (the missing-partition warning is innocuous):

Using profile default for setting default command line arguments.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 9223372036854775807
Job stats:
job      count
-----  -------
a            2
all          1
total        3

Select jobs to execute...
Execute 1 jobs...
[Thu Feb  1 06:42:52 2024]

group job foo (jobs in lexicogr. order):

    [Thu Feb  1 06:42:52 2024]
    rule a:
        output: a.dat
        jobid: 1
        reason: Code has changed since last execution
        wildcards: sample=a
        resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=m2676, constraint=cpu, slurm_extra=--qos regular, runtime=50


    [Thu Feb  1 06:42:52 2024]
    rule a:
        output: b.dat
        jobid: 2
        reason: Code has changed since last execution
        wildcards: sample=b
        resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=m2676, constraint=cpu, slurm_extra=--qos regular, runtime=50

No partition was given for rule 'JobGroup(foo,frozenset({a, a}))', and unable to find a default partition. Trying to submit without partition information. You may want to invoke snakemake with --default-resources 'slurm_partition=<your default partition>'.
WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option

@cmeesters
Collaborator

cmeesters commented Feb 5, 2024

Fixed in release 8.4.3 of the Snakemake core. However, that release only fixes the odd propagation of function values instead of string flags; a plugin release is forthcoming. Stay tuned.

@ebete
Author

ebete commented Feb 6, 2024

Unfortunately, I still get the same error using snakemake 8.4.4 + snakemake-executor-plugin-slurm 0.3.0.
The problem, for me at least, is that --apptainer-args is passed with mixed quotes:

See the sbatch command (added some line breaks for clarity):

sbatch call: 
sbatch 
--job-name 851bba7b-9529-4d79-90e2-69cb693d8d46 --output $PROJ/.snakemake/slurm_logs/rule_qiime_import/%j.log --export=ALL --comment qiime_import -A slurm -p cpu -t 900 --mem 50000 --cpus-per-task=1 -D $PROJ 
--wrap="
mambaforge/envs/snakemake/bin/python3.11 -m snakemake --snakefile '$PROJ/Snakefile' --target-jobs 'qiime_import:project=result' --allowed-rules 'qiime_import' --cores 'all' --attempt 1 --force-use-threads  --resources 'mem_mb=50000' 'mem_mib=47684' 'disk_mb=2676' 'disk_mib=2553' 'runtime_max=86400' 'load=0' --wait-for-files-file '$PROJ/.snakemake/tmp.qhihjnxt/j339.qiime_import.snk.waitforfilesfile.txt' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete 
--verbose 
--rerun-triggers input params software-env mtime code 
--deployment-method conda apptainer 
--conda-frontend 'mamba' 
--conda-prefix '.snakemake-envs' 
--conda-base-path 'mambaforge/envs/snakemake' 
--apptainer-prefix '.snakemake-envs' 

--apptainer-args "--bind '$one_path:$one_path:rw' --bind '$other_path:$other_path:rw'" 

--shared-fs-usage software-deployment input-output persistence sources source-cache storage-local-copies 
--wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' 
--configfiles $PROJ/setup.yml 
--printshellcmds  
--latency-wait 60 
--scheduler 'ilp' 
--scheduler-solver-path 'mambaforge/envs/snakemake/bin' 
--default-resources 'mem_mb=1000' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' 'slurm_account=slurm' 'slurm_partition=cpu' 'runtime=1440' 'runtime_max=86400' 'load=0' 
--executor slurm-jobstep 
--jobs 1 
--mode 'remote'
"

... SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option

For now I worked around this by removing the quotes around the --bind '...' arguments, but that will only hold up as long as there are no spaces in these paths.
I'm not sure what the best solution would be, as you basically need quotes both around the --apptainer-args value and around the --bind arguments.
Escaping the quotes gave me a different set of problems.
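One conceivable alternative (a sketch only, assuming sbatch accepts --wrap=<value> as a single shell word; this is not the plugin's actual code) would be to quote the entire job command with shlex.quote instead of wrapping it in literal double quotes:

import shlex

# Job command with the problematic nested quoting from above:
exec_job = 'python -m snakemake --apptainer-args "--bind \'/p 1:/p 1:rw\'"'

# Let shlex produce one shell-safe word instead of using --wrap="...":
call = "sbatch --wrap=" + shlex.quote(exec_job)

# The command round-trips intact through shell tokenization:
print(shlex.split(call)[1])
# --wrap=python -m snakemake --apptainer-args "--bind '/p 1:/p 1:rw'"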

If nothing else, it may be an idea to just raise an error here, like:

if '"' in exec_job:
    raise ValueError('SLURM submission command contains illegal character "')

@cmeesters
Collaborator

What a command line!

Currently I am unable to test - no cluster (maintenance). Can you point out the offending line of your Snakemake command line or your resource YAML, please?

@gipert

gipert commented Feb 6, 2024

The MWE I posted earlier keeps failing with Snakemake 8.4.3, for the record.

@ebete
Author

ebete commented Feb 6, 2024

@cmeesters
Luckily, a large part of that is generated by Snakemake itself.

The error stems from the --apptainer-args argument with its mixed quotations.

Anyway, here is a (minimal) reproducible example (requires Singularity/Apptainer + SLURM):

slurm_test.snk

# snakemake

# set up -------------------------------------------------------------------------------------------
from snakemake.utils import min_version

min_version("8.0.0")


localrules: all
container: "docker://condaforge/mambaforge:latest"


SAMPLE_NAMES = [f"file_{x:d}" for x in range(1,5)]
OUTPUT_DIR = "results_test"



# start --------------------------------------------------------------------------------------------
rule all:
    message: "All target files have been generated/updated"
    default_target: True
    threads: 1
    input:
        expand("{project}/{sample}.txt", sample=SAMPLE_NAMES, project=OUTPUT_DIR)

rule gen_file:
    output:
        "{project}/{sample}.txt"
    threads: 1
    resources:
        load = 100
    shell:
        'printf "now: %s\n" "$(date)" > {output:q}'

run.sh

#!/usr/bin/env bash

set -o nounset
set -o errexit
set -o pipefail

extra_params=("$@")
# set this script's directory as the working directory
cd -P -- "$(dirname "$0")"


echo "Snakemake $(snakemake --version) | $(python -VV)" >&2
echo "[RUN_TEST] Running pipeline with test dataset ..." >&2
snakemake \
	--snakefile "slurm_test.snk" \
	--executor slurm \
	--local-cores 1 \
	--jobs 4 \
	--rerun-incomplete \
	--software-deployment-method apptainer \
	--apptainer-prefix "$HOME/.snakemake-envs" \
	--apptainer-args "--bind '$HOME/.snakemake-envs:$HOME/.snakemake-envs:rw'" \
	--printshellcmds \
	--restart-times 0 \
	--show-failed-logs \
	--resources 'load=100' \
	--default-resources \
		'load=0' \
		'runtime=600' \
		'mem_mb=1024' \
		'slurm_account=slurm' \
		'slurm_partition=cpu' \
	--cores 1 \
	--forceall \
	"${extra_params[@]}"

@bdelepine

Hi everyone,

It seems that the same issue occurs when a workflow-profile configuration file contains a config keyword to store some configuration variables. Please find below an MWE and a workaround for this specific problem.

Snakefile:

rule all:
  output:
    touch("all.done")
  shell:
    "wait 10"

profile/config.yaml that contains some configuration variables:

executor: "slurm"
default-resources:
  slurm_account: "XXXXXXXXXXXX"
  slurm_partition: "XXXX"
  runtime: 1435
  cpus_per_task: 1
  nodes: 1
  tasks: 1
  mem_mb: 2000
jobs: 1
config:
  importantlist:
    - "item1"
    - 'item2'
    - item3

Calling snakemake with snakemake --workflow-profile profile --verbose, we get:

rule all:
    output: all.done
    jobid: 0
    reason: Missing output files: all.done
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=XXXXXXX, slurm_partition=fast, runtime=1435, cpus_per_task=1, nodes=1, tasks=1

sbatch call: sbatch --job-name fee9209a-17a3-46d0-bab5-0765fb728317 --output /redacted/path/ --export=ALL --comment all -A XXXXX -p XXXXX -t 1435 --mem 2000 --cpus-per-task=1 -D /redacted/path/ --wrap="/redacted/path/bin/python3.12 -m snakemake --snakefile '/redacted/path/Snakefile' --target-jobs 'all:' --allowed-rules 'all' --cores 'all' --attempt 1 --force-use-threads  --resources 'mem_mb=2000' 'mem_mib=1908' 'disk_mb=1000' 'disk_mib=954' 'cpus_per_task=1' 'nodes=1' 'tasks=1' --wait-for-files '/redacted/path/.snakemake/tmp.6vkh9cws' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose  --rerun-triggers code software-env params input mtime --conda-frontend 'mamba' --shared-fs-usage software-deployment sources input-output storage-local-copies persistence source-cache --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --config "importantlist=['item1', 'item2', 'item3']" --latency-wait 5 --scheduler 'ilp' --scheduler-solver-path '/redacted/path/envs/snakemake-8.0.1/bin' --default-resources 'mem_mb=2000' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' 'slurm_account=XXXX' 'slurm_partition=XXXX' 'runtime=1435' 'cpus_per_task=1' 'nodes=1' 'tasks=1' --executor slurm-jobstep --jobs 1 --mode 'remote'"
unlocking
removing lock
removing lock
removed all locks

[...]

WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option

Note that the config keyword was translated to --config "importantlist=['item1', 'item2', 'item3']"; the un-escaped " is likely what causes the WorkflowError, since that character already delimits --wrap="[...]".
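For illustration (a reconstruction of the effect, not Snakemake's actual serialization code):

# How a list-valued profile config entry ends up with double quotes:
cfg = {"importantlist": ["item1", "item2", "item3"]}
arg = f'--config "importantlist={cfg["importantlist"]!r}"'

print(arg)
# --config "importantlist=['item1', 'item2', 'item3']"
# Embedded inside --wrap="...", the first " of this fragment terminates the
# wrap string early, producing the same sbatch error as above.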

As a workaround, I moved my config parameters to a config.yaml file:

importantlist:
  - "item1"
  - 'item2'
  - item3

Running snakemake as snakemake --workflow-profile profile --configfile config.yaml --verbose works as expected.

Tested with:

  • snakemake 8.0.1
  • snakemake-executor-plugin-slurm 0.2.0

HTH, best,

@cmeesters
Collaborator

cmeesters commented Feb 26, 2024

Please update to the current versions of the executor and of Snakemake itself and try again. My own test is now working.

Note that you ought to write: slurm_extra="'--flag1=arg1 --flag2=arg2'"

@ebete
Author

ebete commented Feb 26, 2024

@cmeesters Still the same exception.

WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option

  • snakemake 8.4.11
  • snakemake-executor-plugin-slurm 0.3.2
  • python 3.11.8

@cmeesters
Collaborator

cmeesters commented Feb 29, 2024

Sorry, I could not get back to this issue any sooner.

@ebete With $ snakemake --workflow-profile ./profiles -j unlimited (rest are my defaults) and

$ cat ./profiles/config.yaml
default-resources:
    slurm_partition: "<partition>"
    slurm_account: "<account>"
    runtime: "10h"
    apptainer_prefix: "$HOME/.snakemake-envs"
    apptainer_args: "--bind '$HOME/.snakemake-envs:$HOME/.snakemake-envs:rw'"

set-resources:
    gen_file:
        load: 100

I could get your test case to work. Before that, I think we descended into Dante's fourth circle of quotation nesting.

@cmeesters
Collaborator

@ebete is it working? Or do these issues persist?

@ebete
Author

ebete commented Mar 11, 2024

@cmeesters
As of now, the issue still persists. I do have a small patch that should solve it, though:

diff --git a/snakemake_executor_plugin_slurm/__init__.py b/snakemake_executor_plugin_slurm/__init__.py
index a97f4aa..a214983 100644
--- a/snakemake_executor_plugin_slurm/__init__.py
+++ b/snakemake_executor_plugin_slurm/__init__.py
@@ -6,6 +6,7 @@ __license__ = "MIT"
 import csv
 from io import StringIO
 import os
+import re
 import subprocess
 import time
 from datetime import datetime, timedelta
@@ -141,6 +142,8 @@ class Executor(RemoteExecutor):
         # (see https://github.com/snakemake/snakemake/issues/2014)
         call += f" -D {self.workflow.workdir_init}"
         # and finally the job to execute with all the snakemake parameters
+        # escape any unescaped double quotes to prevent the --wrap="..." from being closed prematurely
+        exec_job = re.sub(r'(?<!\\)"', r"\"", exec_job)
         call += f' --wrap="{exec_job}"'
 
         self.logger.debug(f"sbatch call: {call}")
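For reference, a standalone check of the escaping behaviour the patch introduces (a hypothetical test mirroring the regex from the diff, using a made-up job string):

import re

def escape_unescaped_quotes(exec_job: str) -> str:
    # Replace every double quote not already preceded by a backslash.
    return re.sub(r'(?<!\\)"', r"\"", exec_job)

job = 'snakemake --apptainer-args "--bind \'/a:/a:rw\'"'
print(f'--wrap="{escape_unescaped_quotes(job)}"')
# --wrap="snakemake --apptainer-args \"--bind '/a:/a:rw'\""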

@cmeesters
Collaborator

Looks nice! Out of curiosity: did it fail the same way as before when using the workflow profile?

Anyway, I do not want to leave you hanging. Yet I just got a new student, have a workshop to prepare for Wednesday, and will take a few days off because our school is closed. In other words: little time for testing. Will do so a.s.a.p.

@ebete
Author

ebete commented Mar 11, 2024

I reckon that the profile would work. But in my case I'm generating the --bind arguments dynamically in a wrapper script, hence a static YAML file would not suffice.
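For what it's worth, a hedged sketch of how such a wrapper might build the bind arguments while keeping paths with spaces intact (a hypothetical helper, not my actual script or the plugin's code):

import shlex

# Build --bind flags for a list of paths, quoting each bind spec so that
# paths containing spaces survive later shell tokenization.
def apptainer_bind_args(paths):
    return " ".join(f"--bind {shlex.quote(f'{p}:{p}:rw')}" for p in paths)

print(apptainer_bind_args(["/data/run 1", "/scratch"]))
# --bind '/data/run 1:/data/run 1:rw' --bind /scratch:/scratch:rw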

And no worries, enjoy the break :)
