fix: check template rendering output for leaked input file paths #2850

Merged
merged 1 commit into from
Apr 28, 2024
30 changes: 28 additions & 2 deletions docs/snakefiles/rules.rst
@@ -2712,7 +2712,7 @@ Apart from Jinja2, Snakemake supports `YTE <https://github.com/koesterlab/yte>`_

 .. code-block:: python
 
-    rule render_jinja2_template:
+    rule render_yte_template:
         input:
             "some-yte-template.yaml"
         output:
@@ -2737,7 +2737,33 @@ Analogously to the jinja2 case YTE has access to ``params``, ``wildcards``, and
- b
- ?config["threshold"]

-Template rendering rules are always executed locally, without submission to cluster or cloud processes (since templating is usually not resource intensive).
+By default, template rendering rules are executed locally, without submission to cluster or cloud processes (since templating is usually not resource intensive).
+However, if a :ref:`storage plugin <storage-support>` is used, a template rule can theoretically leak paths to local copies of the storage files into the rendered template.
+This can happen if the template inserts the path of an input file into the rendered output.
+Snakemake tries to detect such cases by checking the template output.
+To avoid such leaks (only required if your template does something like that with an input file path), you can assign the same :ref:`group <job_grouping>` to your template rule and the consuming rule, and in addition mark the template output as ``temp()``, i.e.:
+
+.. code-block:: python
+
+    rule render_yte_template:
+        input:
+            "some-yte-template.yaml"
+        output:
+            temp("results/{sample}.rendered-version.yaml")
+        params:
+            foo=0.1
+        group: "some-group"
+        template_engine:
+            "yte"
+
+    rule consume_template:
+        input:
+            "results/{sample}.rendered-version.yaml"
+        output:
+            "results/some-output.txt"
+        group: "some-group"
+        shell:
+            "sometool {input} {output}"

.. _snakefiles_mpi_support:
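
As a rough illustration of the leak described in the documentation change above, here is a minimal sketch using only the Python standard library. `string.Template` merely stands in for Jinja2 or YTE, and the local prefixes, the storage query, and the helper names are all hypothetical. It only shows that a template which inserts the local path of a storage input produces output that depends on where the local copy happens to live.

```python
from pathlib import PurePosixPath
from string import Template

# Hypothetical template that writes the path of its input file into the output.
TEMPLATE = Template("reference: $input_path\nthreshold: $threshold\n")


def local_copy(local_prefix: str, query: str) -> str:
    """Return the (hypothetical) local path at which a storage object is cached."""
    return str(PurePosixPath(local_prefix) / query)


def render(local_prefix: str, query: str, threshold: float) -> str:
    """Render the template with the local copy's path, as a leaking template would."""
    return TEMPLATE.substitute(
        input_path=local_copy(local_prefix, query), threshold=threshold
    )


# Same storage object, two different local prefixes (e.g. two runs or two nodes).
run_a = render(".snakemake/storage", "bucket/ref.fa", 0.1)
run_b = render("/scratch/job42/storage", "bucket/ref.fa", 0.1)
```

The two renderings differ only in the embedded path, which is why a persisted template output containing such a path may not be valid for a later run or on another node.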

10 changes: 10 additions & 0 deletions snakemake/jobs.py
@@ -18,6 +18,7 @@
 from abc import ABC, abstractmethod
 from snakemake.settings import DeploymentMethod
 
+from snakemake.template_rendering import check_template_output
 from snakemake_interface_common.utils import lazy_property
 from snakemake_interface_executor_plugins.jobs import (
     JobExecutorInterface,
@@ -1101,6 +1102,15 @@ async def postprocess(
                     wait_for_local=True,
                 )
             self.dag.unshadow_output(self, only_log=error)
+
+            if (
+                not error
+                and self.rule.is_template_engine
+                and not is_flagged(self.output[0], "temp")
+            ):
+                # TODO also check if consumers are executed on the same node
+                check_template_output(self)
+
             await self.dag.handle_storage(
                 self, store_in_storage=store_in_storage, store_only_log=error
             )
14 changes: 14 additions & 0 deletions snakemake/template_rendering/__init__.py
@@ -57,3 +57,17 @@ def render_template(engine, input, output, params, wildcards, config, rule):
             )
     except Exception as e:
         raise WorkflowError(f"Error rendering template in rule {rule}.", e)
+
+
+def check_template_output(job):
+    with open(job.output[0]) as out:
+        for l in out:
+            for f in job.input:
+                if f.is_storage and f in l:
+                    raise WorkflowError(
+                        "Output of template_engine rule contains local path to input file "
+                        f"from storage: {f} for {f.storage_object.query}. "
+                        "However, this path is variable as it can change between runs (e.g. when "
+                        "the storage local prefix is modified). To circumvent this issue, place the "
+                        "rule in one group with the consumer(s) and mark the output as temp()."
+                    )
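
The scan performed by `check_template_output` can be tried in isolation with the sketch below. It is not Snakemake code: `FakeInput` and `find_leaked_paths` are hypothetical names, and the sketch assumes, as the `f in l` test in the diff suggests, that input files behave like strings carrying an `is_storage` attribute. Unlike the function in the diff, it collects matches instead of raising `WorkflowError`, and it stores the query directly on the stand-in object rather than on a `storage_object`.

```python
import os
import tempfile


class FakeInput(str):
    """Hypothetical stand-in for an input file: a str with storage metadata."""

    def __new__(cls, path, is_storage=False, query=None):
        obj = super().__new__(cls, path)
        obj.is_storage = is_storage
        obj.query = query
        return obj


def find_leaked_paths(output_path, inputs):
    """Return (line_number, input) pairs where a storage input's local path occurs."""
    leaks = []
    with open(output_path) as out:
        for lineno, line in enumerate(out, start=1):
            for f in inputs:
                # Only local copies of storage files are variable between runs.
                if f.is_storage and f in line:
                    leaks.append((lineno, f))
    return leaks


# Usage: a rendered file that mentions both a plain input and a storage input.
inputs = [
    FakeInput(
        ".snakemake/storage/s3/bucket/ref.fa",
        is_storage=True,
        query="s3://bucket/ref.fa",
    ),
    FakeInput("config/template.yaml"),
]
with tempfile.TemporaryDirectory() as tmp:
    rendered = os.path.join(tmp, "rendered.yaml")
    with open(rendered, "w") as fh:
        fh.write("template: config/template.yaml\n")
        fh.write("reference: .snakemake/storage/s3/bucket/ref.fa\n")
    leaks = find_leaked_paths(rendered, inputs)
```

Only the storage-backed input is reported here; the plain local path on the first line is ignored, mirroring the `f.is_storage` guard in the diff.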