feat: support for commas in wildcards #56

Feelx234 · 2024-01-30T10:57:55Z

This PR adds quotation marks around wildcards that contain commas. This will make it possible to properly decode wildcards that contain commas.
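
To illustrate the idea (this is not the PR's actual diff), here is a minimal sketch of why quoting helps, assuming wildcards are passed around as a comma-separated list of `name=value` pairs. The helpers `encode_wildcards` and `decode_wildcards` are hypothetical names, and the sketch quotes the whole pair through Python's `csv` module, whereas the real change may quote only the value.

```python
import csv
import io


def encode_wildcards(wildcards: dict[str, str]) -> str:
    """Join name=value pairs with commas, quoting any pair that contains a comma."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([f"{name}={value}" for name, value in wildcards.items()])
    # csv.writer appends a line terminator; drop it to get a single-line string.
    return buf.getvalue().rstrip("\r\n")


def decode_wildcards(text: str) -> dict[str, str]:
    """Split on unquoted commas only, then split each pair at the first '='."""
    row = next(csv.reader([text]))
    return dict(item.split("=", 1) for item in row)


original = {"method": "method_b", "sample_ids": "range(0,100,10)"}
encoded = encode_wildcards(original)
decoded = decode_wildcards(encoded)
```

Without the quotes, a naive split on `,` would cut `range(0,100,10)` into three pieces; with them, the value round-trips unchanged.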

See also snakemake/snakemake#2659

vsoch · 2024-01-30T16:41:11Z

@Feelx234 possibly a dumb question - why not use two wildcards instead of putting two variables into one? Can you better explain / show me the use case - I'm assuming it's a separator of some type? Or content that would go into a csv file that you want to put in a variable instead?

Feelx234 · 2024-01-31T09:33:43Z

Hey @vsoch,

I think it's a good question, and my main use case is convenience. I have used this feature mostly in rules that aggregate multiple results. My Snakefile looks something like the toy file below:

import numpy as np

rule method_a:
    output:
        "intermediate/method_a/{sample}.npy"
    run:
        # in reality we load/process data here
        np.random.seed(int(wildcards.sample))
        value = np.random.rand()
        np.save(output[0], value, allow_pickle=False)


rule method_b:
    output:
        "intermediate/method_b/{sample}.npy"
    run:
        # in reality we load/process data here
        np.random.seed(int(wildcards.sample))
        value = np.random.normal(loc=-2)
        np.save(output[0], value, allow_pickle=False)


def get_input_files(wildcards):
    sample_ids = eval(wildcards["sample_ids"])
    if not isinstance(sample_ids, (list, tuple, range)):
        sample_ids = (sample_ids,)
    required_files = [f"intermediate/{wildcards.method}/{sample_id}.npy" for sample_id in sample_ids]
    return required_files


rule aggregate_results_for_multiple_samples:
    input:
        get_input_files
    output:
        "results/{method}/{sample_ids}.txt"
    run:
        values = [np.load(path) for path in input]
        best_value = np.max(values)
        with open(output[0], 'w') as f:
            f.write(f"{best_value}")

Now, when I want to run the workflow for only a single file, I can do that:
snakemake "./results/method_b/1.txt" --cores=1
If I want to run this for the first 10 samples, no problem:
snakemake "./results/method_b/range(10).txt" --cores=1

So far, so good. All of this can be done without allowing commas in wildcards. But if we allow them, we can do things like
running it for every 10th sample (to get the results in an hour rather than waiting overnight):
snakemake "./results/method_b/range(0,100,10).txt" --cores=1
or run it for only the last 10 samples:
snakemake "./results/method_b/range(90,100).txt" --cores=1
Or maybe I have my favorite representative samples 3, 9, and 12:
snakemake "./results/method_b/(3,9,12).txt" --cores=1

Maybe this is just a very hacky way, but it is very convenient and readable (I know exactly which samples were used to produce the result).
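
As a side note, the `eval` call in `get_input_files` will execute whatever is in the requested file name. A minimal sketch of a restricted alternative, assuming the wildcard only ever needs to be a single integer, a tuple or list of integers, or a `range(...)` call with integer arguments (the helper name `parse_sample_ids` is hypothetical):

```python
import ast


def parse_sample_ids(text: str) -> tuple[int, ...]:
    """Decode a sample_ids wildcard without eval()."""
    node = ast.parse(text, mode="eval").body

    # Case 1: range(...) with literal integer arguments and no keywords.
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "range"
        and not node.keywords
    ):
        args = [ast.literal_eval(arg) for arg in node.args]
        if not all(isinstance(arg, int) for arg in args):
            raise ValueError(f"range() arguments must be integers: {text!r}")
        return tuple(range(*args))

    # Case 2: a plain literal; literal_eval raises ValueError for anything else.
    value = ast.literal_eval(node)
    if isinstance(value, int):
        return (value,)
    if isinstance(value, (tuple, list)) and all(isinstance(v, int) for v in value):
        return tuple(value)
    raise ValueError(f"unsupported sample_ids expression: {text!r}")


every_tenth = parse_sample_ids("range(0,100,10)")
favorites = parse_sample_ids("(3,9,12)")
```

In the toy Snakefile this would stand in for `eval(wildcards["sample_ids"])`, and the `isinstance` check after it would no longer be needed because the helper always returns a tuple.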

Let me know what you think
Best
Feelx

added support for quotes in wildcards

22a0223

Feelx234 mentioned this pull request Jan 30, 2024 in snakemake/snakemake#2659 (feat: add support for comma in wildcards)

Feelx234 changed the title ~~added support for quotes in wildcards~~ feat: support for quotes in wildcards Feb 5, 2024

Feelx234 changed the title ~~feat: support for quotes in wildcards~~ feat: support for commas in wildcards Feb 5, 2024
