feat: support for commas in wildcards #56

Open
wants to merge 1 commit into
base: main


Feelx234

This PR adds quotation marks around wildcards that contain commas. This will make it possible to properly decode wildcards that contain commas.
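
To illustrate the problem, here is a minimal sketch, not the code in this PR. It assumes wildcards are serialized as comma-separated `key=value` pairs; the actual format and the helper names `encode_wildcards` and `decode_wildcards` are hypothetical. It uses Python's `csv` module, which quotes only the fields that contain a comma:

```python
import csv
import io


def encode_wildcards(wildcards):
    """Join key=value pairs with commas, quoting any pair that contains a comma.

    Hypothetical helper: the serialization format is an assumption.
    """
    buffer = io.StringIO()
    # QUOTE_MINIMAL (the csv default) adds quotes only where they are needed.
    csv.writer(buffer).writerow(f"{key}={value}" for key, value in wildcards.items())
    return buffer.getvalue().rstrip("\r\n")


def decode_wildcards(encoded):
    """Invert encode_wildcards; commas inside quoted fields are preserved."""
    fields = next(csv.reader([encoded]))
    return dict(field.split("=", 1) for field in fields)


wildcards = {"method": "method_b", "sample_ids": "range(0,100,10)"}

# Without quoting, splitting on commas cuts the range expression apart.
naive_fields = ",".join(f"{k}={v}" for k, v in wildcards.items()).split(",")

encoded = encode_wildcards(wildcards)
decoded = decode_wildcards(encoded)
```

With the quotes in place, `decoded` equals the original dictionary, whereas `naive_fields` contains four fragments instead of two pairs.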

See also snakemake/snakemake#2659

@vsoch
Contributor

vsoch commented Jan 30, 2024

@Feelx234 possibly a dumb question - why not use two wildcards instead of putting two variables into one? Can you better explain / show me the use case - I'm assuming it's a separator of some type? Or content that would go into a csv file that you want to put in a variable instead?

@Feelx234
Author

Feelx234 commented Jan 31, 2024

Hey @vsoch,

I think it's a good question, and my main use case is convenience. I have used this feature mostly in rules that aggregate multiple results. My Snakefile looks something like the toy file below:

import numpy as np

rule method_a:
    output:
        "intermediate/method_a/{sample}.npy"
    run:
        # in reality we load/process data here
        np.random.seed(int(wildcards.sample))
        value = np.random.rand()
        np.save(output[0], value, allow_pickle=False)


rule method_b:
    output:
        "intermediate/method_b/{sample}.npy"
    run:
        # in reality we load/process data here
        np.random.seed(int(wildcards.sample))
        value = np.random.normal(loc=-2)
        np.save(output[0], value, allow_pickle=False)


def get_input_files(wildcards):
    sample_ids = eval(wildcards["sample_ids"])
    if not isinstance(sample_ids, (list, tuple, range)):
        sample_ids = (sample_ids,)
    required_files = [f"intermediate/{wildcards.method}/{sample_id}.npy" for sample_id in sample_ids]
    return required_files


rule aggregate_results_for_multiple_samples:
    input:
        get_input_files
    output:
        "results/{method}/{sample_ids}.txt"
    run:
        values = [np.load(path) for path in input]
        best_value = np.max(values)
        with open(output[0], 'w') as f:
            f.write(f"{best_value}")

Now when I want to run the workflow for only a single sample, I can do that:
snakemake "./results/method_b/1.txt" --cores=1
If I want to run this for the first 10 samples, no problem:
snakemake "./results/method_b/range(10).txt" --cores=1

So far so good. All of this can be done without allowing commas in wildcards. But if we allow them, we can do things like
running it for every 10th sample (to get the results in an hour, rather than waiting overnight):
snakemake "./results/method_b/range(0,100,10).txt" --cores=1
or run it for only the last 10 samples:
snakemake "./results/method_b/range(90,100).txt" --cores=1
Or maybe I have my favorite representative samples 3, 9, and 12:
snakemake "./results/method_b/(3,9,12).txt" --cores=1
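
For reference, here is a small sketch of how these wildcard strings expand into sample IDs. It is an alternative to the `eval` call in `get_input_files`, not what the Snakefile above does: the hypothetical helper `parse_sample_ids` accepts only integer literals, tuples or lists of integers, and `range(...)` with one to three integer arguments, so arbitrary code in a target path is rejected.

```python
import ast
import re

# Matches range(a), range(a,b) or range(a,b,c) with integer arguments.
RANGE_PATTERN = re.compile(r"range\(\s*(-?\d+(?:\s*,\s*-?\d+){0,2})\s*\)")


def parse_sample_ids(text):
    """Expand a sample_ids wildcard string into a list of integers.

    Hypothetical helper; raises ValueError or SyntaxError on anything
    that is not a plain literal or a supported range expression.
    """
    match = RANGE_PATTERN.fullmatch(text)
    if match:
        arguments = [int(part) for part in match.group(1).split(",")]
        return list(range(*arguments))
    # literal_eval only evaluates Python literals, never function calls.
    value = ast.literal_eval(text)
    if isinstance(value, int):
        return [value]
    return list(value)


single = parse_sample_ids("1")
every_tenth = parse_sample_ids("range(0,100,10)")
last_ten = parse_sample_ids("range(90,100)")
favorites = parse_sample_ids("(3,9,12)")
```

Here `every_tenth` is `[0, 10, 20, ..., 90]` and `favorites` is `[3, 9, 12]`, matching the targets shown above.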

Maybe this is just a very hacky way, but it is very convenient and readable (I know exactly which samples were used to produce the result).

Let me know what you think
Best
Feelx

@Feelx234 Feelx234 changed the title added support for quotes in wildcards feat: support for quotes in wildcards Feb 5, 2024
@Feelx234 Feelx234 changed the title feat: support for quotes in wildcards feat: support for commas in wildcards Feb 5, 2024