Hisat2 align on large index #2537

Open
andreott opened this issue Jan 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@andreott

Snakemake version
snakemake-wrappers: 3.3.3

Describe the bug
The wrapper looks for index files by checking for the .ht2 file extension only, so it fails when a large index was produced, because those files have the extension .ht2l.

Possible fix
A quick-and-dirty fix would be to use

ht2_files = Path(snakemake.input.idx).glob("*.ht2*")

instead of

ht2_files = Path(snakemake.input.idx).glob("*.ht2")

... or search for both patterns (*.ht2 and *.ht2l) and combine the results.

However, I would rather have the user provide the index prefix instead of searching the provided directory, since problems would arise if the provided path contained multiple indexes.
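As a rough illustration of the "search for both and combine" variant, here is a minimal sketch. The helper name find_index_prefix and the check for mixed index types are assumptions made for this example and are not part of the wrapper.

```python
import os
from pathlib import Path


def find_index_prefix(idx_dir):
    """Return the common prefix of the HISAT2 index files found in idx_dir.

    Hypothetical helper, not part of the wrapper: it accepts both small
    (.ht2) and large (.ht2l) index files.
    """
    idx_dir = Path(idx_dir)
    # Search for both extensions and combine the results.
    files = sorted(idx_dir.glob("*.ht2")) + sorted(idx_dir.glob("*.ht2l"))
    if not files:
        raise ValueError(f"no .ht2 or .ht2l files found in {idx_dir}")
    # A mix of small and large files suggests more than one index in the
    # directory, which is the ambiguous case described above.
    suffixes = {f.suffix for f in files}
    if len(suffixes) > 1:
        raise ValueError(f"mixed index types in {idx_dir}: {sorted(suffixes)}")
    # Compare the paths as strings, then drop the trailing "." that is left
    # in front of the "1.ht2" / "1.ht2l" part.
    return os.path.commonprefix([str(f) for f in files]).rstrip(".")
```

This still guesses the prefix from the directory contents, so it does not remove the multiple-index concern; it only makes that case fail loudly when the two index types are mixed.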

@andreott andreott added the bug Something isn't working label Jan 10, 2024
@fgvieira
Collaborator

I agree. The user providing the prefix sounds safer. Do you think you can make a PR with those changes?

@andreott
Author

Yes, I will provide a PR

@fgvieira
Collaborator

Actually, I think it might be better to just provide all the index files as input. Something like:

rule hisat2_align:
    input:
        reads=["reads/{sample}_R1.fastq", "reads/{sample}_R2.fastq"],
        idx=multiext("index/ref", ".1.ht2", ".2.ht2", ".3.ht2", ".4.ht2", ".5.ht2", ".6.ht2", ".7.ht2", ".8.ht2"),
[...]

Then I think you just need to delete this line:

ht2_files = Path(snakemake.input.idx).glob("*.ht2")

And fix this line:

idx_prefix = os.path.commonprefix(list(ht2_files)).rstrip(".")
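One way the fixed line might look once every index file is passed as an input is sketched below. The helper prefix_from_index_files is hypothetical, and idx_files is an assumed stand-in for snakemake.input.idx; this is not the wrapper's actual code.

```python
import os


def prefix_from_index_files(idx_files):
    """Derive the index prefix from an explicit list of index file paths.

    Hypothetical helper: idx_files stands in for snakemake.input.idx when
    every index file is listed as an input.
    """
    # commonprefix compares character by character, so for
    # "index/ref.1.ht2", "index/ref.2.ht2", ... it stops at "index/ref.".
    # rstrip(".") then removes the trailing dot.
    return os.path.commonprefix([str(f) for f in idx_files]).rstrip(".")


idx_files = [f"index/ref.{i}.ht2" for i in range(1, 9)]
print(prefix_from_index_files(idx_files))  # index/ref
```

Because the prefix comes from the listed files rather than from a glob, the same code would work for .ht2l inputs as long as the rule lists them.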

@andreott
Author

I think the problem is that, if you build the index within the workflow, you will not know in advance whether it will be a large index.

@fgvieira
Collaborator

Unless you force hisat2-build to build a large index.
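A minimal sketch of what forcing a large index could look like, assuming the --large-index option of hisat2-build and eight index files as in the multiext example earlier in the thread. Both helper names are hypothetical and only illustrate that the output filenames become predictable.

```python
def hisat2_build_command(fasta, prefix, large_index=True):
    """Build an argument list for hisat2-build (hypothetical helper)."""
    cmd = ["hisat2-build"]
    if large_index:
        # --large-index requests a large index even for a small reference,
        # so the output extension is known before the index is built.
        cmd.append("--large-index")
    cmd += [fasta, prefix]
    return cmd


def expected_index_files(prefix, large_index=True, n_files=8):
    """List the index filenames a rule could declare (n_files is assumed)."""
    ext = "ht2l" if large_index else "ht2"
    return [f"{prefix}.{i}.{ext}" for i in range(1, n_files + 1)]
```

With large_index=True, a rule could declare the .ht2l files up front regardless of the reference size.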

@andreott
Author

andreott commented Mar 6, 2024

Hi, getting back to this one. Sorry, I was quite busy the last few weeks. For me, both options are OK. As with the STAR wrapper, I guess it would be more portable not to provide the exact filenames and not to force the user to build the large index just to keep things portable. Also, I realised that my previous concern (multiple indices in the same directory) is not valid: the wrapper is supposed to be called with the directory as output, so the directory will be overwritten by the next indexing run.
