vep download cache: Add cache url #366

Closed
lczech opened this issue May 24, 2021 · 8 comments
Labels: enhancement (New feature or request), Stale

lczech commented May 24, 2021

Is your feature request related to a problem? Please describe.
The VEP documentation does not make it obvious how to use genomes other than Homo sapiens, and it was hard to figure out why my attempts to run VEP on a plant species failed. Eventually, I figured out that one needs to pass the install script a specific (and hard to find) FTP URL from which to download the VEP data; otherwise, the data cannot be found.

Hence, I suggest adding this capability to the vep download cache wrapper, and documenting a bit better how to select different genomes. The same applies to the FASTA URL, if the user decides to download that data as well. That, however, triggers #365, which my suggested code below also solves.
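
Issue #365 concerns a check that a VEP cache directory contains exactly one subdirectory, which breaks when extra data (such as the FASTA download) is requested. A minimal sketch of that kind of check, assuming a plain Python helper; the name `single_species_dir` is hypothetical and this is not the annotate wrapper's actual code:

```python
from pathlib import Path


def single_species_dir(cache_root: str) -> Path:
    """Return the only subdirectory of cache_root, or fail loudly.

    Hypothetical helper: mimics an "exactly one directory" check on a
    VEP cache, not the annotate wrapper's real implementation.
    """
    # Only directories count; stray files in cache_root are ignored.
    subdirs = [p for p in Path(cache_root).iterdir() if p.is_dir()]
    if len(subdirs) != 1:
        names = sorted(p.name for p in subdirs)
        raise ValueError(
            f"expected exactly one directory in {cache_root}, found {names}"
        )
    return subdirs[0]
```

With a check like this, a second directory next to the species directory makes the run fail explicitly instead of silently picking one of them.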

Describe the solution you'd like
Something like:

```python
from snakemake.shell import shell

# `snakemake` is injected by Snakemake when this file runs as a wrapper script.

# Get params. By default, we run only cache (--AUTO c), unlike the original wrapper,
# which also requested fasta (--AUTO cf); that would then break the check
# in the vep annotation wrapper that the subdirectory of the cache contains a single directory.
# See https://github.com/snakemake/snakemake-wrappers/issues/365
automode = snakemake.params.get("automode", "c")
extra = snakemake.params.get("extra", "")

# Extra optional cache and fasta url
cacheurl = snakemake.params.get("cacheurl", "")
if cacheurl:
    cacheurl = '--CACHEURL "{}"'.format(cacheurl)
fastaurl = snakemake.params.get("fastaurl", "")
if fastaurl:
    fastaurl = '--FASTAURL "{}"'.format(fastaurl)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Compared to the original wrapper, we add the two urls, and also use a newer version
# of vep_install, which uses --CACHE_VERSION instead of --VERSION.
# This requires changing the environment to use vep 104.
shell(
    "vep_install --AUTO {automode} "
    "--SPECIES {snakemake.params.species} "
    "--ASSEMBLY {snakemake.params.build} "
    "--CACHE_VERSION {snakemake.params.release} "
    "--CACHEDIR {snakemake.output} "
    "--CONVERT "
    "--NO_UPDATE "
    "{cacheurl} {fastaurl} "
    "{extra} {log}"
)
```

I am currently using this replacement for the wrapper myself, and it gets the job done. Note that it also solves #365, and that I updated VEP to version 104, which would need to be reflected in the environment.yaml. Currently, the cache and the annotate wrappers use different VEP versions (101 and 102), which is probably not ideal.
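
The key idea of the proposal is that the two URL flags are only added when they are set. A minimal sketch of that command assembly as a standalone function, assuming a hypothetical helper `build_vep_install_cmd` (the wrapper itself reads these values from `snakemake.params` and hands the string to `shell()`). The flags are the ones used in the proposal; `shlex.quote` is swapped in for the hand-written double quotes, and the example values are placeholders:

```python
import shlex


def build_vep_install_cmd(
    species: str,
    build: str,
    release: int,
    cachedir: str,
    automode: str = "c",
    cacheurl: str = "",
    fastaurl: str = "",
    extra: str = "",
) -> str:
    """Assemble a vep_install command line (hypothetical helper)."""
    parts = [
        "vep_install",
        f"--AUTO {automode}",
        f"--SPECIES {species}",
        f"--ASSEMBLY {build}",
        f"--CACHE_VERSION {release}",
        f"--CACHEDIR {shlex.quote(cachedir)}",
        "--CONVERT",
        "--NO_UPDATE",
    ]
    # Only emit the URL flags when a custom location is given,
    # so vep_install falls back to its default download locations otherwise.
    if cacheurl:
        parts.append(f"--CACHEURL {shlex.quote(cacheurl)}")
    if fastaurl:
        parts.append(f"--FASTAURL {shlex.quote(fastaurl)}")
    if extra:
        parts.append(extra)
    return " ".join(parts)


# Placeholder values for illustration only; the URL is not a real VEP mirror.
print(
    build_vep_install_cmd(
        "arabidopsis_thaliana",
        "TAIR10",
        51,
        "resources/vep/cache",
        cacheurl="https://example.org/vep",
    )
)
```

Keeping the empty-string default means existing rules that never set `cacheurl` or `fastaurl` produce the same command as before.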

lczech added the enhancement (New feature or request) label May 24, 2021
lczech (Author) commented May 24, 2021

For anyone in the future trying to find the FTP URLs for these cache datasets, try http://uswest.ensembl.org/info/docs/tools/vep/script/vep_download.html#installer and http://uswest.ensembl.org/info/docs/tools/vep/script/vep_cache.html#cache :-)
There are links hidden under "Manually downloading caches" :-)

fgvieira (Collaborator) commented:

Not quite sure I understand the issue here. I've used all three VEP snakemake wrappers and never encountered the issues you mention. Can you send a minimal example with the latest wrapper versions?

github-actions bot (Contributor) commented Mar 1, 2024

This issue was marked as stale because it has been open for 6 months with no activity.

github-actions bot added the Stale label Mar 1, 2024
github-actions bot (Contributor) commented Apr 1, 2024

This issue was closed because it has been inactive for 1 month since being marked as stale. Feel free to re-open it if you have any further comments.

github-actions bot closed this as completed Apr 1, 2024
lczech (Author) commented Apr 29, 2024

@fgvieira, just catching up with things, and saw that this issue was already closed... Anyway, to finally answer your question: as far as I recall, my issue was with species that are not in the default VEP/Ensembl database paths, such as Arabidopsis thaliana (at the time of writing the issue; not sure if it has been added since). Hence, I needed to specify a custom path for the download. Hope that clarifies it.

I'm not sure that this issue is actually resolved, but my tool documents the workaround mentioned above for specifying custom URLs, so it's fine on my end :-)

fgvieira (Collaborator) commented:

If that solution works, would you mind making a PR?

fgvieira reopened this Apr 30, 2024
fgvieira added a commit to fgvieira/snakemake-wrappers that referenced this issue May 7, 2024
fgvieira (Collaborator) commented May 7, 2024

@lczech can you check if PR #2928 fixes this issue?

lczech (Author) commented May 7, 2024

@fgvieira, thanks, I think that should work. The wrapper script has changed a bit since then, and a curl download step has been added, but as long as vep_install then picks up these curl-downloaded files, that should work. Thank you very much!

johanneskoester pushed a commit that referenced this issue May 28, 2024
Allow for custom URLs (fix issues #366 and #2649).

### QC

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tool's purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running
`snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io),
* Conda environments use a minimal number of channels, in the recommended
order. E.g., for bioconda, use conda-forge, bioconda, nodefaults
(conda-forge should have the highest priority, and the defaults channel is
usually not needed because most packages are in conda-forge nowadays).
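
The checklist item on temporary files prefers locations under `tempfile.gettempdir()`. A minimal sketch of what that looks like inside a Python wrapper, assuming a hypothetical function `process_in_scratch_dir` that stands in for whatever step needs scratch space; it uses the standard-library `tempfile.TemporaryDirectory`, which places its directory under `tempfile.gettempdir()` when no `dir` is given:

```python
import tempfile
from pathlib import Path
from typing import Tuple


def process_in_scratch_dir(content: str) -> Tuple[str, str]:
    """Write intermediate data to a unique scratch directory and read it back.

    Hypothetical example: TemporaryDirectory() with no `dir` argument creates
    the directory inside tempfile.gettempdir() and deletes it, including its
    contents, when the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        scratch = Path(tmpdir) / "intermediate.txt"
        scratch.write_text(content)
        # A real wrapper would pass `scratch` to the wrapped tool here.
        result = scratch.read_text()
    # The directory is gone at this point; only its path string remains.
    return tmpdir, result
```

Because the directory name is unique and cleanup is automatic, parallel jobs do not collide and no stray files are left in the working directory.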