speeding-up-science-workshops/KEGGDecoder-heatmap

 
 

KEGGDecoder - Visualizing biogeochemical metabolic pathways

Quick Start

  • Launch the binder with the button above. This will take a few minutes to load.
  • Upload your GhostKOALA or KofamScan output to the Jupyter Notebook in your binder session with the Upload button near the top right of that page.
  • Select New Python 3 Notebook from the New button next to the Upload button. This will open in a new tab.
  • From this Jupyter notebook, execute: !KEGG-decoder --input TOBG-MED.TREEMATCH.koalaoutput.txt --output TOBG-MED.decoder.tsv --vizoption interactive
  • Return to the first binder session tab and click on the new TOBG-MED.decoder.html file to open it in a new tab.
  • Explore.
  • For instructions on how to generate the input data needed for KEGGDecoder, see the KEGGDecoder_usage.ipynb notebook.
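
The `!KEGG-decoder` step above can also be scripted from a notebook cell with Python's standard library. The following is a minimal sketch, assuming the `KEGG-decoder` executable is on the `PATH` of the binder session and the input file has already been uploaded. The helper name `build_decoder_command` is hypothetical, and only the flags shown in the Quick Start are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_decoder_command(input_path, output_path, vizoption="interactive"):
    """Return the KEGG-decoder call from the Quick Start as an argument list.

    Hypothetical helper: it only assembles the flags shown above
    (--input, --output, --vizoption) and does not validate their values.
    """
    return [
        "KEGG-decoder",
        "--input", input_path,
        "--output", output_path,
        "--vizoption", vizoption,
    ]


input_file = "TOBG-MED.TREEMATCH.koalaoutput.txt"
cmd = build_decoder_command(input_file, "TOBG-MED.decoder.tsv")
print(" ".join(cmd))

# Run only when both the executable and the input file are available
# (assumed to be the case inside the binder session after the upload step).
if shutil.which(cmd[0]) is not None and Path(input_file).is_file():
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.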

Example Output

  • Example Output

Summary

  • Designed to parse and visualize a KEGG-Koala output to determine the completeness of biogeochemically relevant metabolic pathways. The pathways are hand-curated to select informative markers of predicted function.
  • This binder provides a demo for KEGGDecoder, which makes a metabolic heatmap. The package was developed by Ben Tully and expanded during the Speeding Up Science workshop by Jay Osvatic, Roth Conrad, Luiz Irber, Taylor Reiter, Chris Neely, Jason Fell, and Marisa Lim.
  • The immediate input for KEGGDecoder is a tab-delimited KEGG-Koala output file that consists of a formatted gene ID and a corresponding KEGG Orthology (KO) assignment.
  • Options for generating a KEGG-Koala output include:
    • BlastKOALA
    • GhostKOALA
    • KofamScan
  • KEGG Orthology assignment requires a protein FASTA file for submission, which can be generated by tools such as:
    • Prodigal
    • MetaSanity, which can generate both a protein FASTA file and a KEGG-Koala output for a set of genomes
  • The output heatmap provides a broad overview of (at least) 145 metabolic processes, which include pathways, multi-subunit proteins, and single proteins. The heatmap displays results as the fraction of a pathway or multi-subunit protein that is complete (0-1), or as present/absent for single proteins (0 or 1). Labels for each process are meant to be informative but are not definitive. Information on which KEGG Orthology entries are in each process can be found here. KEGGDecoder should function as a hypothesis-generating tool: a process without matches is not necessarily absent from a genome, and a process with matches is not necessarily present.
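
The input format and the 0-1 completeness values described above can be illustrated with a short sketch. It is not KEGGDecoder's own code. It assumes that the genome name is the part of the gene ID before the first underscore, and it scores a process as the plain fraction of its KOs that were found, whereas the hand-curated definitions may handle alternative enzymes or steps differently. The function names, genome names, and KO identifiers are placeholders chosen for illustration and do not represent a real pathway definition.

```python
from collections import defaultdict


def kos_by_genome(lines):
    """Group KO assignments by genome from tab-delimited KOALA-style lines.

    Assumption: the genome name is the part of the gene ID before the
    first underscore (e.g. "GENOMEA_12" belongs to "GENOMEA").
    Lines without a KO assignment are skipped.
    """
    genomes = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].strip():
            continue  # gene with no KO assignment
        gene_id, ko = fields[0].strip(), fields[1].strip()
        genomes[gene_id.split("_")[0]].add(ko)
    return dict(genomes)


def completeness(present_kos, required_kos):
    """Return the fraction (0-1) of the required KOs that were found."""
    required = set(required_kos)
    if not required:
        return 0.0
    return len(required & set(present_kos)) / len(required)


# Placeholder input lines and a placeholder four-KO "process".
example_lines = [
    "GENOMEA_1\tK00001",
    "GENOMEA_2\tK00002",
    "GENOMEA_3",  # gene ID only, no KO assigned
    "GENOMEB_1\tK00001",
    "GENOMEB_2\tK00003",
    "GENOMEB_3\tK00004",
]
example_process = ["K00001", "K00002", "K00003", "K00004"]

for genome, kos in sorted(kos_by_genome(example_lines).items()):
    print(genome, completeness(kos, example_process))
# GENOMEA 0.5
# GENOMEB 0.75
```

A single-protein process corresponds to a one-element `required_kos` list, for which the function returns 0.0 or 1.0, matching the present/absent case.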

Authors

  • Benjamin Tully, bjtully, 0000-0002-9384-7635
  • Taylor Reiter, taylorreiter, 0000-0002-7388-421X
  • Luiz Irber, luizirber, 0000-0003-4371-9659
  • Roth Conrad, rotheconrad, 0000-0001-8155-8441
  • Jay Osvatic, osvatic, 0000-0002-7765-0058
  • Chris Neely, cjnelly10, 0000-0002-2620-8948
  • Marisa Lim, marisalim, 0000-0003-2097-8818
  • Jason Fell, jfell13, 0000-0001-6680-2936

Links

Zenodo Binder, doi: LINK_TO_BINDER

Github Binder: Binder

Github Repository: https://github.com/speeding-up-science-workshops/KEGGDecoder-binder/
