This repository has been archived by the owner on Nov 8, 2021. It is now read-only.

Combine information regarding reads, insertions and genomic features from different sources in one variable for easier access. #26

Open
Gregory94 opened this issue Oct 12, 2020 · 1 comment

Comments

Collaborator

Gregory94 commented Oct 12, 2020

Processing and plotting the data from SATAY experiments typically requires combining many different files.
To make this easier, the Python script 'genomicfeatures_dataframe_with_normalization.py' was developed. It creates a dataframe with information about various features in a chromosome (e.g. genes, telomeres, centromeres, ARS) and the number of reads and insertions.
It also applies the normalization procedure from 'reads_normalization.py'.

@Gregory94 Gregory94 created this issue from a note in SATAY-analysis-workflow-board (Done) Oct 12, 2020
@Gregory94 Gregory94 self-assigned this Oct 12, 2020
Collaborator Author

Gregory94 commented Oct 12, 2020

The dataframe created by genomicfeatures_dataframe_with_normalization.py looks as follows:
[Image: dna_df2]

It currently includes the following information:

  • Feature name
  • Standard gene name
  • Feature aliases
  • Feature type
  • Essentiality
  • Basepair position in chromosome
  • Feature length
  • Number of insertions
  • Number of insertions in central 80% of gene
  • Number of reads
  • Number of reads in central 80% of gene
  • Number of insertions per basepair
  • Number of insertions per basepair in central 80% of gene
  • Number of reads per basepair
  • Number of reads per basepair in central 80% of gene
  • Number of reads per basepair normalized
  • Number of reads per basepair in central 80% of gene normalized
  • Number of reads per basepair in central part of gene normalized where the average number of basepairs in the noncoding regions in each window is defined as 1.
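
As a rough illustration of how the count and per-basepair columns above relate to each other, here is a minimal sketch for a single feature. It is not the code from genomicfeatures_dataframe_with_normalization.py: the function name `summarize_feature`, the column names, and the choice to trim 10% of the feature length from each end (rounded down) are assumptions made for this example, and the normalization step is left out.

```python
import pandas as pd


def summarize_feature(name, start, end, positions, reads):
    """Summarize insertions and reads for one feature (hypothetical helper).

    start and end are inclusive basepair coordinates of the feature;
    positions and reads are parallel lists with one entry per insertion.
    """
    length = end - start + 1

    # Assumption: "central 80%" means dropping 10% of the length at each end.
    margin = length // 10
    central_start, central_end = start + margin, end - margin
    central_length = central_end - central_start + 1

    insertions = pd.DataFrame({"position": positions, "reads": reads})
    # Series.between is inclusive on both ends by default.
    in_feature = insertions[insertions["position"].between(start, end)]
    in_central = insertions[
        insertions["position"].between(central_start, central_end)
    ]

    return {
        "feature_name": name,
        "feature_length": length,
        "n_insertions": len(in_feature),
        "n_insertions_central80": len(in_central),
        "n_reads": int(in_feature["reads"].sum()),
        "n_reads_central80": int(in_central["reads"].sum()),
        "insertions_per_bp": len(in_feature) / length,
        "insertions_per_bp_central80": len(in_central) / central_length,
        "reads_per_bp": in_feature["reads"].sum() / length,
        "reads_per_bp_central80": in_central["reads"].sum() / central_length,
    }


# Made-up example: a 100 bp feature with four insertions.
row = summarize_feature(
    name="example_gene",
    start=101,
    end=200,
    positions=[105, 120, 150, 195],
    reads=[3, 10, 7, 2],
)
example_df = pd.DataFrame([row])
print(example_df.T)
```

In this made-up example, the insertions at positions 105 and 195 fall outside the central region (111 to 190), so they count towards the totals for the whole feature but not towards the central 80% columns.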

To use this function in any Python script, make sure that the input matches the help description at the beginning of the function and that all required files are present (the locations of the files are noted in the function).
The output is the variable dna_df2, which contains the dataframe.
