Combine information regarding reads, insertions and genomic features from different sources in one variable for easier access. #26

Gregory94 · 2020-10-12T08:58:40Z

Processing and plotting the data from the SATAY experiments typically requires combining many different files.
To make this easier, the Python script 'genomicfeatures_dataframe_with_normalization.py' was developed. It creates a dataframe with information about the features in a chromosome (e.g. genes, telomeres, centromeres, ARS) together with the number of reads and insertions for each feature.
It also includes a normalization procedure from 'reads_normalization.py'.

Gregory94 · 2020-10-12T09:23:30Z

The dataframe created by genomicfeatures_dataframe_with_normalization.py currently includes the following information:

Feature name
Standard gene name
Feature aliases
Feature type
Essentiality
Basepair position in chromosome
Feature length
Number of insertions
Number of insertions in central 80% of gene
Number of reads
Number of reads in central 80% of gene
Number of insertions per basepair
Number of insertions per basepair in central 80% of gene
Number of reads per basepair
Number of reads per basepair in central 80% of gene
Number of reads per basepair normalized
Number of reads per basepair in central 80% of gene normalized
Number of reads per basepair in central part of gene, normalized such that the average number of reads per basepair in the noncoding regions in each window is defined as 1
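
As a rough illustration of how the count columns above relate to each other, the sketch below derives the central-80% and per-basepair values for a single feature from a list of insertion positions and read counts. The helper name summarize_feature, the dictionary keys and the toy numbers are assumptions made for this example and are not taken from genomicfeatures_dataframe_with_normalization.py. "Central 80%" is interpreted here as dropping the first and last 10% of the feature length, which may differ from what the script does.

```python
import pandas as pd


def summarize_feature(start, end, insertion_positions, reads_per_insertion):
    """Hypothetical helper: summarize insertions and reads for one feature.

    start, end: inclusive basepair coordinates of the feature.
    insertion_positions: basepair positions of the insertions.
    reads_per_insertion: read count of each insertion, in the same order.
    """
    length = end - start + 1

    # Assumed interpretation of "central 80%": trim 10% of the length at both ends.
    trim = length // 10
    central_start = start + trim
    central_end = end - trim
    central_length = central_end - central_start + 1

    ins = pd.DataFrame({"position": insertion_positions,
                        "reads": reads_per_insertion})
    in_feature = ins[ins["position"].between(start, end)]
    in_central = ins[ins["position"].between(central_start, central_end)]

    return {
        "length": length,
        "n_insertions": len(in_feature),
        "n_insertions_central": len(in_central),
        "n_reads": int(in_feature["reads"].sum()),
        "n_reads_central": int(in_central["reads"].sum()),
        "insertions_per_bp": len(in_feature) / length,
        "insertions_per_bp_central": len(in_central) / central_length,
        "reads_per_bp": in_feature["reads"].sum() / length,
        "reads_per_bp_central": in_central["reads"].sum() / central_length,
    }


# Toy data, invented for illustration only.
row = summarize_feature(
    start=1001,
    end=2000,
    insertion_positions=[1005, 1150, 1500, 1890, 1995, 2500],
    reads_per_insertion=[10, 4, 6, 20, 2, 7],
)
print(pd.Series(row))
```

Counting in the central part of a feature leaves out insertions close to its ends; in the toy data, the insertions at positions 1005 and 1995 fall inside the feature but outside its central 80%.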

To use this function in any Python script, make sure that the input matches the help description at the beginning of the function and that all required files are present (their locations are noted in the function).
The output is the variable dna_df2, which contains the dataframe.
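
Once dna_df2 is available, it can be filtered and summarized like any other pandas dataframe. The sketch below uses a small stand-in for dna_df2; the column labels ("Feature_name", "Feature_type", "Essentiality", "Ninsertions_per_bp"), the way essentiality is encoded and all values are assumptions for illustration. The actual labels should be checked with dna_df2.columns.

```python
import pandas as pd

# Toy stand-in for dna_df2; labels and values are invented for illustration.
dna_df2 = pd.DataFrame({
    "Feature_name": ["geneA", "geneB", "geneC", "ARS_x", "geneD"],
    "Feature_type": ["Gene", "Gene", "Gene", "ARS", "Gene"],
    "Essentiality": ["essential", "nonessential", "nonessential",
                     None, "essential"],
    "Ninsertions_per_bp": [0.001, 0.020, 0.030, 0.015, 0.003],
})

# Keep only the rows that describe genes.
genes = dna_df2[dna_df2["Feature_type"] == "Gene"]

# Average number of insertions per basepair for each essentiality class.
mean_by_essentiality = genes.groupby("Essentiality")["Ninsertions_per_bp"].mean()
print(mean_by_essentiality)
```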

Gregory94 created this issue from a note in SATAY-analysis-workflow-board (Done) Oct 12, 2020

Gregory94 self-assigned this Oct 12, 2020

Gregory94 added the data processing label Oct 12, 2020
