How to interpret SATAY data in order to have meaningful information from it? #27

Wteunisse · 2020-10-16T09:07:09Z

Some additional comments on how to interpret the data were made in the meeting with Werner.

There is a sequencing bias in the number of reads. We probably cannot do anything about this, but the sequencing may add an extra layer of variance to the number of reads.
Also, during sequencing there is a chance that a given transposon is not observed at all. I don't think I fully understand this problem yet, but Werner suggested that we look into a 'negative binomial distribution'. This is because we only know the observed number of transposons, which might not be equal to the actual number of transposons.

wdaalman · 2020-10-20T18:29:22Z

To help you further along: the actual cells with transposons are converted into reads such that the reads follow a binomial distribution. We have the inverse problem: if you know the reads and you know the probability that a cell with a transposon turns into a read, the actual number of cells with a transposon (including the unobserved ones) would follow a negative binomial distribution.
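A minimal sketch of that inversion, assuming a single site, a known read-out probability `p`, and a flat prior on the true number of cells (the flat prior is an assumption of this sketch, not something stated above). Under those assumptions the number of unobserved cells `m`, given `k` reads, follows a negative binomial distribution with parameters `k + 1` and `p`. The function names and the example values are hypothetical.

```python
from math import comb

def unobserved_pmf(m, k, p):
    """P(m unobserved cells | k reads, read-out probability p), flat prior on the total."""
    return comb(m + k, m) * p ** (k + 1) * (1 - p) ** m

def expected_total(k, p):
    """Posterior mean of the total number of cells N = k + m."""
    return k + (k + 1) * (1 - p) / p

# Hypothetical example values, not taken from any SATAY dataset.
k, p = 20, 0.25
total_prob = sum(unobserved_pmf(m, k, p) for m in range(2000))
print("pmf sums to", round(total_prob, 6))
print("expected number of cells:", expected_total(k, p))
```

The sum over `m` is only a sanity check that the probabilities add up to one; in practice `p` would first have to be estimated.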

However, we cannot easily use the negative binomial distribution to invert reads to actual transposons, since Wessel mentioned today that we don't know that probability. I thought you could try something else, namely finding the best-fitting binomial distribution. Unfortunately Matlab's mle wants to have the probability parameter fixed, so I wrote a small Matlab script that uses the generalized method of moments instead to fit simulated read data. This works reasonably well (run Reads_transposon_conversion_simulation_v2.m in the zip file).
Reads transposon conversion v2.zip
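For readers without Matlab, here is a rough Python sketch of the same idea. It is not a port of the script in the zip file: it uses the plain method of moments (matching the sample mean and variance) rather than the generalized method of moments, and it assumes every site has the same number of cells and the same read-out probability. All names and simulation settings are hypothetical.

```python
import random
from statistics import fmean, pvariance

def simulate_reads(n_cells, p_read, n_sites, seed=0):
    """Draw one binomial read count per site: each of n_cells is read with probability p_read."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_read for _ in range(n_cells)) for _ in range(n_sites)]

def fit_binomial_moments(reads):
    """Estimate (N, p) of a binomial from mean = N*p and variance = N*p*(1-p)."""
    mean = fmean(reads)
    var = pvariance(reads)
    if mean <= 0 or var >= mean:
        # Variance >= mean cannot come from a binomial; the data are overdispersed.
        raise ValueError("sample moments are not compatible with a binomial distribution")
    p_hat = 1.0 - var / mean
    n_hat = mean / p_hat
    return n_hat, p_hat

# Simulated data with hypothetical settings (50 cells per site, 30% read-out probability).
reads = simulate_reads(n_cells=50, p_read=0.3, n_sites=5000)
n_hat, p_hat = fit_binomial_moments(reads)
print(f"estimated N = {n_hat:.1f}, estimated p = {p_hat:.3f}")
```

The estimate of N is sensitive to small errors in the variance, so real data with extra sequencing variance may trigger the overdispersion check instead of returning a fit.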

Two caveats (updated to v2 to resolve the first one):
~~1) We do not know which regions have no reads because they are unlucky in read-out or because they are very unfit. This gives a bias.~~
2) In constructing a read distribution across the DNA, including Wessel's normalization, we have not corrected for fitness bias. So an idea could be to first do this only for non-coding regions, get the probability parameter estimate, and then use that on the real genes to invert reads to transposons there using the negative binomial distribution.

Wteunisse · 2020-10-20T20:51:32Z

Very interesting, I will look into it! One thought I had about the probability is that we might be able to estimate the total number of cells during the SATAY experiment. I think Benoît also mentions a number in his paper, and from this we would know how many transpositions have taken place. So maybe we can get a good estimate of the probability of actually reading a transposition.

wdaalman · 2020-10-23T09:28:19Z

That sounds good, it would be reassuring to see if there is a reasonable match with the fitted estimate.
Should you find out that the probability is rather low, this implies the noise will be high (intuitively, if almost every transposon becomes a read there is almost no noise). In that case, to distinguish noise from fitness effects of the transposon, you could think of increasing the duration of the growth phase to accentuate fitness effects.
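To make the intuition concrete, a small sketch assuming the reads at a site are binomial with `n_cells` cells and read-out probability `p`. The relative noise (standard deviation divided by mean) is then sqrt((1 - p) / (n_cells * p)), which goes to zero as `p` approaches 1. The function name and the numbers are hypothetical.

```python
from math import sqrt

def relative_noise(n_cells, p):
    """Coefficient of variation of a binomial read count: sd / mean."""
    return sqrt((1 - p) / (n_cells * p))

# Hypothetical site with 100 cells, read out with decreasing probability.
for p in (0.99, 0.5, 0.05):
    print(f"p = {p}: relative noise = {relative_noise(100, p):.3f}")
```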

Gregory94 · 2020-10-27T07:44:15Z

I saw this paper that discusses normalization using various statistical approaches, for example the negative binomial distribution. Maybe it is useful.

leilaicruz · 2020-11-06T08:18:18Z

> I saw this paper that discusses normalization using various statistical approaches, for example the negative binomial distribution. Maybe it is useful.

Were you able to download the paper? I could not ...

Gregory94 · 2020-11-06T08:23:34Z

> Were you able to download the paper? I could not ...

Dejesus2016_NORMALIZATION OF TRANSPOSON-MUTANT LIBRARY SEQUENCING DATASETS TO IMPROVE IDENTIFICATION OF CONDITIONALLY ESSENTIAL GENES.pdf

leilaicruz · 2020-11-06T08:44:04Z

Interesting that those papers, "NORMALIZATION OF TRANSPOSON-MUTANT LIBRARY SEQUENCING DATASETS TO IMPROVE IDENTIFICATION OF CONDITIONALLY ESSENTIAL GENES" and "Statistical analysis of genetic interactions in Tn-Seq data", are from the same author, Michael A. DeJesus from the Department of Computer Science, Texas A&M University.

leilaicruz · 2020-11-06T08:47:01Z

@Gregory94 you should watch and take a look at the repo from the same author (Michael A. DeJesus): https://github.com/mad-lab/tools
It seems very useful ....

Gregory94 · 2020-11-06T09:00:19Z

> @Gregory94 you should watch and take a look at the repo from the same author (Michael A. DeJesus): https://github.com/mad-lab/tools
> It seems very useful ....

Yes, indeed. But I think many of the tools they created are optimized for their experimental setup, which is different from ours. We should think about whether we want to use a similar experimental approach to theirs, or adapt their tools to our approach.

leilaicruz · 2020-11-06T09:30:18Z

Yes, they are optimized for the type of data they get and for how they want to analyze those datasets. However, they can still be useful in terms of how they implemented things, and some parts of the statistical analyses could be abstracted from their use case to ours. It looks very organized at first glance, and in general it is always of great benefit to have good examples of well-organized and well-structured code that we can learn from, build on and collaborate with.

Wteunisse added this to To do in SATAY-analysis-workflow-board via automation Oct 16, 2020

Wteunisse added the data processing label Oct 16, 2020

Wteunisse assigned leilaicruz, Wteunisse, Gregory94 and T-Wisse Oct 16, 2020

leilaicruz changed the title ~~Things to keep in mind in preprocessing and interpreting SATAY data~~ How to interpret SATAY data in order to have meaningful information from it? Oct 16, 2020

leilaicruz moved this from To do to In progress in SATAY-analysis-workflow-board Oct 22, 2020
