-
Notifications
You must be signed in to change notification settings - Fork 0
/
methods.tex
352 lines (259 loc) · 28.3 KB
/
methods.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
\section{Materials and Methods}
\subsection{Notation}
In the Materials and Methods section, we adopt a mathematical notation for multi-dimensional arrays from the field of machine learning \citep{Chiang2021-fi}. The notation uses \textit{named axes} and incorporates implicit broadcasting of arrays when their shapes are different.
\subsection{Extracting image data}
Raw input data into Tapqir consists of 1) binder channel images ($D^\mathsf{raw}$), each $W \times H$ pixels in size, for each time point (\FIG{cosmos_experiment}B, right), and 2) lists of locations, corrected for microscope drift if necessary \citep{Friedman2015-nx}, of target molecules and of off-target control locations \citep{Friedman2015-nx} within the raw images. For simplicity we use the same notation ($x^{\mathsf{target}, \mathsf{raw}}$, $y^{\mathsf{target}, \mathsf{raw}}$) both for target molecule locations and off-target control locations. Tapqir extracts a $P \times P$ AOI around each target and off-target location and returns 1) the extracted data set $D$ consisting of a set of $P \times P$ grayscale images, collected at $N$ on-target AOI sites and $N_\mathsf{c}$ off-target AOI sites for a range of $F$ frames (\FIG{cosmos_experiment}C,D; \ALGORITHM{preprocessing}), and 2) new target (and off-target) locations ($x^\mathsf{target}$, $y^\mathsf{target}$) adjusted relative to extracted images $D$ where $x^\mathsf{target}$ and $y^\mathsf{target}$ both lie within the $(P/2-1,P/2)$ central range of the image. For the data presented in this article, we used $P = 14$. Cartesian pixel indices ($i$, $j$) are integers but also represent the center point of a pixel on the image plane. While experimental intensity measurements are integers, we treat them as continuous values in our analysis.
\input{preprocessing}
\subsection{The \emph{cosmos} model}
Our intent is to model CoSMoS image data by accounting for the significant physical aspects of image formation, such as photon noise and binding of target-specific and target-nonspecific molecules to the microscope slide surface. A graphical representation of the Tapqir model for CoSMoS data similar to that in \FIG{graphical_model}D but including probability distributions and other additional detail is shown in \FIGSUPP[graphical_model]{extended}. The corresponding generative model represented as pseudocode is shown in \ALGORITHM{model}. All variables with short descriptions and their domains are listed in \TABLE{variables}. Below, we describe the model in detail starting with the observed data and the likelihood function and then proceed with model parameters and their prior distributions.
\subsubsection{Image likelihood}
We model the image data $D$ as the sum of a photon-independent offset $\delta$ introduced by the camera and the noisy photon-dependent pixel intensity values $I$:
%
\begin{equation}
D = \delta + I
\end{equation}
In our model, each pixel in the photon-dependent image $I$ has a variance which is equal to the mean intensity $\mu^I$ of that pixel multiplied by the camera gain $g$, which is the number of camera intensity units per photon. This formulation is appropriate for cameras that use charge-coupled device (CCD) or electron-multiplier CCD (EMCCD) sensors. (The experimental CoSMoS datasets we analyzed (\TABLE{datasets}) were collected with EMCCD cameras.) It accounts for both photon shot noise and additional noise introduced by EMCCD camera amplification \citep{Van_Vliet1998-jk} and is expressed using a continuous Gamma distribution:
%
\begin{equation}
I \sim \mathbf{Gamma} (\mu^I, \sqrt{\mu^I \cdot g})
\end{equation}
The Gamma distribution was chosen because we found it to effectively model the image noise, which includes both Poissonian (shot noise) and non-Poissonian contributions. The Gamma distribution used here is parameterized by its mean and standard deviation. The functional forms of the Gamma distribution and all other distributions we use in this work are given in \TABLE{distributions}.
A competing camera technology based on scientific complementary metal-oxide semiconductor (sCMOS) sensors produces images that have also successfully been modeled as having a combination of Poissonian and non-Poissonian (Gaussian, in this case) noise sources. However, sCMOS images have noise characteristics that are considerably more complicated than CCD/EMCCD images, because every pixel has its own characteristic intensity offset, Gaussian noise variance, and amplification gain. Additional validation will be required to determine whether the existing \emph{cosmos} model requires modification or inclusion of additional prior information (e.g., pixel-by-pixel calibration data as in \cite{Huang2013-bx}) to optimize its performance with sCMOS CoSMoS data.
\subsubsection{Image model}
The idealized noise-free image $\mu^I$ is represented as the sum of a background intensity $b$ and the intensities from fluorescence spots modeled as 2-D Gaussians $\mu^S$:
%
\begin{equation}
\mu^I = b + \sum_{\mathsf{spot}} \mu^S
\end{equation}
\noindent
For simplicity we allow at most $K$ number of spots in each frame of each AOI. (In this article, we always use $K$ equal to 2.) The presence of a given spot in the image is encoded in the binary spot existence parameter $m$, where $m = 1$ when the corresponding spot is present and $m = 0$ when it is absent.
The intensities for a 2-D Gaussian spot at each pixel coordinate ($i$, $j$) is given by:
%
\begin{equation}
\mu^S_{\mathsf{pixelX}(i), \mathsf{pixelY}(j)} = \dfrac{m \cdot h}{2 \pi w^2} \exp{\left( -\dfrac{(i-x-x^\mathsf{target})^2 + (j-y-y^\mathsf{target})^2}{2 w^2} \right)}
\end{equation}
\noindent
with spot parameters total integrated intensity $h$, width $w$, and center ($x$, $y$) relative to the target (or off-target control) location ($x^\mathsf{target}$, $y^\mathsf{target}$).
%
Our primary interest is whether a target-specific spot is absent or present in a given AOI. We encode this information using a binary \emph{state} parameter $z$ with 0 and 1 denoting target-specific spot absence and presence, respectively. To indicate which of the $K$ spots is target-specific, we use the \emph{index} parameter $\theta$ which ranges from $0$ to $K$. When a target-specific spot is present ($z = 1$), $\theta \in \{1, \cdots, K \}$ specifies the index of the target-specific spot, while $\theta = 0$ indicates that no target-specific spot is present ($z = 0$). For example, $\{ m_{\mathsf{spot}(1)}=1, m_{\mathsf{spot}(2)}=1, z = 1, \theta=2 \}$ means that both spots are present and spot 2 is target-specific. A combination like $\{ m_{\mathsf{spot}(1)}=0, m_{\mathsf{spot}(2)}=1, z = 1, \theta=1 \}$ is impossible (i.e, has zero probability) since spot 1 cannot be absent and target-specific at the same time. For off-target control data, in which no spots are target-specific by definition, $z$ and $\theta$ are always set to zero.
%
\subsubsection{Prior distributions}
The prior distributions for the model parameters are summarized in \FIGSUPP[graphical_model]{extended} and detailed below. Unless otherwise indicated we assume largely uninformative priors (such as the Half-Normal distribution with large mean).
Background intensity $b$ follows a Gamma distribution:
%
\begin{equation}
b \sim \mathbf{Gamma}(\mu^b, \sigma^b)
\end{equation}
\noindent
where the mean $\mu^b \in \mathbb{R}_{>0}^{\mathsf{AOI}[N]}$ and standard deviation $\sigma^b \in \mathbb{R}_{>0}^{\mathsf{AOI}[N]}$ of the background intensity describe the irregularity in the background intensity in time and across the field of view of the microscope. Priors for $\mu^b$ and $\sigma^b$ are uninformative:
%
\begin{subequations}
\begin{align}
\mu^b &\sim \mathbf{HalfNormal}(1000) \\
\sigma^b &\sim \mathbf{HalfNormal}(100)
\end{align}
\end{subequations}
%
The target-specific presence parameter $z$ has a Bernoulli prior parameterized by the average target-specific binding probability $\pi \in [0, 1] $ for on-target AOIs and zero probability for control off-target AOIs:
%
\begin{equation}
z \sim
\begin{cases}
\mathbf{Bernoulli}(\pi) & \text{on-target AOI} \\
0 & \text{control off-target AOI} \rule{0pt}{4ex}
\end{cases}
\end{equation}
%
The prior distribution for the index of the target-specific spot $\theta$ is conditional on $z$. When no specifically bound spot is present (i.e., $z = 0$) $\theta$ always equals 0. Since spot indices are arbitrarily assigned, when the target-specific spot is present (i.e., $z = 1$) $\theta$ can take any value between $1$ and $K$ with equal probability. We represent the prior for $\theta$ as a Categorical distribution of the following form:
%
\begin{equation}
\theta \sim
\begin{cases}
0 & z = 0 \\
\mathbf{Categorical}\left( \begin{bmatrix} 0, \frac{1}{K}, \dots, \frac{1}{K} \end{bmatrix} \right) & z = 1 \rule{0pt}{4ex}
\end{cases}
\end{equation}
The average target-specific binding probability $\pi$ has an uninformative Jeffreys prior \citep{Gelman2013-ro} given by a Beta distribution:
%
\begin{equation}
\pi \sim \mathbf{Beta}(1/2, 1/2)
\end{equation}
The prior distribution for the spot presence indicator $m$ is conditional on $\theta$. When $\theta$ corresponds to spot index $k$, i.e., $\theta = k$, then $m_{\mathsf{spot}(k)} = 1$. When $\theta$ does not correspond to a spot index $k$, i.e., $\theta \neq k$, then either spot $k$ is target-nonspecific or a spot corresponding to $k$ does not exist. Consequently, for $\theta \neq k$ we assign $m_{\mathsf{spot}(k)}$ to either 0 or 1 with a probability dependent on the non-specific binding density $\lambda \in \mathbb{R}_{>0}$:
%
\begin{equation}
m_{\mathsf{spot}(k)} \sim
\begin{cases}
1 & \text{$\theta = k$} \\
\mathbf{Bernoulli} \left( \sum_{l=1}^K \dfrac{l \cdot \mathbf{TruncPoisson}(l; \lambda, K)}{K} \right) & \text{$\theta = 0$} \rule{0pt}{4ex} \\
\mathbf{Bernoulli} \left( \sum_{l=1}^{K-1} \dfrac{l \cdot \mathbf{TruncPoisson}(l; \lambda, K-1)}{K-1} \right) & \text{otherwise} \rule{0pt}{4ex}
\end{cases}
\end{equation}
The mean non-specific binding density $\lambda$ is expected to be much less than two non-specifically bound spots per frame per AOI; therefore, we use an Exponential prior of the form
%
\begin{equation}
\lambda \sim \mathbf{Exponential}(1)
\end{equation}
The prior distribution for the integrated spot intensity $h$ is chosen to fall off at a value much greater than typical spot intensity values
%
\begin{equation}
h \sim \mathbf{HalfNormal}(10000)
\end{equation}
In CoSMoS experiments the microscope/camera hardware is typically designed to set the width $w$ of fluorescence spots to a typical value in the range of 1--2 pixels \citep{Ober2015-ba}. We use a Uniform prior confined to the range between 0.75 and 2.25 pixels:
%
\begin{equation}
w \sim \mathbf{Uniform}(0.75, 2.25)
\end{equation}
Priors for spot position ($x$, $y$) depend on whether the spot represents target-specific or non-specific binding. Non-specific binding to the microscope slide surface can occur anywhere within the image and therefore has a uniform distribution (\FIGSUPP[graphical_model]{xy}, red). Spot centers may fall slightly outside the AOI image yet still affect pixel intensities within the AOI. Therefore the range for ($x$, $y$) is extended one pixel wider than the size of the image, which allows a spot center to fall slightly beyond the AOI boundary.
In contrast to non-specifically bound molecules, specifically bound molecules are colocalized with the target molecule with a precision that can be smaller than one pixel and that depends on various factors including the microscope point-spread function and magnification, accuracy of registration between binder and target image channels, and accuracy of drift correction. For target-specific binding, we use an Affine-Beta prior with zero mean position relative to the target molecule location ($x^\mathsf{target}$, $y^\mathsf{target}$), and a ``proximity'' parameter $\sigma^{xy}$ which is the standard deviation of the AffineBeta distribution (\FIGSUPP[graphical_model]{xy}, green). We chose the Affine-Beta distribution because it models a continuous parameter defined on a bounded interval.
%
\begin{equation}
x_{\mathsf{spot}(k)}, y_{\mathsf{spot}(k)} \sim
\begin{cases}
\mathbf{AffineBeta}\left( 0, \sigma^{xy}, -\dfrac{P+1}{2}, \dfrac{P+1}{2} \right) & \theta = k ~\text{(target-specific)} \\
\mathbf{Uniform}\left(-\dfrac{P+1}{2}, \dfrac{P+1}{2} \right) & \theta \neq k ~\text{(target-nonspecific)} \rule{0pt}{4ex}
\end{cases}
\end{equation}
We give the proximity parameter $\sigma^{xy}$ a diffuse prior, an Exponential with a characteristic width of one pixel:
%
\begin{equation}
\sigma^{xy} \sim \mathbf{Exponential}(1)
\end{equation}
Tests on data simulated with increasing proximity parameter values $\sigma^{xy}$ (true) (i.e., with decreasing precision of spatial mapping between the binder and target image channels) confirm that the \emph{cosmos} model accurately learns $\sigma^{xy}$ (fit) from the data (\FIGSUPP[tapqir_analysis]{randomized}D; \TABLE{proximity}). This was the case even if we substituted a less-informative $\sigma^{xy}$ prior (Uniform vs. Exponential; \TABLE{proximity}).
The CoSMoS technique is premised on colocalization of the binder spots with the known location of the target molecule. Consequently, for any analysis method, classification accuracy declines when the images in the target and binder channels are less accurately mapped. For the Tapqir \emph{cosmos} model, low mapping precision has little effect on classification accuracy at typical non-specific binding densities ($\lambda = 0.15$; see MCC values in \TABLE{proximity}).
Gain $g$ depends on the settings of the amplifier and electron multiplier (if present) in the camera. It has a positive value and is typically in the range between 5--50. We use a Half-Normal prior with a broad distribution encompassing this range:
%
\begin{equation}
g \sim \mathbf{HalfNormal}(50)
\end{equation}
The prior distribution for the offset signal $\delta$ is empirically measured from the output of camera sensor regions that are masked from incoming photons. Collected data from these pixels are transformed into a density histogram with intensity step size of 1. The resulting histogram typically has a long right hand tail of low density. For computational efficiency, we shorten this tail by binning together pixel intensity values from the upper 0.5\% percentile. Since $D = \delta + I$ (Eq. 1) and photon-dependent intensity $I$ is positive, all $D$ values have to be larger than the smallest offset intensity value. If that is not the case we add a single value $\min(D) - 1$ to the offset empirical distribution which has a negligible effect on the distribution. Bin values $\delta_\mathsf{samples}$ and their weights $\delta_\mathsf{weights}$ are used to construct an Empirical prior:
%
\begin{equation}
\delta \sim \mathbf{Empirical}(\delta_\mathsf{samples}, \delta_\mathsf{weights})
\end{equation}
All simulated and experimental data sets in this work were analyzed using the prior distributions and hyperparameter values given above, which are compatible with a broad range of experimental conditions (\TABLE{datasets}). Many of the priors are uninformative and we anticipate that these will work well with images taken on variety of microscope hardware. However, it is possible that highly atypical microscope designs (e.g., those with effective magnifications that are sub-optimal for CoSMoS) might require adjustment of some fixed hyperparameters and distributions (those in Eqs. 6a, 6b, 11, 12, 13, 15, and 16). For example, if the microscope point spread function is more than 2 pixels wide, it may be necessary to increase the range of the $w$ prior in Eq. 13. The Tapqir documentation (\url{https://tapqir.readthedocs.io/en/stable/}) gives instructions for changing the hyperparameters.
\subsection{Joint distribution}
The joint distribution of the data and all parameters is the fundamental distribution necessary to perform a Bayesian analysis. Let $\phi$ be the set of all model parameters. The joint distribution can be expressed in a factorized form:
%
\begin{equation}
\begin{aligned}
p(D, \phi) =~&p(g) p(\sigma^{xy}) p(\pi) p(\lambda) \prod_{\mathsf{AOI}} \left[ p(\mu^b) p(\sigma^b) \prod_{\mathsf{frame}} \left[ \vphantom{\prod_{F}} p(b | \mu^b, \sigma^b) p(z | \pi) p(\theta | z) \vphantom{\prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}}} \cdot \right. \right. \\
&\prod_{\mathsf{spot}} \left[ \vphantom{\prod_{F}} p(m | \theta, \lambda) p(h) p(w) p(x | \sigma^{xy}, \theta) p(y | \sigma^{xy}, \theta) \right] \left. \left. \prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}} p(\delta) p(D | \mu^I, g, \delta) \right] \right]
\end{aligned}
\end{equation}
The Tapqir generative model is a stochastic function that describes a properly normalized joint distribution for the data and all parameters (\ALGORITHM{model}). In Pyro this is called ``the model''.
\input{model}
\subsection{Inference}
For a Bayesian analysis, we want to obtain the posterior distribution for parameters $\phi$ given the observed data $D$. There are three discrete parameters $z$, $\theta$, and $\delta$ that can be marginalized out exactly so that they do not appear expilictly in either the joint posterior distribution or the likelihood function. Computationally efficient marginalization is implemented using Pyro's enumeration strategy \citep{Obermeyer2019-xt} and KeOps' kernel operations on the GPU without memory overflows \citep{Charlier2021-vq}. Let $\phi^{\prime} = \phi - \{ z, \theta, \delta \}$ be the rest of the parameters. We obtain posterior distributions of $\phi^{\prime}$ using Bayes' rule:
%
\begin{equation}
p(\phi^{\prime} | D) =
\dfrac{\sum_{z, \theta, \delta} p(D, \phi)}{\int_{\phi} p(D, \phi) d\phi} =
\dfrac{p(D, \phi^{\prime})}{\int_{\phi} p(D, \phi) d\phi} =
\dfrac{p(D| \phi^{\prime} )p(\phi^{\prime})}{\int_{\phi} p(D, \phi) d\phi}
\end{equation}
Note that the integral in the denominator of this expression is necessary to calculate the posterior distribution, but it is usually analytically intractable. However, variational inference provides a robust method to approximate the posterior distribution $p(\phi^{\prime} | D)$ with a parameterized variational distribution $q(\phi^{\prime})$ \citep{Bishop2006-oa}.
%
\begin{equation}
p(\phi^{\prime} | D) \simeq q(\phi^{\prime})
\end{equation}
$q(\phi^{\prime})$ has the following factorization:
\begin{equation}
\begin{aligned}
q(\phi^{\prime}) =~&q(g) q(\sigma^{xy}) q(\pi) q(\lambda) \cdot \\
&\prod_{\mathsf{AOI}} \left[ q(\mu^b) q(\sigma^b) \prod_{\mathsf{frame}} \left[ q(b) \prod_{\mathsf{spot}} \left[ \vphantom{\prod_{F}} q(m) q(h | m) q(w | m) q(x | m) q(y | m) \right] \right] \right]
\end{aligned}
\end{equation}
The variational distribution $q(\phi^{\prime})$ is provided as pseudocode for a generative stochastic function (\ALGORITHM{guide}). In Pyro this is called ``the guide''. Variational inference is sensitive to initial values of variational parameters. In \ALGORITHM{guide}, step 1 we provide the initial values of variational parameters used in our analyses.
\input{guide}
\subsection{Calculation of spot probabilities}
Variational inference directly optimizes $q(m) \equiv m_\mathsf{prob}$ (see Eq. 21 and \ALGORITHM{guide}), which approximates $p(m | D)$. To obtain the marginal posterior probabilities $p(z, \theta | D)$ we use a Monte Carlo sampling method:
\begin{equation}
\begin{aligned}
p(z, \theta | D) &= \int_{\phi^{\prime}} p(z, \theta, \phi^{\prime} | D) d\phi^{\prime} \\
&= \int_{\phi^{\prime}} p(z, \theta | \phi^{\prime}) p(\phi^{\prime} | D) d\phi^{\prime} \\
&\simeq \int_{\phi^{\prime}} p(z, \theta | \phi^{\prime}) q(\phi^{\prime}) d\phi^{\prime} \\
&\simeq \dfrac{1}{S} \sum_{s=1}^{S} p(z, \theta | \phi^{\prime}_s) \quad \text{where} \quad \phi^{\prime}_s \sim q(\phi^{\prime})
\end{aligned}
\end{equation}
In our calculations we used $S = 25$ number of Monte Carlo samples. Marginal probabilities $p(z | D)$ and $p(\theta | D)$ are calculated as:
\begin{subequations}
\begin{align}
p(z | D) &= \sum_{\theta} p(z, \theta | D) \\
p(\theta | D) &= \sum_{z} p(z, \theta | D)
\end{align}
\end{subequations}
The probability, $p(\mathsf{specific})$, that a target-specific fluorescence spot is present in a given image by definition is:
\begin{equation}
p(\mathsf{specific}) \equiv p(z = 1 | D)
\end{equation}
For simplicity in the main text and figures we suppress the conditional dependency on $D$ in $p(\theta | D)$ and $p(m | D)$ and instead write them as $p(\theta)$ and $p(m)$, respectively.
\subsection{Tapqir implementation}
The model and variational inference method outlined above are implemented as a probabilistic program in the Python-based probabilistic programming language (PPL) Pyro \citep{Foerster2018-kd,Bingham2019-qy,Obermeyer2019-xt}. We use a variational approximation because exact inference is not analytically tractable for a model as complex as \emph{cosmos}. As currently implemented in Pyro, variational inference is significantly faster than Monte Carlo inference methods. In Tapqir, the objective that is being optimized is the evidence lower bound (ELBO) estimator that provides unbiased gradient estimates upon differentiation. At each iteration of inference procedure we choose a random subset of AOIs and frames (mini-batch), compute a differentiable ELBO estimate based on this mini-batch and update the variational parameters via automatic differentiation. We use PyTorch's Adam optimizer \citep{Kingma2014-cz} with the learning rate of $5\times 10^{-3}$ and keep other parameters at their default values.
\subsection{Credible intervals and confidence intervals}
Credible intervals were calculated from posterior distribution samples as the highest density region (HDR), the narrowest interval with probability mass 95\% using the \texttt{pyro.ops.stats.hpdi} Pyro function. Confidence intervals were calculated from bootstrap samples as the 95\% HDR.
\subsection{Data simulation}
Simulated data were produced using the generative model (\ALGORITHM{model}). Each simulation has a subset of parameters ($\pi$, $\lambda$, $g$, $\sigma^{xy}$, $b$, $h$, $w$, $\delta$) set to desired values while the remaining parameters ($z$, $\theta$, $m$, $x$, $y$) and resulting noisy images ($D$) are sampled from distributions. The fixed parameter values and data set sizes for all simulations are provided in Supplementary File 1--6.
For kinetic simulations (\FIG{kinetic_analysis}, Supplementary File 5), $z$ was modeled using a discrete Markov process with the initial probability and the transition probability matrices:
\begin{subequations}
\begin{align}
p(z_{\mathsf{frame}(0)} | k_\mathsf{on}, k_\mathsf{off}) &= \mathbf{Categorical} \left( \begin{bmatrix} \frac{k_\mathsf{off}}{k_\mathsf{on} + k_\mathsf{off}} & \frac{k_\mathsf{on}}{k_\mathsf{on} + k_\mathsf{off}} \end{bmatrix} \right) \\
p(z_{\mathsf{frame}(f)} | z_{\mathsf{frame}(f-1)}, k_\mathsf{on}, k_\mathsf{off}) &= \mathbf{Categorical} \left( \begin{bmatrix} 1 - k_\mathsf{on} & k_\mathsf{on} \\ k_\mathsf{off} & 1 - k_\mathsf{off} \end{bmatrix} \right)
\end{align}
\end{subequations}
\noindent
where $k_{\mathsf{on}}$ and $k_{\mathsf{off}}$ are transition probabilities that numerically approximate the pseudo-first-order binding and first-order dissociation rate constants in units of $\mathsf{s}^{-1}$, respectively, assuming 1 s/frame. We assumed that the Markov process is at equilibrium and initialized the chain with the equilibrium probabilities.
\subsection{Posterior predictive sampling}
For posterior predictive checking, sampled images ($\widetilde{D}$) were produced using Tapqir's generative model (\ALGORITHM{model}) where model parameters were sampled from the posterior distribution $p(\phi|D)$, which was approximated by the variational distribution $q(\phi)$:
\begin{equation}
\begin{aligned}
\widetilde{D} \sim p(\widetilde{D} | D) &= \int_\phi p(\widetilde{D} | \phi) p(\phi | D) d\phi \\
&\simeq \int_\phi p(\widetilde{D} | \phi) q(\phi) d\phi
\end{aligned}
\end{equation}
\subsection{Signal-to-noise ratio}
We define SNR as:
\begin{equation}
\mathsf{SNR} = \mathsf{mean} \left( \dfrac{\mathsf{signal}}{\sqrt{\sigma^2_{\mathsf{offset}} + \sigma^2_{\mathsf{background}}}} \right)
\end{equation}
where $\sigma^2_{\mathsf{background}} = b \cdot g$ the variance of the background intensity, $\sigma^2_{\mathsf{offset}}$ the variance of the offset intensity, and the mean is taken over all target-specific spots. For experimental data, $\mathsf{signal}$ is calculated as
\begin{equation}
\mathsf{signal} = \sum_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}} \left( D - b_{\mathsf{mean}} - \delta_\mathsf{mean} \right) \mathsf{weight}
\end{equation}
where $\mathsf{weight}$ is
\begin{equation}
\mathsf{weight} = \dfrac{1}{2 \pi \cdot w^2} \exp{\left( -\dfrac{(i-x-x^\mathsf{target})^2 + (j-y-y^\mathsf{target})^2}{2 \cdot w^2} \right)}
\end{equation}
For simulated data theoretical $\mathsf{signal}$ is directly calculated as:
\begin{equation}
\mathsf{signal} = \sum_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}} h \cdot \mathsf{weight}^2
\end{equation}
\subsection{Classification accuracy statistics}
As a metric of classification accuracy we use three commonly used statistics -- recall, precision, and Matthews Correlation Coefficient \citep{Matthews1975-rw}
\begin{equation}
\mathrm{Recall} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}
\end{equation}
\begin{equation}
\mathrm{Precision} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}
\end{equation}
\begin{equation}
\mathrm{MCC} =
\dfrac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}
{\sqrt{(\mathrm{TP} + \mathrm{FP}) (\mathrm{TP} + \mathrm{FN}) (\mathrm{TN} + \mathrm{FP}) (\mathrm{TN} + \mathrm{FN})}}
\end{equation}
\noindent
where TP is true positives, TN is true negatives, FP is false positives, and FN is false negatives.
\subsection{Kinetic and thermodynamic analysis}
To estimate simple binding/dissociation kinetic parameters (\FIG{kinetic_analysis}C,D), we sample binary time records $z$ from the inferred $p(\mathsf{specific})$ time records for all AOIs. For a two-state hidden Markov model, the maximum-likelihood estimates of $k_\mathsf{on}$ and $k_\mathsf{off}$ are given by:
\begin{equation}
\hat{k}_\mathsf{on}, \hat{k}_\mathsf{off} = \argmax_{k_\mathsf{on}, k_\mathsf{off}} \prod_\mathsf{AOI} \left[ p(z_{\mathsf{frame}(0)} | k_\mathsf{on}, k_\mathsf{off}) \prod_{f=1}^{F-1} p(z_{\mathsf{frame}(f)} | z_{\mathsf{frame}(f-1)}, k_\mathsf{on}, k_\mathsf{off}) \right]
\end{equation}
Repeating this procedure 2,000 times gave the distributions of $k_\mathsf{on}$ and $k_\mathsf{off}$ from which we compute mean and 95\% credible interval.
Similarly, to estimate mean and 95\% CI of $K_\mathsf{eq}$ (\FIG{kinetic_analysis}E) we sampled $\pi$ from $q(\pi)$ and for each sampled value of $\pi$ calculated $K_\mathsf{eq}$ as:
\begin{equation}
K_\mathsf{eq} = \dfrac{\pi}{1 - \pi}
\end{equation}
To calculate time-to-first binding kinetics from the Tapqir-derived $p(\mathsf{specific})$ (\FIG{experimental_data}B, \FIGSUPP[experimental_data]{DatasetA}B, \FIGSUPP[experimental_data]{DatasetC}B, and \FIGSUPP[experimental_data]{DatasetD}B), 2,000 binary time records $z$ were sampled from the $p(\mathsf{specific})$ time record for each AOI. For each sampled time record initial absent intervals were measured and analyzed using Eq. (7) in \cite{Friedman2015-nx}, yielding distributions of $k_\mathsf{a}$, $k_\mathsf{ns}$, and $A_\mathsf{f}$. Mean value and 95\% credible intervals were calculated from these distributions. Initial absent intervals from ``spot-picker'' analysis (\FIG{experimental_data}C, \FIGSUPP[experimental_data]{DatasetA}C, \FIGSUPP[experimental_data]{DatasetC}C, and \FIGSUPP[experimental_data]{DatasetD}C) were analyzed as described in \citep{Friedman2015-nx}, except that on-target and off-target data were here analyzed jointly instead of being analyzed sequentially \citep{Friedman2015-nx}. Note that the $k_\mathsf{ns}$ values determined using the two methods are not directly comparable for several reasons, including that the non-specific binding frequencies are effectively measured over different areas. For Tapqir the target area is approximately $ \pi \left( \sigma^{xy} \right) ^2$ (which is between 0.3 and 0.8 pixels$^2$ in the different experimental data sets) and for spot-picker the area is subjectively chosen as $\pi \cdot 1.5^2 = 7$ pixels$^2$.
\include{tables/variables}
\input{tables/distributions}
\input{tables/proximity}
\clearpage