report.tex

\documentclass{article}
\usepackage{mathtools}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{graphicx}
\usepackage{float}
\usepackage{multirow}
\usepackage{verbatim}
\usepackage[]{algorithm2e}
\usepackage{hyperref}

\linespread{1.3}
\setlength{\parindent}{0em}
\setlength{\parskip}{1em}
\setcounter{secnumdepth}{0}
\setcounter{MaxMatrixCols}{20}
\renewcommand{\arraystretch}{1.5}

\newcommand{\ts}{\textsuperscript}
\newcommand{\diff}{\mathop{}\!\mathrm{d}}
\newcommand{\prob}{\mathbb{P}}
\newcommand{\expect}{\mathbb{E}}
\newcommand{\var}{\text{Var}}

\DeclarePairedDelimiter{\abs}{\lvert}{\rvert}
\DeclarePairedDelimiter\norm{\lVert}{\rVert}
\DeclarePairedDelimiter\p{\lparan}{\rparan}

\title{Lilypond}
\author{Joshua Hwang}

\begin{document}
\section{Ideas for extensions}
We will not describe the Lilypond model here and it is assumed the
reader has a brief understanding of it.

The extensions we intended to introduce were the non-homogeneous
placement which was achieved using thinning.
A negative of this method is that we need a maximum which we initially
give an upper bound with.

Another extension was the changing of rates. We had previously had
each grain growing at equal rates. We now produce a random rate
(currently) based off independent uniform distributions.

Finally we implement Ebert and Last (2015) extension of random birth
times. The grains develop at different times in the cycle based
(currently) off independent uniform distributions. If the grain
begins growing \emph{after} another grain has grown on top it does not
get realised.

From these extensions we define the growing radii to be,
\begin{align*}
    r_i
    &=
    \begin{cases}
        v_i \times (t - t_{i0}) \quad &[t_{i0}, \infty) \\
        0 &\text{otherwise} \\
    \end{cases}
\end{align*}

With $D$ being the distance between the grains,
the question of collision time becomes,
\begin{align*}
    0 &= |D - r_j - r_k| \qquad [\max\{t_{j0}, t_{k0}\}, \infty) \\
    0 &= D - r_j - r_k \\
    &= D - v_j \times (t - t_{j0}) - v_k \times (t - t_{k0}) \\
    D &= v_j \times (t - t_{j0}) + v_k \times (t - t_{k0}) \\
    D &= v_j \times t - v_j \times t_{j0} + v_k \times t - v_k \times t_{k0} \\
    D + v_j t_{j0} + v_k t_{k0} &= (v_j + v_k)t \\
    t &= \frac{D + v_j t_{j0} + v_k t_{k0}}{v_j + v_k} \\
\end{align*}

Where $t$ is the global time.

This is only for the case where two grains are growing. In the case where the
one of the grains ($k$) is stationary we have,
\begin{align*}
    0 &= |D - r_j - r_k| \qquad [t_{j0}, \infty) \\
    0 &= D - v_j \times (t - t_{j0}) - r_k \\
    0 &= D - v_j t + v_j t_{j0} - r_k \\
    t &= \frac{D + v_j t_{j0} - r_k}{v_j} \\
\end{align*}

One issue that came up while working on this is determining if a collision will
ever occur. Keep in mind that our model allows a growing grain to smother an
unborn one.

A growing grain ($j$) will smother an unborn one ($k$) if the time of
smothering $t_{\text{smother}}$ occurs before the time $k$ gets born
then we know that grain will die.
\begin{align*}
    D &= v_j (t_{\text{smother}} - t_j) \\
    \frac{D}{v_j} &= t_{\text{smother}} - t_j \\
    t_{\text{smother}} &= \frac{D}{v_j} + t_j \\
    \text{thus} \\
    t_{\text{smother}} < t_k
\end{align*}

I first implemented the model in Python, \texttt{sim.py},
this was because speed was initially
not considered a factor. The program runs in $O(n^4)$ time where
$n$ is the number of points.

A second model was created under the folder \texttt{attempt2}. This new model
was partially written in C was written around the piping paradigm
present in Unix-like systems. Creating a format with the file extension
\texttt{.lp}. The program is split into three main parts; generating points,
solving the system and analysing the data.

The programs that
can generate points are currently still being done in Python and
is able to generate 1000s of points in a few seconds. The implementation,
\texttt{genlp.py}, is currently running in linear time. A second implementation
(which was more for fun) \texttt{imglp.py} is able to generate points as if the
image is the joint pdf we are sampling from. These programs generate an
unfinished \texttt{.lp} file which gets sent to a solver program.

The second part is aimed at solving a system represented by the \texttt{.lp}
files. This part of the pipeline was implemented as a C program. It still
runs in $O(n^3)$ time with $O(n^3)$ memory but making a few modifications has
greatly reduced parts of the computation. It can be improved by culling
checks and calculations which are guaranteed not to occur (dropping the algorithm
to $O(n^2)$ time.

The final part involves reading and analysing the data, currently we are
seeing the final system using \texttt{matplotlib}, a python library, to animate
the circles. We can also analyse the system for some quantitative features.

\subsection{Algorithm detail}
To speed up the algorithm we note the following properties.

We recognise that given any two grains the fastest collision time
will occur when they're both growing. If one of the grains stops growing
prematurely (from colliding with another grain) then the collision time
will increase.

Because of this we define two types of pairs, a growing pair is where
both grains are growing while a stationary pair is one where one of the
grains is stationary. If both grains are stationary then the pair is
useless and such a case is discarded.

We also recognise that a growing pair can evaluate both grains OR
turn into a stationary pair. In comparison, a stationary pair can only
evaluate it's remaining growing grain.
Because of this we will store each grains best stationary pair in
the grain data structure.

\begin{algorithm}[H]
    \SetKwInOut{Input}{Input}\SetKwInOut{Output}{Output}

    \Input{Grains locations, velocities and birth times given}
    Generate all combinations of pairs of grains and put them in $pairs$
    (\tcp*[f]{$|pairs| = \frac{n(n-1)}{2}$})\;
    Calculate the estimated collision times of each pair
    (as explained later,
    with a few modifications this process can actually be reduced
    to a constant number of pairs)\;
    \While{Grains are still growing}{
        Search for pair with shortest collision time\;
        Let $g1$ and $g2$ represent both grains in the pair\;
        \If{$g1$ is growing}{
            set the radius of $g1$ to what was estimated\;
        }
        \If{$g2$ is growing}{
            set the radius of $g2$ to what was estimated\;
        }
        \For(\tcp*[f]{With an appropriate data structure this is very quick})
        {$pair$ in $pairs$ that contain $g1$}{
            Let the elements of the pair be called $g1$ and $g_{other}$\;
            Recalculate the collision time of this $pair$\;
            \eIf{new collision time is better than $g_{other}$'s best
            stationary pair}{
                Remove $g_{other}$'s best stationary pair from $pairs$\;
                Set $g_{other}$'s best stationary pair to $pair$\;
            }{
                Remove $pair$ from $pairs$\;
            }
        }
        Repeat process for $g2$\;
    }
    \caption{Algorithm for \texttt{solvelp}}
\end{algorithm}

We can actually improve our model by culling all pairs outside a given (and
known range) i.e. collisions that are further away than the nearest grain.
Not sure if this is just a constant in speed or if it drops from $n^3$ to
$n^2$. This is because we're restricting the number of
pairs to a constant(?) circle of influence.

\section{Experiments}
I have decided to use the model to study where the point of percolation is
based on the extended model developed by Penrose \& Last (2011).

My findings are percolation occurs (no matter the area size) at a little over
1.2 units in length. The data can be found in a Google Sheets which I hopefully
haven't destroyed
\url{https://docs.google.com/spreadsheets/d/1mh4FgiSejmOBHi7JJxv_7j2LBOEXD74deOvOaGpa6d4/edit?usp=sharing}.

The algorithm I have developed was independent of the Heveling \& Last (2005)
paper which claimed to develop a far more effective algorithm for the standard
model. Unlike my own algorithm their worst case, equal to mine,
and average case are different.
After implementing a comparable version (the algorithm does not cull impossible
comparisons) to my own algorithm I tested each solver
on the same randomly generated system to properly compare them. My results found
their average case was a whole factor less than their worst case! Given the
improvement that would cull the number of comparisons as well the 
Heveling \& Last algorithm could run in $O(n)$ time on average.

Data:
\url{https://docs.google.com/spreadsheets/d/1Gynb3_ZZTF2Iw2vTAQ49YPa2zHT_xmQqbWLCi7JNz30/edit?usp=sharing}

\section{Further questions}
The data shows that for the Heveling \& Last algorithm the number of iterations
required is only 7-8. But the worst case claim says the number of iterations should
be the number of grains. Why is it the case that the number of iterations is
a constant?

The percolation data was obtained by increasing the area grains could grow in and
not changing the intensity of the Poisson point process (always constant 1).
How does the percolation threshold change as we increase/decrease the intensity?

Is there an analytical way to derive the results we've gotten empirically.

\end{document}