Add a guide for working on internals #1100

Merged
merged 17 commits into from
Feb 15, 2018
342 changes: 342 additions & 0 deletions guides/internals.rst
@@ -0,0 +1,342 @@
===================================
How to Work on Hypothesis Internals
===================================

This is a guide to how to work on Hypothesis internals,
with a particular focus on helping people who are new to it.
Right now it is very rudimentary and is intended primarily for people who are
looking to get started writing shrink passes as part of our `current outreach
program to get more people doing that <https://github.com/HypothesisWorks/hypothesis-python/issues/1093>`_,
but it will expand over time.

------------------------
Bird's Eye View Concepts
------------------------

The core engine of Hypothesis is called Conjecture.

The "fundamental idea" of Conjecture is that you can represent an arbitrary
Member

I would reverse these, so we use a PRNG and view that as bytes:

The "fundamental idea" of Conjecture is that you can represent an arbitrary randomized test case as a sequence of outputs from some pseudo-random number generator (PRNG), which in turn can be viewed as a string of bytes representing the underlying entropy. Whenever you want to do something "random", you read the next bytes and do what they tell you to do.

Regardless, delete the word "basically".

randomized test case as a string of bytes, which are intended as the
underlying entropy of some pseudo-random number generator (PRNG).
Whenever you want to do something "random" you read the next bytes and
do what they tell you to do. By manipulating these bytes, we can achieve
more interesting effects than pure randomness would allow us to do, while
retaining the power and ease of use of random testing.
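
As a rough illustration of this idea, here is a toy generator whose every
decision is read from a byte string. This is only a sketch: ``ByteSource``,
``draw_integer`` and ``draw_list`` are names made up for this example and are
not the real Conjecture API.

.. code-block:: python

    class ByteSource:
        """Toy test case: every "random" choice is read from a fixed buffer."""

        def __init__(self, buffer: bytes):
            self.buffer = buffer
            self.index = 0

        def draw_bytes(self, n: int) -> bytes:
            # Reading past the end means the buffer cannot drive this test.
            if self.index + n > len(self.buffer):
                raise IndexError("ran out of data")
            result = self.buffer[self.index:self.index + n]
            self.index += n
            return result

        def draw_integer(self) -> int:
            # Two bytes, big-endian: smaller bytes give smaller integers.
            return int.from_bytes(self.draw_bytes(2), "big")

        def draw_list(self) -> list:
            result = []
            # One byte decides whether to continue; zero means "stop".
            while self.draw_bytes(1)[0] != 0:
                result.append(self.draw_integer())
            return result


    def generate(buffer: bytes) -> list:
        return ByteSource(buffer).draw_list()

With these definitions, ``generate(bytes([1, 0, 5, 1, 1, 0, 0]))`` reads
"continue, 5, continue, 256, stop" and returns ``[5, 256]``. Deleting or
lowering bytes changes the list in a predictable way, which is what the
shrinker relies on.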

The idea of shrinking in particular is that once we have this representation,
we can shrink arbitrary test cases based on it. We try to produce a string that
is *shortlex minimal*. What this means is that it has the shortest possible
length and among those strings of minimal length is lexicographically (i.e. the
normal order on strings - find the first byte at which they differ and use that
@ghost ghost Feb 6, 2018

typographical nitpicks, em-dash (—) for punctuation dash, also on lines 188 & 338, comma after i.e.

Member Author

@DRMacIver DRMacIver Feb 12, 2018

Due to annoying technical constraints (because everything is awful despite it being 2018 my console is not unicode-sensible, I'm unwilling to write this as &mdash; and Github doesn't render RST em-dashes correctly) they're not going to be em-dashes. 😢

Elaborate on comma after i.e.? Is that a thing I'm supposed to be doing?

Could we all chip in and buy you a proper terminal? 😄 Re i.e., I had that beaten into me from childhood, but a quick snout around reveals it is mandatory for US English (if the Chicago manual of style is your Bible, it is mine), but optional in BE (Fowler); so I have been deceived by yankee cultural imperialism. I'll continue to use though, since the comma indicates the natural spoken pause that you would use when replacing it with "that is".

Member Author

Could we all chip in and buy you a proper terminal?

Unfortunately due to a complex interlocking series of least worst solutions and bad life choices, getting me a better terminal would require getting me a better computer (I am currently working inside the Windows Ubuntu subsystem, and all of the terminal options for Windows are awful).

to decide) smallest.
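
The shortlex order can be written as a sort key: compare lengths first, then
fall back to the ordinary byte-wise comparison. The helper name
``shortlex_key`` is ours, chosen for this sketch.

.. code-block:: python

    def shortlex_key(buffer: bytes):
        # Shorter strings always win; ties are broken lexicographically.
        return (len(buffer), buffer)


    candidates = [b"\x01\x00", b"\x02", b"\x00\x00\x00", b"\x01"]
    ordered = sorted(candidates, key=shortlex_key)
    smallest = min(candidates, key=shortlex_key)

Note that ``b"\x00\x00\x00"`` sorts last here even though it is the smallest
in plain lexicographic order, because length is compared first.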

Ideally we could think of the shrinker as a generic function that takes a

Typo 'as a generic function'

string satisfying some predicate and returns the shortlex minimal string that
also satisfies it.
This is idealised in two ways: the first is that we only succeed in approximating
Member

"The implementation departs from this ideal in two ways:" ...

(IMO it's not wrong per se, just, well, idealised)

such a minimal string. The second is that we are only interested in minimizing
things where the predicate goes through the Hypothesis API, which lets us track
a lot of info about how the data is used and use that to guide the process.

We then use a number of different transformations of the string to try and
reduce our input. These vary from principled general transformations to shameless
hacks that special case something we need to work well. We try to aim for mostly
the former, but the nice thing about this model is that the underlying representation
is fully general and we are free to try whatever we want and it will never result
Member

Sentence rambles a bit. How about:

...something we need to work well. The nice thing about this model is that the underlying representation of test-as-bytestring is fully general, so we can use any technique at all to minimise the string without inherent correctness problems. Principled transformations are usually more general, but special-case hacks are only a problem to the degree that they result in messy code and fragile heuristics, so if we need one for something it's not a big deal.

in us doing the wrong thing. Hacks are therefore only a problem to the degree
that they result in messy code and fragile heuristics; they are never a
correctness issue, so if we can't make something work without such a hack it's
not a big deal.

One such example of a hack is the handling of floating point numbers. There are
a couple of lexicographic shrinks that are always valid but only really make
sense for our particular encoding of floats. We check if we're working
on something that is of the right size to be a float and apply those
transformations regardless of whether it is actually meant to be a float.
In the worst case it's not a float and they don't work, and we've run a few
extra test cases.

--------------------------
Useful Files to Know About
--------------------------

The code associated with Conjecture lives in
`src/hypothesis/internal/conjecture <https://github.com/HypothesisWorks/hypothesis-python/tree/master/src/hypothesis/internal/conjecture>`_.
There are a number of files in there,
but the most important ones are ``engine.py`` and ``data.py``.
``data.py`` defines the core type that is used to represent test cases,
and ``engine.py`` contains the main driver for deciding what test cases to run.

There is also ``minimizer.py``, which contains a general purpose lexicographic
minimizer. This is responsible for taking some byte string and a predicate over
byte strings and producing a string of the same length which is lexicographically
smaller. Unlike the shrinker in general, this *is* supposed to work on arbitrary
predicates and doesn't know anything about the testing API. We typically apply
this to subsets of the bytes for a test input with a predicate that knows how
to integrate those subsets into a larger test. This is the part of the code
that means we can do things like replacing an integer with a smaller one.
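
To give a feel for what such a minimizer does, here is a deliberately naive
greedy sketch. It is not the algorithm in ``minimizer.py``; the function names
and the three candidate moves are assumptions made for this example, and it
is not guaranteed to find the true minimum for an arbitrary predicate.

.. code-block:: python

    def candidates(current: bytes, i: int):
        """Yield same-length strings that are lexicographically smaller."""
        if current[i] == 0:
            return
        prefix, suffix = current[:i], current[i + 1:]
        lowered = bytes([current[i] - 1])
        yield prefix + b"\x00" + suffix            # zero this byte
        yield prefix + lowered + suffix            # lower it by one
        yield prefix + lowered + b"\xff" * len(suffix)  # lower and "borrow"


    def minimize_lexicographically(initial: bytes, predicate) -> bytes:
        if not predicate(initial):
            raise ValueError("initial value must satisfy the predicate")
        current = initial
        improved = True
        while improved:
            improved = False
            for i in range(len(current)):
                for attempt in candidates(current, i):
                    if predicate(attempt):
                        # Every accepted attempt is strictly smaller, so
                        # this loop must terminate.
                        current = attempt
                        improved = True
                        break
        return current


    def at_least_300(buffer: bytes) -> bool:
        return int.from_bytes(buffer, "big") >= 300


    result = minimize_lexicographically(b"\xff\xff", at_least_300)

Here the two bytes encode a big-endian integer, and the predicate knows how
to interpret them. The result is ``b"\x01\x2c"``, i.e. 300, which is the
"replacing an integer with a smaller one" behaviour described above.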

-------
Testing
Member

This should probably be moved to testing-hypothesis.rst, and just leave a reference here.

We should also explain how to install the test dependencies.

-------

The Hypothesis test suite is rather large, but there are a couple of areas in
particular that are useful to know about when making engine changes.

The first is `tests/cover/test_conjecture_engine.py <https://github.com/HypothesisWorks/hypothesis-python/blob/master/tests/cover/test_conjecture_engine.py>`_,
which is a set of unit tests designed to put the engine into particular scenarios to exercise specific behaviours,
with a goal of achieving 100% coverage on it in isolation (though it currently does not quite achieve that for some specific edge cases;
we may fix and enforce this later).

The other set of tests that is worth knowing about is the quality tests,
in `tests/quality <https://github.com/HypothesisWorks/hypothesis-python/tree/master/tests/quality>`_.
These assert specific hard-to-satisfy properties about the examples that Hypothesis finds -
either their existence, or something about the final shrunk result.

To run a specific test file manually, you can use pytest. I usually use the
following invocation:

.. code-block:: bash

    python -m pytest tests/cover/test_conjecture_engine.py

You will need to have Hypothesis installed locally to run these. I recommend a
virtualenv where you have run ``pip install -e .``, which installs all the
dependencies and puts your ``src`` directory in the path of installed packages
so that edits you make are automatically picked up.

Useful arguments you can add to pytest are ``-n 0``, which will disable build
parallelism (I find that on my local laptop the startup time is too high to be
worth it when running single files, so I usually do this), and ``-kfoo`` where
foo is some substring common to the set of tests you want to run (you can also
use composite expressions here, e.g. ``-k'foo and not bar'`` will run anything
containing foo that doesn't also contain bar).

-----------------------
Engine Design Specifics
-----------------------

There are a couple of code patterns that are mostly peculiar to Conjecture that
you may not have encountered before and are worth being aware of.

~~~~~~~~~~~~~~~~~~~~
Search State Objects
~~~~~~~~~~~~~~~~~~~~

There are a number of cases where we find ourselves with a user-provided function
(where the "user" might still be something that is entirely our code) and we
want to pass a whole bunch of different examples to it in order to achieve some
result. Currently this includes each of the main engine, the Shrinker (in
``engine.py``) and the minimizer, but there are likely to be more in future.
Member

nit: monospace engine.py


We typically organise such things as an object that you create with
the function and possibly an initial argument. The object stores these on
``self`` and has some ``run`` or similar method. It then runs for a while,
repeatedly calling the function it was given.

Generally speaking they do not call the function directly, but instead wrap
calls to it. This allows them to implement a certain amount of decision caching,
e.g. avoiding trying the same shrink twice, but also gives us a place where we
can update metadata about the search process.

For objects whose goal is some form of optimisation (Shrinker, Minimizer) one
of the pieces of metadata they will typically track is a "current target". This
is typically the best example they have seen so far. By wrapping every call to
the predicate, we ensure that we never miss an example even when we're passing
through other things.

For objects whose goal is some broader form of search (currently only
``ConjectureRunner``) this also allows them to keep track of *other* examples
of interest. For example, as part of our multiple bug discovery,
``ConjectureRunner`` keeps track of the smallest example of each distinct
failure that it has seen, and updates this automatically each time the test
function is called. This means that if during shrinking we "slip" and find a
different bug than the one we started with, we will *not* shrink to that, but
it will get remembered by the runner if it was either novel or better than our
current example.
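
The pattern can be sketched in a few lines. This is a toy illustration, not
code from the engine: ``ToySearch``, ``incorporate`` and ``calls`` are names
invented here, and the only search it performs is deleting single bytes.

.. code-block:: python

    class ToySearch:
        """Search state object: wraps a predicate and tracks the best example."""

        def __init__(self, initial: bytes, predicate):
            if not predicate(initial):
                raise ValueError("initial value must satisfy the predicate")
            self._predicate = predicate
            self.current_target = initial
            self.calls = 0            # metadata about the search process
            self._seen = {initial}    # decision cache

        def incorporate(self, candidate: bytes) -> bool:
            """Wrapped call: returns True only if the target improved."""
            if candidate in self._seen:
                # Anything seen before has already been taken into account.
                return False
            self._seen.add(candidate)
            self.calls += 1
            if not self._predicate(candidate):
                return False
            old = self.current_target
            if (len(candidate), candidate) < (len(old), old):
                self.current_target = candidate
                return True
            return False

        def run(self):
            # Try deleting each byte, never calling the predicate directly.
            i = 0
            while i < len(self.current_target):
                target = self.current_target
                if not self.incorporate(target[:i] + target[i + 1:]):
                    i += 1


    search = ToySearch(bytes([3, 7, 7, 1]), lambda buffer: 7 in buffer)
    search.run()

After ``run`` the target is ``b"\x07"``. Because every call goes through
``incorporate``, running the search again costs at most one new predicate
call (for the empty string), and nothing after that.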

~~~~~~~~~~~
Weird Loops
~~~~~~~~~~~

The loops inside a lot of the engine look very strange and unidiomatic. For
example:

.. code-block:: python

    i = 0
    while i < len(self.intervals):
        u, v = self.intervals[i]
        if not self.incorporate_new_buffer(
            self.shrink_target.buffer[:u] + self.shrink_target.buffer[v:]
        ):
            i += 1


The more natural way to write this in Python would be:

.. code-block:: python

    for u, v in self.intervals:
        self.incorporate_new_buffer(
            self.shrink_target.buffer[:u] + self.shrink_target.buffer[v:]
        )

This way of writing the loop would be *entirely wrong*.
Member

However, in this case the idiomatic loop is not equivalent.


Every time ``incorporate_new_buffer`` succeeds, it changes the shape of the
current shrink target. This consequently changes the shape of ``self.intervals``,
both its particular values and its current length - on each loop iteration the
loop might stop either because ``i`` increases or because ``len(self.intervals)``
decreases.
Member

I'd add a note that starting a fresh loop each time self.intervals changes would have quadratic rather than linear complexity.


An additional quirk is that we only increment ``i`` on failure. The reason for
this is that if we successfully deleted the current interval then the interval
in position ``i`` has been replaced with something else, which is probably the
next thing we would have tried deleting if we hadn't succeeded (or something
like it), so we don't want to advance past it.
This is specific to deletion: If we are just replacing the contents of
something then we expect it to still be in the same place, so there we increment
unconditionally.
Examples of this include ``zero_draws`` and ``minimize_individual_blocks``.
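
The difference is easy to reproduce outside the engine. The following
self-contained sketch uses a plain list in place of the shrink target (the
function names are made up for this example) and compares the ``while`` form
with a ``for`` loop whose index range is computed once and then goes stale:

.. code-block:: python

    def delete_with_while(items, predicate):
        items = list(items)
        i = 0
        while i < len(items):
            attempt = items[:i] + items[i + 1:]
            if predicate(attempt):
                # Success: position i now holds the next element, so stay.
                items = attempt
            else:
                i += 1
        return items


    def delete_with_stale_for(items, predicate):
        items = list(items)
        for i in range(len(items)):  # computed once, before any deletion
            attempt = items[:i] + items[i + 1:]
            if predicate(attempt):
                items = attempt
        return items


    def contains_ten(xs):
        return 10 in xs


    good = delete_with_while([1, 2, 3, 10, 4], contains_ten)
    bad = delete_with_stale_for([1, 2, 3, 10, 4], contains_ten)

The ``while`` version ends with ``[10]``. The stale ``for`` version ends with
``[2, 10]``: each successful deletion shifts the remaining elements left, so
advancing the index skips over one of them.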

------------
The Shrinker
------------

The shrinking part of Hypothesis is organised into a single class called ``Shrinker``
that lives in ``engine.py``.
Member

nit: engine.py


Its job is to take an initial ``ConjectureData`` object and some predicate that
it satisfies, and to try to produce a simpler ``ConjectureData`` object that
also satisfies that predicate.

~~~~~~~~~~~~~~
Search Process
~~~~~~~~~~~~~~

The search process mostly happens in the ``shrink`` method. It is split into
two parts: ``greedy_shrink`` and ``escape_local_minimum``. The former is a
greedy algorithm, meaning that it will only ever call the predicate with values
that are strictly smaller than our current best. This mostly works very well,
but sometimes it gets stuck. So once it has finished, we try restarting the
process from something like our final state, but a bit fuzzed, and run the
greedy shrink again. We keep doing this as long as it results in a
smaller value than our previous best.

The greedy shrinker is where almost all of the work happens. It is organised
into a large number of search passes, and is designed to run until all of those
passes fail to make any improvements.

~~~~~~~~~~~~~
Search Passes
~~~~~~~~~~~~~

Search passes are methods on the ``Shrinker`` class in ``engine.py``. They are
designed to take the current shrink target and try a number of things that might
be sensible shrinks of it.

Typically the design of a search pass is that it should always try to run to
completion rather than exiting as soon as it's found something good, but that
it shouldn't retry things that are too like stuff it has already tried just
because something worked. So for example in the above loop, we try deleting
each interval (these roughly correspond to regions of the input that are
responsible for some particular value or small number of adjacent values).
When we succeed, we keep going and try deleting more intervals, but we don't
try to delete any intervals before the current index.

The reason for this is that retrying things from the beginning might work but
probably won't. Thus if we restarted every time we made a change we would end
up doing a lot of useless work. Additionally, such retries are *more* likely to
work after other shrink passes have run, because other changes frequently
unlock changes in the current pass that were previously impossible. For example,
when we reorder some examples we might make a big region deletable that
previously contained something critical to the relevant behaviour of the test
but is now just noise.

Because the shrinker runs in a big loop, if we've made progress the shrink pass
will always be run again (assuming we don't hit some limit that terminates the
shrink early, but by making the shrinker better we try to ensure that that
never happens).
This means that we will always get an opportunity to start again later if we
made progress, and if we didn't make progress we've tried everything anyway.


~~~~~~~~~~~~~~~~~~~~~~~
Expensive Shrink Passes
~~~~~~~~~~~~~~~~~~~~~~~

We have a bunch of search passes that are considered "expensive". Typically
this means "quadratic or worse complexity". When shrinking we initially don't
run these, and the first time that we get to the end of our main passes and
have failed to make the input any smaller, we then turn them on.

This allows the shrinker to switch from a good but slightly timid mode while its
input is large into a more aggressive DELETE ALL THE THINGS mode once that stops
working. By that point ideally we've made our input small enough that quadratic
Member

nit: "ideally we've" -> "we've usually"

complexity is acceptable.

We turn these on once and then they stay on. The reason for this is to avoid a
"flip-flopping" scenario where an expensive pass unlocks one trivial change that
the cheap passes can find and then they get stuck again and have to do an extra
useless run through the passes to prove that.
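
The control flow described here can be sketched as follows. This is a toy
model under our own assumptions, not the ``Shrinker`` code: the pass
functions, ``run_passes`` and the list-of-integers "test case" are all
invented for the example.

.. code-block:: python

    def delete_single_elements(target, predicate):
        """Cheap pass: a linear number of predicate calls."""
        i = 0
        while i < len(target):
            attempt = target[:i] + target[i + 1:]
            if predicate(attempt):
                target = attempt
            else:
                i += 1
        return target


    def delete_pairs(target, predicate):
        """Expensive pass: tries every pair of elements (quadratic)."""
        i = 0
        while i < len(target):
            j = i + 1
            while j < len(target):
                attempt = target[:i] + target[i + 1:j] + target[j + 1:]
                if predicate(attempt):
                    target = attempt
                else:
                    j += 1
            i += 1
        return target


    def run_passes(target, predicate, cheap_passes, expensive_passes):
        use_expensive = False
        while True:
            before = target
            passes = list(cheap_passes)
            if use_expensive:
                passes.extend(expensive_passes)
            for shrink_pass in passes:
                target = shrink_pass(target, predicate)
            if target != before:
                continue              # progress: run everything again
            if use_expensive:
                return target         # nothing helps any more
            use_expensive = True      # turned on once, never turned off


    def interesting(xs):
        return 10 in xs and sum(xs) % 2 == 0


    cheap_only = run_passes([1, 4, 10, 1], interesting,
                            [delete_single_elements], [])
    full = run_passes([1, 4, 10, 1], interesting,
                      [delete_single_elements], [delete_pairs])

With only the cheap pass the search gets stuck at ``[1, 10, 1]``, because
removing either ``1`` alone breaks the parity condition. Once the expensive
pass is enabled it removes both at once and the result is ``[10]``.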

~~~~~~~~~~~~~~~~~~~~~~
Adaptive Shrink Passes
~~~~~~~~~~~~~~~~~~~~~~

A useful trick that some of the shrink passes use is to try a thing and if it
doesn't work take a look at what the test function did to guess *why* it didn't
work and try to repair that.

Two examples of such passes are ``zero_draws`` and the various passes that try to
minimize individual blocks lexicographically.

What happens in ``zero_draws`` is that we try replacing the region corresponding
to a draw with all zero bytes. If that doesn't work, we check if that was because
of changing the size of the example (e.g. doing that with a list will make the
list much shorter) and messing up the byte stream after that point. If this
was what happened then we try again with a sequence of zeroes that corresponds
to the size of the draw call in the version we tried that didn't work.

The logic for what we do with block minimization is in ``try_shrinking_blocks``.
When it tries shrinking a block and it doesn't work, it checks whether the size
changed. If it did, it tries deleting the number of bytes that were lost
immediately after the shrunk block to see if it helps.
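
A toy version of this repair step, under heavy simplifying assumptions: the
"test" below reads a length byte, skips that many payload bytes, and is
interesting when the next byte is 9. The names ``run_test`` and
``shrink_block_adaptively`` are hypothetical and this is not the logic of
``try_shrinking_blocks`` itself, only the shape of the idea.

.. code-block:: python

    def run_test(buffer: bytes):
        """Return (interesting, number_of_bytes_consumed)."""
        if not buffer:
            return False, 0
        length = buffer[0]
        marker_index = 1 + length
        if marker_index >= len(buffer):
            return False, len(buffer)   # ran out of data
        return buffer[marker_index] == 9, marker_index + 1


    def shrink_block_adaptively(buffer: bytes, index: int, new_value: int):
        interesting, used_before = run_test(buffer)
        if not interesting:
            raise ValueError("buffer must be interesting to start with")
        attempt = buffer[:index] + bytes([new_value]) + buffer[index + 1:]
        ok, used_after = run_test(attempt)
        if ok:
            return attempt
        # The shrink failed. If the test read fewer bytes than before, the
        # rest of the stream is probably misaligned, so delete the bytes
        # that were "lost" immediately after the shrunk block and retry.
        lost = used_before - used_after
        if lost > 0:
            repaired = attempt[:index + 1] + attempt[index + 1 + lost:]
            if run_test(repaired)[0]:
                return repaired
        return buffer


    shrunk = shrink_block_adaptively(bytes([3, 5, 5, 5, 9]), 0, 0)

Lowering the length byte from 3 to 0 on its own fails, because the test then
reads a payload byte where it expected the marker. Deleting the three lost
bytes realigns the stream, giving ``bytes([0, 9])``.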


--------------
Playing Around
--------------

I often find that it is informative to watch the shrink process in action using
Hypothesis's verbosity settings. This can give you an idea of what the format
of your data is, and how the shrink process transforms it.

In particular, it is often useful to run a test with the flag ``-s`` to tell it
not to hide output and the environment variable ``HYPOTHESIS_VERBOSITY_LEVEL=debug``.
This will give you a very detailed log of what the testing process is running,
along with information about what passes in the shrinker are running and how
they transform it.

---------------
Getting Started
---------------

The best way of getting started on working on the engine is to work on the
shrinker. This is because it has the most well defined problems, the best
documented code in the engine, and it's generally fun to work on.

If you have not already done so, check out `Issue #1093 <https://github.com/HypothesisWorks/hypothesis-python/issues/1093>`_,
which collates a number of other issues about shrink quality that are good starting
points for people.

Thus the best place to get started is to take a look at those linked issues and
jump in and try things! Find one that you think sounds fun. Note that some
of them suggest not doing these as your first foray into the shrinker, as some
are harder than others.

*Please* ask questions if you get stuck or if anything doesn't make sense -
either on the main issue for general purpose questions, or on specific issues
for questions about a particular problem. We're trying to make this
process easier for everyone to work on, so asking us questions is actively
helpful to us and we will be very grateful to you for doing so.