Getting more people to work on the shrinker #1093

Closed · 4 tasks done
DRMacIver opened this issue Jan 30, 2018 · 11 comments

Labels: meta (for wider topics than the software itself) · test-case-reduction (about efficiently finding smaller failing examples)

Comments

DRMacIver (Member) commented Jan 30, 2018

So, having just finished my epic marathon of a paper about Hypothesis's test-case reduction/shrinking, a couple of things have become clear to me. Specifically:

  • Man, test-case reduction in Hypothesis is so nice compared to literally everywhere else. The underlying model is clean and well-posed, and writing shrink passes is easy and fun.
  • Only I get to have that fun. This is sad. I should share the fun.

(I am aware that my notions of fun are peculiar and that test-case reduction is my own specific obsession, but I think other people could easily come to share it given the first part).

THEREFORE, this is an outreach ticket to try and get more people to work on the shrinker.

I consider the following to be the success criteria for this ticket (but will allow for some goal post movement):

  • There is a reasonably comprehensive "How to Work on the Shrinker" guide document
  • At least three pull requests from at least two people who are not me that improve shrink quality or performance in some meaningful way (refactoring PRs are great too, but I won't count them for this)
  • At least one issue filed by someone who is not me that demonstrates a failure of Hypothesis at shrinking something to a global optimum
  • And they should have fun doing it, dammit.

In aid of the above, I will be opening a large-ish number of shrink quality tickets and referencing them back here.

Note that the fun part is important. I wish to re-emphasise that at present there is literally no good reason to improve the Hypothesis shrinker except enjoyment and weird aesthetic obsessions: due to the combination of a better underlying model and a truly disproportionate amount of invested effort, it is currently so good that everyone else's shrinking looks comically bad in comparison. This is a thing to work on because you think it would be fun, much more than because you think it will be useful.

In aid of that, if you do want to work on this, please report back on anything that confuses or annoys you in the course of doing so. If you get stuck, please ask for help. These will be valuable contributions in their own right!

PS. If you end up wanting to work on this and are not on the Hypothesis core team, ask me for a copy of the paper about this, because it will clarify the underlying model. At some point I will be able to share it publicly but that point is not now.

dchudz (Member) commented Feb 17, 2018

Since you encouraged questions... I'm having a little trouble following the Playing Around section of the internals guide to see what's going on with the shrinker.

If I delete the tox.ini file, all is as expected - so I guess it has something to do with that (and I can still play around with the shrinker that way in the meantime).

More details:

For test files under the hypothesis directory tree, the -s flag for pytest doesn't have the expected result (but it works fine for test files outside of that directory, even when run from the hypothesis directory with my hypothesis virtualenv activated).

Here's what I'm running (in the root of the hypothesis repo):

$echo 'def test_no_capture(): print("hi there!")' > ../test_no_capture.py
$pytest ../test_no_capture.py -s
$echo 'def test_no_capture(): print("hi there!")' > test_no_capture.py
$pytest test_no_capture.py -s

Here it is with its output:

$echo 'def test_no_capture(): print("hi there!")' > ../test_no_capture.py
(master) /Users/davidchudzicki/hypothesis-python
$pytest ../test_no_capture.py -s
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.6.1, pytest-3.4.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/davidchudzicki, inifile:
plugins: xdist-1.22.0, profiling-1.2.11, forked-0.2, flaky-3.4.0, hypothesis-3.44.26
collected 1 item

../test_no_capture.py hi there!
.
===Flaky Test Report===


===End Flaky Test Report===

========================================================================================= 1 passed in 0.00 seconds =========================================================================================
(master) /Users/davidchudzicki/hypothesis-python
$echo 'def test_no_capture(): print("hi there!")' > test_no_capture.py
(master) /Users/davidchudzicki/hypothesis-python
$pytest test_no_capture.py -s
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.6.1, pytest-3.4.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/davidchudzicki/hypothesis-python, inifile: tox.ini
plugins: xdist-1.22.0, profiling-1.2.11, forked-0.2, flaky-3.4.0, hypothesis-3.44.26
gw0 [1] / gw1 [1]
scheduling tests via LoadScheduling
.
===Flaky Test Report===


===End Flaky Test Report===
======================================================================================== slowest 20 test durations =========================================================================================
0.00s setup    test_no_capture.py::test_no_capture
0.00s call     test_no_capture.py::test_no_capture
0.00s teardown test_no_capture.py::test_no_capture

The example above is meant to be minimal-ish, but I first experienced this running $HYPOTHESIS_VERBOSITY_LEVEL=debug pytest tests/quality/test_shrink_quality.py -k test_minimize_multiple_elements_in_silly_large_int_range -s.

pytest --version
This is pytest version 3.4.0, imported from /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/pytest.py
setuptools registered plugins:
  pytest-xdist-1.22.0 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/xdist/plugin.py
  pytest-xdist-1.22.0 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/xdist/looponfail.py
  pytest-profiling-1.2.11 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/pytest_profiling.py
  pytest-forked-0.2 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/pytest_forked/__init__.py
  flaky-3.4.0 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/flaky/flaky_pytest_plugin.py
  hypothesis-3.44.26 at /Users/davidchudzicki/hypothesis-python/src/hypothesis/extra/pytestplugin.py

DRMacIver (Member, Author) commented

> Since you encouraged questions... I'm having a little trouble following the Playing Around section of the internals guide to see what's going on with the shrinker.

Hmm. Does it work inside the directory tree if you run with -n 0?

I've never noticed this as a problem, and that's probably because I always run locally with -n 0! I think there might be something slightly weird going on in how we handle reporting that doesn't work correctly under xdist (e.g. we have this issue, which is smaller in scope but not unrelated).

dchudz (Member) commented Feb 18, 2018

Works great with -n 0, so yeah, seems related to #700.

I've also confirmed that even outside the hypothesis directory tree, -n 1 doesn't show me the printed output, so it looks like "no capture" just doesn't work with xdist (which mostly seems like other people's experience too, from looking around the internet).
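
Concretely, the invocation that now shows me the debug output is the earlier command with xdist disabled:

$HYPOTHESIS_VERBOSITY_LEVEL=debug pytest tests/quality/test_shrink_quality.py -k test_minimize_multiple_elements_in_silly_large_int_range -s -n 0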

I've opened #1122 to suggest -n0 in the internals guide.

Zac-HD removed the docs label ("documentation could *always* be better") Feb 20, 2018
Zac-HD (Member) commented Feb 20, 2018

Removed the docs label, thanks to the new guides from #1100 😄

flyingmutant (Contributor) commented

@DRMacIver is the "paper about Hypothesis's test case reduction/shrinking" available to the public? Very interested in reading that!

Zac-HD (Member) commented Oct 2, 2018

Closing this as we now have decent development guides, far better internal commenting, and a growing pool of people who have contributed to the Conjecture engine. Child issues tagged shrink-quality remain open and active, though, and further comments about coordinating them are still welcome here.

Zac-HD closed this as completed Oct 2, 2018
Wilfred (Contributor) commented Oct 24, 2018

@Zac-HD @DRMacIver I've been looking at porting Hypothesis to Emacs Lisp, so I've been trying to understand the internals of Hypothesis (rather than just being a contented user).

The docs that exist are excellent and clear, but I've still found a few things unclear.

  • The internals docs page is helpful, so it's a shame the page isn't on the Hypothesis ReadTheDocs site.
  • I'm unable to find any documentation on ConjectureData. It has no docstrings and does quite a lot. I've looked at the hypothesis-java implementation of TestData, which is a little simpler, but still left me with questions. (I also don't want to base my implementation on hypothesis-java, as I don't want my project to be AGPL.)

I hope this feedback is useful. I think I understand most of the API, but if you have a minute, it's not clear to me:

  • Why freeze/unfreeze? What guarantee does this provide?
  • When do Blocks and Examples differ? They both seem to be created inside ConjectureData.__write with the same indexes.
  • How have you chosen the magic values? For example, the distribution/weighting here:
class WideRangeIntStrategy(IntStrategy):

    distribution = d.Sampler([
        4.0, 8.0, 1.0, 1.0, 0.5
    ])

    sizes = [8, 16, 32, 64, 128]

or these values:

NASTY_FLOATS = sorted([
    0.0, 0.5, 1.1, 1.5, 1.9, 1.0 / 3, 10e6, 10e-6, 1.175494351e-38,
    2.2250738585072014e-308,
    1.7976931348623157e+308, 3.402823466e+38, 9007199254740992, 1 - 10e-6,
    2 + 10e-6, 1.192092896e-07, 2.2204460492503131e-016,

] + [float('inf'), float('nan')] * 5, key=flt.float_to_lex)
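
Regarding the first snippet: my working assumption is that the Sampler draws an index with probability proportional to its weight, something like this toy equivalent (the function name here is mine, not Hypothesis's):

import random

sizes = [8, 16, 32, 64, 128]
weights = [4.0, 8.0, 1.0, 1.0, 0.5]

def draw_size():
    # Pick an index with probability weights[i] / sum(weights), so 16-bit
    # integers come up most often and 128-bit ones only rarely.
    [i] = random.choices(range(len(sizes)), weights=weights)
    return sizes[i]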

Zac-HD (Member) commented Oct 25, 2018

Just some quick comments from me; David might (should?) know more 😄

To date we've tried to keep user and developer docs separate, to help reinforce the distinction between public interface and internals and that you don't need to understand the latter to use the former.

ConjectureData is big and complicated (and under-commented). For Hypothesis ports though, I'd suggest starting with the Rust implementation of Conjecture - currently under hypothesis-ruby. It is much simpler and weaker than the Python backend for now, but eventually we intend to share it between all language frontends. You could even use it directly if you don't mind a dependency in alpha with unstable interfaces, or just plan to port over when it's ready, as we will for Python.

Blocks and examples are discussed in #1601 - hopefully that helps.

My understanding is that the magic values are chosen quite arbitrarily based on the kinds of values that tend to expose bugs, or to give generally desirable size/complexity properties of generated data. I don't know exactly how they were chosen though, or whether we've evaluated alternatives.

Zalathar (Contributor) commented Oct 25, 2018

Calling freeze marks the transition from “this is an active trial that we're adding data to” to “this is an immutable snapshot of a past trial that the generator/shrinker can use as reference”. It performs housekeeping like closing all open example regions and recording statistics, and replaces various lists/sets/dicts with immutable versions to reduce the chances of a bug accidentally modifying a completed trial's data.

Freezing is a one-way process. There is no unfreeze.
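
Roughly, the shape of it is something like this (a toy sketch with made-up names, not the actual ConjectureData code):

class Trial:
    def __init__(self):
        self.buffer = bytearray()
        self.frozen = False

    def draw_bytes(self, n, random):
        # Only an active (unfrozen) trial may record new data.
        assert not self.frozen
        chunk = bytes(random.getrandbits(8) for _ in range(n))
        self.buffer.extend(chunk)
        return chunk

    def freeze(self):
        # One-way transition: close out the trial and snapshot its data,
        # so later bookkeeping bugs cannot mutate a completed run.
        self.buffer = bytes(self.buffer)
        self.frozen = True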

The main difference between Block and Example is that examples are hierarchical and nested, while blocks are flat. Every block is also covered by an example with DRAW_BYTES_LABEL, but there are other kinds of example regions that cover many blocks and many sub-examples.

This makes them useful for different kinds of shrinker passes. Blocks get used by passes that want to treat a region of bytes as a number and reduce that number, whereas examples get used by passes that want to attempt “higher-level” transformations like deleting complicated list elements, or zeroing large regions of data in one go.
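
Schematically, with made-up values just to illustrate the flat-vs-nested distinction:

# Blocks: a flat sequence of byte spans, one per primitive draw.
blocks = [(0, 8), (8, 16), (16, 17)]

# Examples: a tree of labelled regions over the same bytes; a single
# list example can cover several blocks and nest a child per element.
examples = {
    "label": "list", "start": 0, "end": 17,
    "children": [
        {"label": "integer", "start": 0, "end": 8},
        {"label": "integer", "start": 8, "end": 16},
        {"label": "stop-byte", "start": 16, "end": 17},
    ],
}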

The Example and Block classes gained some basic documentation a little while ago, so hopefully that should clarify things a bit. (Though you might be one of the first people to try reading them, so let us know if something seems unclear.)

Zalathar (Contributor) commented

> For Hypothesis ports though, I'd suggest starting with the Rust implementation of Conjecture - currently under hypothesis-ruby.

Having tried both approaches recently, I actually think hypothesis-python is a better reference for porting at the moment.

The learning curve is a bit steep, but Python Conjecture isn't that complicated once you eventually get your bearings. And it's nice to be able to refer to the most mature and battle-tested implementation.

nchammas (Contributor) commented

@Wilfred

> How have you chosen the magic values?

NASTY_FLOATS = sorted([
    0.0, 0.5, 1.1, 1.5, 1.9, 1.0 / 3, 10e6, 10e-6, 1.175494351e-38,
    2.2250738585072014e-308,
    1.7976931348623157e+308, 3.402823466e+38, 9007199254740992, 1 - 10e-6,
    2 + 10e-6, 1.192092896e-07, 2.2204460492503131e-016,

] + [float('inf'), float('nan')] * 5, key=flt.float_to_lex)

Some of those floats exhibit classic gotchas inherent to floating point arithmetic.

For example:

>>> 9007199254740992.0
9007199254740992.0
>>> 9007199254740992.0 + 1.0
9007199254740992.0
>>> 9007199254740992.0 + 1.0 + 1.0
9007199254740992.0
>>> 9007199254740992.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0
9007199254740992.0
>>> 9007199254740992.0 + 2.0
9007199254740994.0
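
That value is 2**53: doubles have a 53-bit significand, so it's the first point at which IEEE-754 floats can no longer represent every integer exactly:

>>> 2**53
9007199254740992
>>> float(2**53) + 1.0 == float(2**53)
True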

Or of course:

>>> from math import nan
>>> nan == nan
False

Dunno if it's worth documenting this in the code, but those are some of the reasons they're "nasty".
