Getting more people to work on the shrinker #1093

Closed · 4 tasks done
DRMacIver opened this issue Jan 30, 2018 · 11 comments

Labels: meta (for wider topics than the software itself) · test-case-reduction (about efficiently finding smaller failing examples)

Comments

DRMacIver (Member) commented Jan 30, 2018

So, having just finished my epic marathon of a paper about Hypothesis's test-case reduction/shrinking, a couple of things have become clear to me. Specifically:

  • Man, test-case reduction in Hypothesis is so nice compared to literally everywhere else. The underlying model is clean and well-posed, and writing shrink passes is easy and fun.
  • Only I get to have that fun. This is sad. I should share the fun.

(I am aware that my notions of fun are peculiar and that test-case reduction is my own specific obsession, but I think other people could easily come to share it given the first part).

THEREFORE, this is an outreach ticket to try and get more people to work on the shrinker.

I consider the following to be the success criteria for this ticket (but will allow for some goal post movement):

  • There is a reasonably comprehensive "How to Work on the Shrinker" guide document
  • At least three pull requests from at least two people who are not me that improve shrink quality or performance in some meaningful way (refactoring PRs are great too, but I won't count them for this)
  • At least one issue filed by someone who is not me that demonstrates a failure of Hypothesis at shrinking something to a global optimum
  • And they should have fun doing it, dammit.

In aid of the above, I will be opening a large-ish number of shrink quality tickets and referencing them back here.

Note that the fun part is important. I wish to re-emphasise that at present there is literally no good reason to improve the Hypothesis shrinker except enjoyment and weird aesthetic obsessions: due to the combination of a better underlying model and a truly disproportionate amount of invested effort, it is currently so good that everyone else's shrinking looks comically bad in comparison. This is a thing to work on because you think it would be fun, much more than because you think it will be useful.

In aid of that, if you do want to work on this, please report back on anything that confuses or annoys you in the course of doing so. If you get stuck, please ask for help. These will be valuable contributions in their own right!

PS. If you end up wanting to work on this and are not on the Hypothesis core team, ask me for a copy of the paper about this, because it will clarify the underlying model. At some point I will be able to share it publicly but that point is not now.

dchudz (Member) commented Feb 17, 2018

Since you encouraged questions... I'm having a little trouble following the Playing Around section of the internals guide to see what's going on with the shrinker.

If I delete the tox.ini file, all is as expected - so I guess it has something to do with that (and I can still play around with the shrinker that way in the meantime).

More details:

For test files under the hypothesis directory tree, the -s flag for pytest doesn't have the expected result (but it works fine for test files outside of that directory, even when run from the hypothesis directory with my hypothesis virtualenv activated).

Here's what I'm running (in the root of the hypothesis repo):

$echo 'def test_no_capture(): print("hi there!")' > ../test_no_capture.py
$pytest ../test_no_capture.py -s
$echo 'def test_no_capture(): print("hi there!")' > test_no_capture.py
$pytest test_no_capture.py -s

Here it is with its output:

$echo 'def test_no_capture(): print("hi there!")' > ../test_no_capture.py
(master) /Users/davidchudzicki/hypothesis-python
$pytest ../test_no_capture.py -s
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.6.1, pytest-3.4.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/davidchudzicki, inifile:
plugins: xdist-1.22.0, profiling-1.2.11, forked-0.2, flaky-3.4.0, hypothesis-3.44.26
collected 1 item

../test_no_capture.py hi there!
.
===Flaky Test Report===


===End Flaky Test Report===

========================================================================================= 1 passed in 0.00 seconds =========================================================================================
(master) /Users/davidchudzicki/hypothesis-python
$echo 'def test_no_capture(): print("hi there!")' > test_no_capture.py
(master) /Users/davidchudzicki/hypothesis-python
$pytest test_no_capture.py -s
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.6.1, pytest-3.4.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/davidchudzicki/hypothesis-python, inifile: tox.ini
plugins: xdist-1.22.0, profiling-1.2.11, forked-0.2, flaky-3.4.0, hypothesis-3.44.26
gw0 [1] / gw1 [1]
scheduling tests via LoadScheduling
.
===Flaky Test Report===


===End Flaky Test Report===
======================================================================================== slowest 20 test durations =========================================================================================
0.00s setup    test_no_capture.py::test_no_capture
0.00s call     test_no_capture.py::test_no_capture
0.00s teardown test_no_capture.py::test_no_capture

The example above is meant to be minimal-ish, but I first experienced this running $HYPOTHESIS_VERBOSITY_LEVEL=debug pytest tests/quality/test_shrink_quality.py -k test_minimize_multiple_elements_in_silly_large_int_range -s.

pytest --version
This is pytest version 3.4.0, imported from /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/pytest.py
setuptools registered plugins:
  pytest-xdist-1.22.0 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/xdist/plugin.py
  pytest-xdist-1.22.0 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/xdist/looponfail.py
  pytest-profiling-1.2.11 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/pytest_profiling.py
  pytest-forked-0.2 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/pytest_forked/__init__.py
  flaky-3.4.0 at /Users/davidchudzicki/.virtualenvs/hypothesis/lib/python3.6/site-packages/flaky/flaky_pytest_plugin.py
  hypothesis-3.44.26 at /Users/davidchudzicki/hypothesis-python/src/hypothesis/extra/pytestplugin.py

DRMacIver (Member, Author) commented

> Since you encouraged questions... I'm having a little trouble following the Playing Around section of the internals guide to see what's going on with the shrinker.

Hmm. Does it work inside the directory tree if you run with -n 0?

I've never noticed this as a problem, and that's probably because I always run locally with -n 0! I think there might be something slightly weird going on in how we handle reporting that doesn't work correctly under xdist (e.g. we have this issue, which is smaller in scope but not unrelated).

dchudz (Member) commented Feb 18, 2018

Works great with -n 0, so yeah, seems related to #700.

I've also confirmed that even outside the hypothesis directory tree, -n 1 doesn't show me the printed output, so it looks like "no capture" just doesn't work with xdist (which mostly seems like other people's experience too, from looking around the internet).
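
Concretely, the invocation that now shows me the debug output is the earlier command with xdist disabled:

$HYPOTHESIS_VERBOSITY_LEVEL=debug pytest tests/quality/test_shrink_quality.py -k test_minimize_multiple_elements_in_silly_large_int_range -s -n 0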

I've opened #1122 to suggest -n0 in the internals guide.

Zac-HD removed the docs label ("documentation could *always* be better") Feb 20, 2018
Zac-HD (Member) commented Feb 20, 2018

Removed the docs label, thanks to the new guides from #1100 😄

flyingmutant (Contributor) commented

@DRMacIver is the "paper about Hypothesis's test case reduction/shrinking" available to the public? Very interested in reading that!

Zac-HD (Member) commented Oct 2, 2018

Closing this as we now have decent development guides, far better internal commenting, and a growing pool of people who have contributed to the Conjecture engine. Child issues tagged shrink-quality remain open and active, though, and further comments about coordinating them are still welcome here.

Zac-HD closed this as completed Oct 2, 2018
Wilfred (Contributor) commented Oct 24, 2018

@Zac-HD @DRMacIver I've been looking at porting Hypothesis to Emacs Lisp, so I've been trying to understand the internals of Hypothesis (rather than just being a contented user).

The docs that exist are excellent and clear, but I've still found a few things unclear.

  • The internals docs page is helpful, so it's a shame the page isn't on the Hypothesis ReadTheDocs site.
  • I'm unable to find any documentation on ConjectureData. It has no docstrings and does quite a lot. I've looked at the hypothesis-java implementation of TestData, which is a little simpler, but still left me with questions. (I also don't want to base my implementation on hypothesis-java, as I don't want my project to be AGPL.)

I hope this feedback is useful. I think I understand most of the API, but if you have a minute, it's not clear to me:

  • Why freeze/unfreeze? What guarantee does this provide?
  • When do Blocks and Examples differ? They both seem to be created inside ConjectureData.__write with the same indexes.
  • How have you chosen the magic values? For example, the distribution/weighting here:
class WideRangeIntStrategy(IntStrategy):

    distribution = d.Sampler([
        4.0, 8.0, 1.0, 1.0, 0.5
    ])

    sizes = [8, 16, 32, 64, 128]

or these values:

NASTY_FLOATS = sorted([
    0.0, 0.5, 1.1, 1.5, 1.9, 1.0 / 3, 10e6, 10e-6, 1.175494351e-38,
    2.2250738585072014e-308,
    1.7976931348623157e+308, 3.402823466e+38, 9007199254740992, 1 - 10e-6,
    2 + 10e-6, 1.192092896e-07, 2.2204460492503131e-016,

] + [float('inf'), float('nan')] * 5, key=flt.float_to_lex)
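
Regarding the first snippet: my working assumption is that the Sampler draws an index with probability proportional to its weight, something like this toy equivalent (the function name here is mine, not Hypothesis's):

import random

sizes = [8, 16, 32, 64, 128]
weights = [4.0, 8.0, 1.0, 1.0, 0.5]

def draw_size():
    # Pick an index with probability weights[i] / sum(weights), so 16-bit
    # integers come up most often and 128-bit ones only rarely.
    [i] = random.choices(range(len(sizes)), weights=weights)
    return sizes[i]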

Zac-HD (Member) commented Oct 25, 2018

Just some quick comments from me; David might (should?) know more 😄

To date we've tried to keep user and developer docs separate, to help reinforce the distinction between public interface and internals and that you don't need to understand the latter to use the former.

ConjectureData is big and complicated (and under-commented). For Hypothesis ports though, I'd suggest starting with the Rust implementation of Conjecture - currently under hypothesis-ruby. It is much simpler and weaker than the Python backend for now, but eventually we intend to share it between all language frontends. You could even use it directly if you don't mind a dependency in alpha with unstable interfaces, or just plan to port over when it's ready, as we will for Python.

Blocks and examples are discussed in #1601 - hopefully that helps.

My understanding is that the magic values are chosen quite arbitrarily based on the kinds of values that tend to expose bugs, or to give generally desirable size/complexity properties of generated data. I don't know exactly how they were chosen though, or whether we've evaluated alternatives.

Zalathar (Contributor) commented Oct 25, 2018

Calling freeze marks the transition from “this is an active trial that we're adding data to” to “this is an immutable snapshot of a past trial that the generator/shrinker can use as reference”. It performs housekeeping like closing all open example regions and recording statistics, and replaces various lists/sets/dicts with immutable versions to reduce the chances of a bug accidentally modifying a completed trial's data.

Freezing is a one-way process. There is no unfreeze.
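
Roughly, the shape of it is something like this (a toy sketch with made-up names, not the actual ConjectureData code):

class Trial:
    def __init__(self):
        self.buffer = bytearray()
        self.frozen = False

    def draw_bytes(self, n, random):
        # Only an active (unfrozen) trial may record new data.
        assert not self.frozen
        chunk = bytes(random.getrandbits(8) for _ in range(n))
        self.buffer.extend(chunk)
        return chunk

    def freeze(self):
        # One-way transition: close out the trial and snapshot its data,
        # so later bookkeeping bugs cannot mutate a completed run.
        self.buffer = bytes(self.buffer)
        self.frozen = True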

The main difference between Block and Example is that examples are hierarchical and nested, while blocks are flat. Every block is also covered by an example with DRAW_BYTES_LABEL, but there are other kinds of example regions that cover many blocks and many sub-examples.

This makes them useful for different kinds of shrinker passes. Blocks get used by passes that want to treat a region of bytes as a number and reduce that number, whereas examples get used by passes that want to attempt “higher-level” transformations like deleting complicated list elements, or zeroing large regions of data in one go.
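
Schematically, with made-up values just to illustrate the flat-vs-nested distinction:

# Blocks: a flat sequence of byte spans, one per primitive draw.
blocks = [(0, 8), (8, 16), (16, 17)]

# Examples: a tree of labelled regions over the same bytes; a single
# list example can cover several blocks and nest a child per element.
examples = {
    "label": "list", "start": 0, "end": 17,
    "children": [
        {"label": "integer", "start": 0, "end": 8},
        {"label": "integer", "start": 8, "end": 16},
        {"label": "stop-byte", "start": 16, "end": 17},
    ],
}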

The Example and Block classes gained some basic documentation a little while ago, so hopefully that should clarify things a bit. (Though you might be one of the first people to try reading them, so let us know if something seems unclear.)

Zalathar (Contributor) commented

> For Hypothesis ports though, I'd suggest starting with the Rust implementation of Conjecture - currently under hypothesis-ruby.

Having tried both approaches recently, I actually think hypothesis-python is a better reference for porting at the moment.

The learning curve is a bit steep, but Python Conjecture isn't that complicated once you eventually get your bearings. And it's nice to be able to refer to the most mature and battle-tested implementation.

nchammas (Contributor) commented

@Wilfred

> How have you chosen the magic values?

NASTY_FLOATS = sorted([
    0.0, 0.5, 1.1, 1.5, 1.9, 1.0 / 3, 10e6, 10e-6, 1.175494351e-38,
    2.2250738585072014e-308,
    1.7976931348623157e+308, 3.402823466e+38, 9007199254740992, 1 - 10e-6,
    2 + 10e-6, 1.192092896e-07, 2.2204460492503131e-016,

] + [float('inf'), float('nan')] * 5, key=flt.float_to_lex)

Some of those floats exhibit classic gotchas inherent to floating point arithmetic.

For example:

>>> 9007199254740992.0
9007199254740992.0
>>> 9007199254740992.0 + 1.0
9007199254740992.0
>>> 9007199254740992.0 + 1.0 + 1.0
9007199254740992.0
>>> 9007199254740992.0 + 1.0 + 1.0 + 1.0 + 1.0 + 1.0
9007199254740992.0
>>> 9007199254740992.0 + 2.0
9007199254740994.0
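
That value is 2**53: doubles have a 53-bit significand, so it's the first point at which IEEE-754 floats can no longer represent every integer exactly:

>>> 2**53
9007199254740992
>>> float(2**53) + 1.0 == float(2**53)
True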

Or of course:

>>> from math import nan
>>> nan == nan
False

Dunno if it's worth documenting this in the code, but those are some of the reasons they're "nasty".
