Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a guide for working on internals #1100

Merged
merged 17 commits into from Feb 15, 2018
Merged

Conversation

DRMacIver
Copy link
Member

As part of #1093 I promised to write us an internals guide. This is a hopefully decent start at that!

Copy link
Contributor

@alexwlchan alexwlchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, but looks good on a first skim.


One such example of a hack is the handling of floating point numbers. There are
a couple of lexicographic shrinks that are always valid but only really make
sense for our particular encoding of floats. We simply detect when we're working
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Drop “simply”.

i += 1


The more natural way to write this in Python would of course be:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Drop “of course”.


This way of writing the loop would be *entirely wrong*.

Every time `incorporate_new_buffer` succeeds, it changes the shape of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: needs double backticks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: actually failures to use RST correctly are things that need fixing rather than merely nits. ;-)

normal order on strings - find the first byte at which they differ and use that
to decide) smallest.

Ideally we could think of the shrinker is a generic function that takes a
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo 'as a generic function'

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic!

I've added several comments below on specific bits I think could flow better, but overall this is a brilliant introduction to Hypothesis internals and I love it ✨:tada:


The core engine of Hypothesis is called Conjecture.

The "fundamental idea" of Conjecture is that you can represent an arbitrary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reverse these, so we use a PRNG and view that as bytes:

The "fundamental idea" of Conjecture is that you can represent an arbitrary randomized test case as a sequence of outputs from some pseudo-random number generator (PRNG), which in turn can be viewed as a string of bytes representing the underlying entropy. Whenever you want to do something "random", you read the next bytes and do what they tell you to do.

Regardless, delete the word "basically".

Ideally we could think of the shrinker as a generic function that takes a
string satisfying some predicate and returns the shortlex minimal string that
also satisfies it.
This is wrong on several levels: The first is that we only succeed in approximating
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The implementation departs from this ideal in two ways:" ...

(IMO it's not wrong per se, just, well, idealised)

reduce our input. These vary from principled general transformations to shameless
hacks that special case something we need to work well. We try to aim for mostly
the former, but the nice thing about this model is that the underlying representation
is fully general and we are free to try whatever we want and it will never result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sentence rambles a bit. How about:

...something we need to work well. The nice thing about this model is that the underlying representation of test-as-bytestring is fully general, so we can use any technique at all to minimise the string without inherent correctness problems. Principled transformations are usually more general, but special-case hacks are only a problem to the degree that they result in messy code and fragile heuristics, so if we need one for something it's not a big deal.

(where the "user" might still be something that is entirely our code) and we
want to pass a whole bunch of different examples to it in order to achieve some
result. Currently this includes each of the main engine, the Shrinker (in
engine.py) and the minimizer, but there are likely to be more in future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: monospace engine.py

self.shrink_target.buffer[:u] + self.shrink_target.buffer[v:]
)

This way of writing the loop would be *entirely wrong*.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, in this case the idiomatic loop is not equivalent.

current shrink target. This consequently changes the shape of intervals, both
its particular values and its current length - on each loop iteration the loop
might stop either because ``i`` increases or because ``len(self.intervals)``
decreases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a note that starting a fresh loop each time self.intervals changes would have quadratic rather than linear complexity.

------------

The shrinking part of Hypothesis is organised into a single class called ``Shrinker``
that lives in engine.py.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: engine.py


This allows the shrinker to switch from a good but slightly timid mode while its
input is large into a more aggressive DELETE ALL THE THINGS mode once that stops
working. By that point ideally we've made our input small enough that quadratic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "ideally we've" -> "we've usually"

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(double-posted for some reason; approved on the substance of the guide)

we can shrink arbitrary test cases based on it. We try to produce a string that
is *shortlex minimal*. What this means is that it has the shortest possible
length and among those strings of minimal length is lexicographically (i.e. the
normal order on strings - find the first byte at which they differ and use that
Copy link

@ghost ghost Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typographical nitpicks, em-dash (—) for punctuation dash, also on lines 188 & 338, comma after i.e.

Copy link
Member Author

@DRMacIver DRMacIver Feb 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to annoying technical constraints (because everything is awful despite it being 2018 my console is not unicode-sensible, I'm unwilling to write this as — and Github doesn't render RST em-dashes correctly) they're not going to be em-dashes. 😢

Elaborate on comma after i.e.? Is that a thing I'm supposed to be doing?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we all chip in and buy you a proper terminal? 😄 Re i.e., I had that beaten into me from childhood, but a quick snout around reveals it is mandatory for US English (if the Chicago manual of style is your Bible, it is mine), but optional in BE (Fowler); so I have been deceived by yankee cultural imperialism. I'll continue to use though, since the comma indicates the natural spoken pause that you would use when replacing it with "that is".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we all chip in and buy you a proper terminal?

Unfortunately due to a complex interlocking series of least worst solutions and bad life choices, getting me a better terminal would require getting me a better computer (I am currently working inside the Windows Ubuntu subsystem, and all of the terminal options for Windows are awful).

that means we can do things like replacing an integer with a smaller one.

-------
Testing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be moved to testing-hypothesis.rst, and just leave a reference here.

We should also explain how to install the test dependencies.

@DRMacIver
Copy link
Member Author

Zac-HD approved these changes 8 hours ago

Please don't merge this yet. I am going to address all the review comments, this week is just a bit all-at-oncey and I've ended up pushing this back behind some deadlines. Will get to it Friday or this weekend.

@Zac-HD
Copy link
Member

Zac-HD commented Feb 7, 2018

No problem, I generally avoid merging unless I've seen an explicit "I'm done" message 😄

pip install -e .

# Test specific dependencies.
pip install pytest-xdist flaky
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, this guide looks really helpful! Thanks! Just today I was struggling with how to run a single test.

Without these dependencies, I got:

python -m pytest tests/cover/test_conjecture_engine.py
usage: pytest.py [options] [file_or_dir] [file_or_dir] [...]
pytest.py: error: unrecognized arguments: -n
  inifile: /Users/davidchudzicki/hypothesis-python/tox.ini
  rootdir: /Users/davidchudzicki/hypothesis-python

I think I also would have found this if it existed yet, so that's good too.

python -m pytest tests/cover/test_conjecture_engine.py

You will need to have Hypothesis installed locally to run these. I recommend a
virtualenv where you have run ``pip install -e .``, which installs all the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it also install the test dependencies? (that is, pytest-xdist and flaky)

@DRMacIver
Copy link
Member Author

Right, this has changed enough that I'd like a quick rereview. Main things that are different:

  • A lot of stuff got moved to testing-hypothesis.rst and expanded upon
  • I've added the discussion of deferring errors to api-style.rst

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Zac-HD
Copy link
Member

Zac-HD commented Feb 15, 2018

@DRMacIver, are you happy for this to be merged now? I am 😁

@DRMacIver DRMacIver merged commit 5ca9ee7 into master Feb 15, 2018
@DRMacIver DRMacIver deleted the DRMacIver/internals-guide branch February 15, 2018 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants