New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a guide for working on internals #1100
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, but looks good on a first skim.
guides/internals.rst
Outdated
|
||
One such example of a hack is the handling of floating point numbers. There are | ||
a couple of lexicographic shrinks that are always valid but only really make | ||
sense for our particular encoding of floats. We simply detect when we're working |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Drop “simply”.
guides/internals.rst
Outdated
i += 1 | ||
|
||
|
||
The more natural way to write this in Python would of course be: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Drop “of course”.
guides/internals.rst
Outdated
|
||
This way of writing the loop would be *entirely wrong*. | ||
|
||
Every time `incorporate_new_buffer` succeeds, it changes the shape of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: needs double backticks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: actually failures to use RST correctly are things that need fixing rather than merely nits. ;-)
guides/internals.rst
Outdated
normal order on strings - find the first byte at which they differ and use that | ||
to decide) smallest. | ||
|
||
Ideally we could think of the shrinker is a generic function that takes a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo 'as a generic function'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic!
I've added several comments below on specific bits I think could flow better, but overall this is a brilliant introduction to Hypothesis internals and I love it ✨:tada:
|
||
The core engine of Hypothesis is called Conjecture. | ||
|
||
The "fundamental idea" of Conjecture is that you can represent an arbitrary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would reverse these, so we use a PRNG and view that as bytes:
The "fundamental idea" of Conjecture is that you can represent an arbitrary randomized test case as a sequence of outputs from some pseudo-random number generator (PRNG), which in turn can be viewed as a string of bytes representing the underlying entropy. Whenever you want to do something "random", you read the next bytes and do what they tell you to do.
Regardless, delete the word "basically".
guides/internals.rst
Outdated
Ideally we could think of the shrinker as a generic function that takes a | ||
string satisfying some predicate and returns the shortlex minimal string that | ||
also satisfies it. | ||
This is wrong on several levels: The first is that we only succeed in approximating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The implementation departs from this ideal in two ways:" ...
(IMO it's not wrong per se, just, well, idealised)
guides/internals.rst
Outdated
reduce our input. These vary from principled general transformations to shameless | ||
hacks that special case something we need to work well. We try to aim for mostly | ||
the former, but the nice thing about this model is that the underlying representation | ||
is fully general and we are free to try whatever we want and it will never result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sentence rambles a bit. How about:
...something we need to work well. The nice thing about this model is that the underlying representation of test-as-bytestring is fully general, so we can use any technique at all to minimise the string without inherent correctness problems. Principled transformations are usually more general, but special-case hacks are only a problem to the degree that they result in messy code and fragile heuristics, so if we need one for something it's not a big deal.
guides/internals.rst
Outdated
(where the "user" might still be something that is entirely our code) and we | ||
want to pass a whole bunch of different examples to it in order to achieve some | ||
result. Currently this includes each of the main engine, the Shrinker (in | ||
engine.py) and the minimizer, but there are likely to be more in future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: monospace engine.py
guides/internals.rst
Outdated
self.shrink_target.buffer[:u] + self.shrink_target.buffer[v:] | ||
) | ||
|
||
This way of writing the loop would be *entirely wrong*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, in this case the idiomatic loop is not equivalent.
current shrink target. This consequently changes the shape of intervals, both | ||
its particular values and its current length - on each loop iteration the loop | ||
might stop either because ``i`` increases or because ``len(self.intervals)`` | ||
decreases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a note that starting a fresh loop each time self.intervals
changes would have quadratic rather than linear complexity.
guides/internals.rst
Outdated
------------ | ||
|
||
The shrinking part of Hypothesis is organised into a single class called ``Shrinker`` | ||
that lives in engine.py. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: engine.py
guides/internals.rst
Outdated
|
||
This allows the shrinker to switch from a good but slightly timid mode while its | ||
input is large into a more aggressive DELETE ALL THE THINGS mode once that stops | ||
working. By that point ideally we've made our input small enough that quadratic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "ideally we've" -> "we've usually"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(double-posted for some reason; approved on the substance of the guide)
we can shrink arbitrary test cases based on it. We try to produce a string that | ||
is *shortlex minimal*. What this means is that it has the shortest possible | ||
length and among those strings of minimal length is lexicographically (i.e. the | ||
normal order on strings - find the first byte at which they differ and use that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typographical nitpicks, em-dash (—) for punctuation dash, also on lines 188 & 338, comma after i.e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to annoying technical constraints (because everything is awful despite it being 2018 my console is not unicode-sensible, I'm unwilling to write this as —
and Github doesn't render RST em-dashes correctly) they're not going to be em-dashes. 😢
Elaborate on comma after i.e.? Is that a thing I'm supposed to be doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we all chip in and buy you a proper terminal? 😄 Re i.e., I had that beaten into me from childhood, but a quick snout around reveals it is mandatory for US English (if the Chicago manual of style is your Bible, it is mine), but optional in BE (Fowler); so I have been deceived by yankee cultural imperialism. I'll continue to use though, since the comma indicates the natural spoken pause that you would use when replacing it with "that is".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we all chip in and buy you a proper terminal?
Unfortunately due to a complex interlocking series of least worst solutions and bad life choices, getting me a better terminal would require getting me a better computer (I am currently working inside the Windows Ubuntu subsystem, and all of the terminal options for Windows are awful).
that means we can do things like replacing an integer with a smaller one. | ||
|
||
------- | ||
Testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be moved to testing-hypothesis.rst
, and just leave a reference here.
We should also explain how to install the test dependencies.
Please don't merge this yet. I am going to address all the review comments, this week is just a bit all-at-oncey and I've ended up pushing this back behind some deadlines. Will get to it Friday or this weekend. |
No problem, I generally avoid merging unless I've seen an explicit "I'm done" message 😄 |
pip install -e . | ||
|
||
# Test specific dependencies. | ||
pip install pytest-xdist flaky |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, this guide looks really helpful! Thanks! Just today I was struggling with how to run a single test.
Without these dependencies, I got:
python -m pytest tests/cover/test_conjecture_engine.py
usage: pytest.py [options] [file_or_dir] [file_or_dir] [...]
pytest.py: error: unrecognized arguments: -n
inifile: /Users/davidchudzicki/hypothesis-python/tox.ini
rootdir: /Users/davidchudzicki/hypothesis-python
I think I also would have found this if it existed yet, so that's good too.
guides/testing-hypothesis.rst
Outdated
python -m pytest tests/cover/test_conjecture_engine.py | ||
|
||
You will need to have Hypothesis installed locally to run these. I recommend a | ||
virtualenv where you have run ``pip install -e .``, which installs all the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it also install the test dependencies? (that is, pytest-xdist
and flaky
)
Right, this has changed enough that I'd like a quick rereview. Main things that are different:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@DRMacIver, are you happy for this to be merged now? I am 😁 |
As part of #1093 I promised to write us an internals guide. This is a hopefully decent start at that!