
Add a property-based test using Hypothesis & Hypothesmith #1566

Merged

merged 1 commit from property-tests into psf:master on Aug 21, 2020

Conversation

Zac-HD
Contributor

@Zac-HD Zac-HD commented Jul 30, 2020

At PyCon 2019, I gave a few talks about property-based testing, and also spoke to @ambv about generating test inputs for Black. There's a long history of this working really well for compilers (e.g. CSmith), and for formatting tools like prettier/prettier#371.

So starting at the sprints I wrote Hypothesmith to generate Python source code, reported a bunch of bugs, and eventually after some more upgrades thought that it might be useful to test black in its own test suite as well as my downstream code. So here we are!

This test will fail due to #970, #1012, #1557, and possibly other bugs. While Hypothesis will present (mostly) minimal failing inputs and can report multiple bugs based on exception type and location, you can expect to see mostly trivial or who-would-ever-write-that failures for a while. However, fixing those edge cases will allow Hypothesmith to explore elsewhere and find more subtle bugs.

@JelleZijlstra
Collaborator

This is a great tool, thank you!

I'm not sure a normal unit test is a good place for this sort of test though. For one, it's currently failing, so we have to first fix the bugs you linked. As I understand it, it's also partly random, which is great for finding new edge cases, but means that people submitting PRs or running tests locally may see tests failing randomly because we happened to find a previously undetected bug.

@Zac-HD
Contributor Author

Zac-HD commented Jul 31, 2020

Glad you like it! There are definitely some questions about how you want to use it, but I thought that opening a PR would give us a more concrete starting point for discussion than an issue 😄

For one, it's currently failing, so we have to first fix the bugs you linked.

Yeah... the obvious options are (1) don't merge it yet, (2) @unittest.expectedFailure, or (3) give known bugs a distinct exception type which can be skipped within the test. Order of increasing work and also order of increasing chance to find other bugs, for what that's worth.
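Option (2) is probably the least invasive; here is a minimal stdlib-only sketch of the pattern. The test body is a deliberately failing placeholder standing in for the real Hypothesis-driven test (which currently fails on the linked bugs); class and method names are illustrative.

```python
import unittest


class TestFuzzBlack(unittest.TestCase):
    # Option (2): mark the property-based test as an expected failure
    # until the known bugs are fixed, so the suite stays green meanwhile.
    @unittest.expectedFailure
    def test_format_roundtrip(self):
        # Placeholder for the Hypothesis-driven test; it fails on purpose,
        # just as the real test currently would.
        self.fail("known formatting bug")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFuzzBlack)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())  # True: expected failures don't fail the run
```

The catch is that once the bugs are fixed, the test starts *passing* unexpectedly, which unittest reports as an unexpected success, so the marker has to be removed then.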

It's also partly random, which is great for finding new edge cases, but means that people submitting PRs or running tests locally may see tests failing randomly because we happened to find a previously undetected bug.

Using the derandomize=True setting would guarantee non-flakiness... it's even possible to use a settings profile to make it quick-and-deterministic or run many random examples depending on e.g. an environment variable. Two kinds of testing describes this pretty nicely.
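For example, a sketch of the profile approach (this assumes Hypothesis is installed; the profile names and the HYPOTHESIS_PROFILE variable are illustrative, not something Black defines):

```python
import os

from hypothesis import HealthCheck, settings

# Quick, deterministic profile: a safe default for local runs and PR CI.
settings.register_profile("deterministic", derandomize=True, max_examples=25)
# Thorough, randomized profile for a dedicated overnight fuzzing job.
settings.register_profile(
    "fuzz", max_examples=1000, suppress_health_check=list(HealthCheck)
)
# Select a profile via an environment variable, defaulting to the fast one.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "deterministic"))
print(settings.default.max_examples)
```

With `derandomize=True`, Hypothesis picks examples deterministically from the strategy rather than from a random seed, so every contributor sees the same inputs and the same pass/fail result.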

'Sleeper' test failures are pretty rare though - upstream in Hypothesis we might have two or three per year over a few thousand tests. Once property-based tests are passing they don't tend to discover more failures, but you can easily run them in a loop overnight if that's a concern and better tooling for this is planned.

@ichard26
Collaborator

ichard26 commented Aug 2, 2020

Thanks for the PR! I was recently introduced to Hypothesis and Hypothesmith while contributing to flake8-bugbear and I really like them! I do have a few concerns though; here's my two cents, FWIW.

As a regular contributor, I second JelleZijlstra's slight apprehension about having Hypothesmith in a normal unit test. Since the flow we document for creating a new PR against the Black repository includes running all unit tests, quite a few contributors would hit the Hypothesmith test failing (judging by how many build jobs failed on this PR). Tests are usually designed either to make sure something works as expected or to make sure a bug isn't reintroduced, i.e. to confirm that things still work, whereas Hypothesmith pushes the code further and further so it can find unseen bugs.

Regular unit tests and Hypothesmith tests are similar, but their goals differ enough that the appropriate reaction to a failure differs too. This adds complexity for a new or drive-by contributor, who might not be familiar enough with the project and/or Hypothesmith to know what to do next. Experienced contributors can deal with the new test once they get up to speed with it, but it still means contributing to Black has a bit more friction. That said, good documentation of this special test in Black's CONTRIBUTING.md could go a long way toward making it easier for newer contributors.

Another problem for newer contributors is that even if they submit a simple typo-fix PR entirely through the web interface, there is a non-trivial chance the checks will fail purely by bad luck. If they aren't familiar with Hypothesmith, or CI in general, they will probably end up confused. A red X isn't nice to see, especially for newer contributors. That's not the greatest experience for someone joining Black, the PSF org, or even open source (and on this front, Black can definitely do better).

TL;DR: this addition complicates the path to contributing to Black, mostly for newer and less experienced contributors.

To be clear, I still want to see Hypothesis and Hypothesmith bring better testing to Black, but as you have pointed out, this is merely the start of a discussion :)

@Zac-HD
Contributor Author

Zac-HD commented Aug 2, 2020

Thanks for your comment @ichard26! I think you're suggesting that we

  • make this test deterministic with the derandomize=True setting
  • and/or move it out of the main test suite

which makes sense to me.

@cooperlees
Collaborator

Thanks @Zac-HD! +1 on a separate test here. Maybe run it only on Ubuntu too, and I'll be happy to merge!

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Aug 20, 2020

@cooperlees - how will you want to run this? A new tests/test_fuzz.py using unittest, a standalone script to run with python fuzztests.py, or what?

@cooperlees
Collaborator

@cooperlees - how will you want to run this? A new tests/test_fuzz.py using unittest, a standalone script to run with python fuzztests.py, or what?

What I envisioned was a new fuzz_tests.yml GitHub Actions workflow running only on Ubuntu, unless you see value in testing on different OSes:

  • cp .github/workflows/test.yml .github/workflows/fuzz.yml
  • edit
  • Win

Maybe even run it with coverage to see how much of the code the fuzz tests hit, for some extra signal too?

  • coverage run fuzz.py
  • coverage report

https://github.com/psf/black/blob/master/.github/workflows/test.yml#L28
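Putting those steps together, the copied-and-edited workflow might look roughly like this. This is only a sketch: the file name, action versions, and install commands are illustrative, not taken from the PR.

```yaml
# Hypothetical .github/workflows/fuzz.yml, adapted from test.yml.
name: Fuzz

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest  # Ubuntu only, per the discussion above
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip coverage
          pip install -e .
          pip install hypothesis hypothesmith
      - name: Run fuzz tests under coverage
        run: |
          coverage run fuzz.py
          coverage report
```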

@Zac-HD force-pushed the property-tests branch 2 times, most recently from 15307de to 0dc7694 on August 21, 2020
@cooperlees
Collaborator

This looks great, will merge if I can rebase after @ambv’s large magic trailing comma rewrite!

@ambv ambv merged commit cd3a93a into psf:master Aug 21, 2020
@ambv
Collaborator

ambv commented Aug 21, 2020

Thanks! ✨ 🍰 ✨
