Scrutineer: integrating opportunistic fault-localisation with PBT #2859

Merged
merged 1 commit into from
Mar 7, 2021

Conversation

Zac-HD
Member

@Zac-HD Zac-HD commented Feb 16, 2021

I'm currently working on a paper about integrating fault localisation with property-based testing - TLDR, it's pretty easy for PBT libraries to suggest where to start debugging test failures.

It turns out that my simple baseline version is also reliable, useful, and easy to interpret. On that basis I thought it makes sense to ship it, albeit disabled-by-default. I've tried this on a variety of toy examples and some real bugs in black; reviews and/or feedback on how this works for your problems would be most welcome 🙂


Future plans for other PRs: adding fancier techniques, if they're reliable and fast and actually work better; and integration with generalised examples (#2192). As currently planned, all of these would go in a single "explain phase" - they share a purpose and workflow and therefore aren't individually configurable.

@Zac-HD Zac-HD added the new-feature entirely novel capabilities or strategies label Feb 16, 2021
@Zac-HD Zac-HD force-pushed the scrutineer-poc branch 2 times, most recently from e853b0c to 64dbcc8 Compare February 16, 2021 11:36
Member

@sobolevn sobolevn left a comment

This feature looks amazing! I would love to try it, but I don't have any failing tests at hand 😆

@@ -71,7 +74,7 @@ with each phase corresponding to a value on the :class:`~hypothesis.Phase` enum:
3. ``Phase.generate`` controls whether new examples will be generated.
4. ``Phase.target`` controls whether examples will be mutated for targeting.
5. ``Phase.shrink`` controls whether examples will be shrunk.

6. ``Phase.explain`` controls whether Hypothesis attempts to explain test failures.
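Since the PR description says the feature ships disabled by default, opting in presumably means listing the new phase in `settings(phases=...)`. The following is a minimal sketch, assuming a Hypothesis version that includes this change; `ALL_PHASES` and `test_with_explanations` are illustrative names, not part of the PR.

```python
from hypothesis import Phase, given, settings, strategies as st

# Assumption: explain is opt-in, so every phase is listed explicitly,
# with Phase.explain added at the end.
ALL_PHASES = (
    Phase.explicit,
    Phase.reuse,
    Phase.generate,
    Phase.target,
    Phase.shrink,
    Phase.explain,
)


@settings(phases=ALL_PHASES)
@given(st.integers())
def test_with_explanations(x):
    # A trivially passing property; a failing one would get an
    # "Explanation:" section appended to its report.
    assert isinstance(x, int)
```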
Member

@sobolevn sobolevn Feb 16, 2021

I would love to see some examples of what explain results look like somewhere in the docs. Does it fit there?

Member Author

@Zac-HD Zac-HD Feb 17, 2021

I was planning on showing it off in a blog post - there's nowhere obvious to put it in the docs, and the exact output format is pretty simple:

from hypothesis import Phase, given, strategies as st

@given(st.integers())
def test_reports_branch_in_test(x):
    if x > 10:
        raise AssertionError  # BUG

(Obviously this is a toy example, but it's been useful on real projects too)

_________________________ test_reports_branch_in_test _________________________
Traceback (most recent call last):
  ...
AssertionError
--------------------------------- Hypothesis ----------------------------------
Falsifying example: test_reports_branch_in_test(
    x=11,
)
Explanation:
    These lines were always and only run by failing examples:
        /path/to/test_file.py:6

One consideration in "what do we report" is that this format (usually) allows you to click on terminal output and have the relevant file open to that line in your preferred editor.

The alternative approach of reporting branches (source/destination pairs of lines) is only rarely more precise in practice, and much more difficult to explain to non-expert users. "report branches if there are no reportable lines" would be a nice trick to explore in future, though.
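The "always and only run by failing examples" rule can be illustrated with a small standalone sketch. This is not the implementation in this PR; it is a simplified model built on `sys.settrace`, and the helper names `lines_run_by`, `explain`, and `toy_test` are hypothetical.

```python
import sys


def lines_run_by(test, arg):
    """Run test(arg) under a line tracer; return (executed lines, failed?)."""
    seen = set()

    def tracer(frame, event, _arg):
        if event == "line":
            seen.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer  # keep tracing lines inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        test(arg)
        failed = False
    except AssertionError:
        failed = True
    finally:
        sys.settrace(previous)  # restore whatever was installed before
    return seen, failed


def explain(test, args):
    """Return lines run by every failing example and by no passing example."""
    failing, passing = [], []
    for arg in args:
        seen, failed = lines_run_by(test, arg)
        (failing if failed else passing).append(seen)
    if not failing:
        return set()
    always_in_failing = set.intersection(*failing)
    ever_in_passing = set().union(*passing)
    return always_in_failing - ever_in_passing


def toy_test(x):
    if x > 10:
        raise AssertionError  # BUG


reported = explain(toy_test, range(20))
```

Here `reported` holds a single `(filename, lineno)` pair pointing at the `raise` line, because the `if` line is also executed by passing examples. Recording `(previous line, line)` pairs instead of single lines would be one way to model the branch-based alternative.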

@Zac-HD Zac-HD force-pushed the scrutineer-poc branch 5 times, most recently from e4868af to 136b1f3 Compare February 17, 2021 07:54
assert len(expected) == code.count(BUG_MARKER)
print(pytest_stdout)
for report in expected:
    assert report in pytest_stdout
Member

Maybe https://github.com/syrusakbary/snapshottest will be a good fit here?

Member

It is great for testing the output! ⭐

Member Author

The catch is that we only want to test parts of the output, i.e. the explanation but not anything about the actual file paths.
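One way to check only the path-independent part of the output is to normalise file paths before comparing. The sketch below is an assumption about how that could look, not code from this PR; `PATH_LINE` and `normalise_report` are hypothetical names.

```python
import re

# Hypothetical pattern: match "<any directory>/<file>.py:<line>" and capture
# only the file name and line number, dropping the machine-specific prefix.
PATH_LINE = re.compile(r"(?:[A-Za-z]:)?[^\s:]*[\\/]([^\\/\s:]+\.py):(\d+)")


def normalise_report(stdout: str) -> str:
    """Replace the directory part of each reported location with <path>."""
    return PATH_LINE.sub(r"<path>/\1:\2", stdout)


sample = (
    "Explanation:\n"
    "    These lines were always and only run by failing examples:\n"
    "        /path/to/test_file.py:6\n"
)
normalised = normalise_report(sample)
```

A test could then assert that `"<path>/test_file.py:6"` appears in `normalised` without depending on where the temporary test file was written.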

(The other catch is that at present explain mode skips the tracing if sys.gettrace() is not None... which makes it compatible with debuggers and also hides it from coverage. Hmmm.)
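The "skip tracing if a tracer is already installed" behaviour can be sketched as a small context manager. This is an illustration under assumptions, not the code in this PR; `maybe_trace` and `noop_tracer` are hypothetical names.

```python
import sys
from contextlib import contextmanager


@contextmanager
def maybe_trace(tracer):
    """Install `tracer` only when no other trace function is active.

    Yields True if tracing was enabled, False if it was skipped.
    """
    if sys.gettrace() is not None:
        # A debugger or coverage tool already owns the trace hook,
        # so leave it alone rather than replace it.
        yield False
        return
    sys.settrace(tracer)
    try:
        yield True
    finally:
        sys.settrace(None)


def noop_tracer(frame, event, arg):
    return None  # decline to trace lines within any new frame
```

The trade-off matches the one described above: an existing debugger keeps working, but the tracing code itself never runs while coverage measurement is active.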

Member Author

Zac-HD commented Feb 17, 2021

Ugh. The implementation of C trace-functions is tricky enough that I don't think we can reliably swap in the explain-tracer for e.g. the coverage tracer, not least due to decade-old CPython issues.

@Zac-HD Zac-HD force-pushed the scrutineer-poc branch 2 times, most recently from 56b4404 to a895b9e Compare February 22, 2021 00:57
Being a basic-but-useful system for fault localisation.
Member Author

Zac-HD commented Mar 4, 2021

@HypothesisWorks/hypothesis-python-contributors - final call for review! I'd love another set of eyes on this and an approving review, but I'll eventually merge it anyway if there are no objections 🙂

3 participants