
Add support for crosshair backend #3806

Merged
merged 46 commits into HypothesisWorks:master from provider-plugins on Mar 9, 2024

Conversation

Zac-HD
Member

@Zac-HD Zac-HD commented Dec 3, 2023

See #3086 for design and previous discussion.

Ping @pschanely - fyi there are some interface changes from the previous draft, both in how you register the backend and in that we now expect a global context manager which yields a per-test-case-ctx function.

I'm inclined to require #3801; it might slow us down a bit, but I think it will make converting the failing example to bytes practical, and from there it's a small step to database and shrinking support 😁
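
For concreteness, here is a rough sketch of the shape described above - a global context manager which yields a per-test-case-ctx function. All names here are illustrative, not the final hook names:

from contextlib import contextmanager

@contextmanager
def backend_lifetime():
    # global setup, e.g. start a solver or fuzzer session
    @contextmanager
    def per_test_case_ctx():
        # per-test-case setup
        yield
        # per-test-case teardown
    yield per_test_case_ctx
    # global teardown

# intended usage:
#     with backend_lifetime() as per_test_case_ctx:
#         with per_test_case_ctx():
#             ...  # run one test case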

@Zac-HD Zac-HD added new-feature entirely novel capabilities or strategies interop how to play nicely with other packages internals Stuff that only Hypothesis devs should ever see labels Dec 3, 2023
@pschanely
Contributor

Ping @pschanely - fyi there are some interface changes from the previous draft, both in how you register the backend and in that we now expect a global context manager which yields a per-test-case-ctx function.

No problem - this works; updated on my side.

I'm inclined to require #3801; it might slow us down a bit but I think will make converting the failing example to bytes practical, and from there it's a small step to database and shrinking support 😁

Sounds good. IIUC, I would then, at context manager exit, blast all of my realized primitives into the run's ConjectureData via the forced operations on a regular PrimitiveProvider. Is that right? We might need to be more surgical about what the context manager covers; I noticed last night that ConjectureRunner.test_function seems to want to operate on the buffer pretty much immediately after executing the user code.

@Zac-HD
Member Author

Zac-HD commented Dec 5, 2023

IIUC, I would then, at context manager exit, blast all of my realized primitives into the run's conjecturedata via the forced operations on a regular PrimitiveProvider. Is that right?

I'm aiming to have a new attribute which is just the list of primitive values provided, so we'd want to materialize the contents of that list (working title self.primitives_prefix in this draft).

Converting to a buffer would, I think, force realization, so we'd want to delay that until the end of the test, which means we'd actually have to re-run the test... and at that point we might as well do the buffer construction only for failing examples, which we pass off to the shrinker or to replay of the minimal failure. If we currently realize earlier, we can probably make some changes to defer that; at worst, it waits on some more work on the original issue.
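
A minimal sketch of that record-now, materialize-later idea (the attribute name primitives_prefix comes from this draft; everything else here is illustrative):

import random

class RecordingProvider:
    """Remember every primitive handed out, so the prefix can be
    materialized (e.g. converted to a buffer) only at the end of the test."""

    def __init__(self):
        self.primitives_prefix = []  # values in draw order

    def draw_integer(self, min_value=0, max_value=100, **kwargs):
        value = random.randint(min_value, max_value)
        self.primitives_prefix.append(value)
        return value

    def draw_boolean(self, p=0.5, **kwargs):
        value = random.random() < p
        self.primitives_prefix.append(value)
        return value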

@Zac-HD
Member Author

Zac-HD commented Dec 11, 2023

@tybug - I've now exceeded my timebox for this, and won't be able to come back to it until mid-January at earliest. If you want to use it as a starting point, you'd be welcome to take it over 🙂

@tybug
Member

tybug commented Dec 11, 2023

@Zac-HD roger, thanks! I'll likely end up working on this after the datatree IR migration (which is going slower than I'd hoped, but I'm chugging along).

@Zac-HD
Member Author

Zac-HD commented Dec 11, 2023

Makes sense! There's a good chance this one will be easier once the datatree has native IR support too; I got a bit bogged down adding special cases where an alternative backend means we have to ignore the usual semantics of an empty buffer.

@tybug tybug mentioned this pull request Jan 22, 2024
@tybug
Member

tybug commented Feb 16, 2024

I'm taking a look at getting this working. I've got a branch with full shrinking + database playback support for PrngProvider (but not crosshair; see below): https://github.com/HypothesisWorks/hypothesis/compare/master...tybug:hypothesis:provider-plugins-2?expand=1.

I took a different approach from this branch: primitives are forced to their buffer representation immediately when drawn. I tried forcing them only at the end, in freeze, but I think there are places where the engine wants to access the buffer before a ConjectureData concludes, so I didn't get very far with that.

Even though "normal" backends work with this, I don't know whether this approach is OK for crosshair. It seems like "only convert to buffer at the end" may be a requirement for crosshair, given the above conversation. That is, is it the responsibility of hypothesis (convert at the end) or of crosshair (use the per_test_case context manager) to respect the lifetime of the crosshair primitives? I should disclaim that I have no knowledge of crosshair internals and was only vaguely following the above conversation.

If only converting at the end is a requirement, it'll be a bit more work to change things in hypothesis to account for that (unsure how much yet).

@pschanely, on a related note, I tried running the crosshair plugin with my above branch, but got:

from hypothesis import HealthCheck, given, settings, strategies as st

@settings(backend="crosshair", suppress_health_check=HealthCheck.all())
@given(st.integers())
def test(value):
    print("called", value)
    assert value != 150

test()

Traceback (most recent call last):
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/sandbox.py", line 20, in <module>
    test()
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/sandbox.py", line 15, in test
    @given(st.integers())
                   ^^^
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/core.py", line 1620, in wrapped_test
    raise the_error_hypothesis_found
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/core.py", line 1587, in wrapped_test
    state.run_engine()
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/core.py", line 1115, in run_engine
    runner.run()
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 500, in run
    self._run()
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 917, in _run
    self.generate_new_examples()
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 638, in generate_new_examples
    zero_data = self.cached_test_function(bytes(BUFFER_SIZE))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/engine.py", line 1089, in cached_test_function
    self.tree.simulate_test_function(dummy_data)
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/datatree.py", line 727, in simulate_test_function
    v = draw(
        ^^^^^
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/datatree.py", line 716, in draw
    value = draw_func(**kwargs, forced=forced)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tybug/Desktop/Liam/coding/hypothesis/hypothesis-python/src/hypothesis/internal/conjecture/data.py", line 1768, in draw_integer
    value = self.provider.draw_integer(**kwargs, forced=forced)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/hypothesis_crosshair_provider/crosshair_provider.py", line 104, in draw_integer
    symbolic = proxy_for_type(int, self._next_name("int"), allow_subtypes=False)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/crosshair/core.py", line 586, in proxy_for_type
    space = context_statespace()
            ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/crosshair/statespace.py", line 243, in context_statespace
    raise CrosshairInternal
  File "/opt/homebrew/lib/python3.12/site-packages/crosshair/util.py", line 594, in __init__
    debug("CrosshairInternal", str(self))
                               ^^^^^^^^^
RecursionError: maximum recursion depth exceeded while getting the str of an object

(Note that the real error, which I believe is CrosshairInternal, is hidden behind a recursion-error red herring.) Just curious whether this is expected or whether I've broken things in my provider-plugins version.

@Zac-HD
Member Author

Zac-HD commented Feb 16, 2024

Unfortunately yeah, for crosshair we have to convert only at the end. Converting while the test is in progress forces crosshair to choose a concrete value for that element, and thus gives up the main benefit of concolic execution earlier than would otherwise be the case.

I think we can get around this reasonably well with only one additional epicycle: add a .from_ir(seq) method to PrimitiveProvider where it records that sequence and replays from it in the same style as ConjectureData.from_buffer(buf). Then to replay, we add one additional execution where we materialize the crosshair choice sequence at the end, and finally push that through the convert-as-we-go logic in your branch.

Happily, that'll go away again when we convert the shrinker and database format to IR-native. I guess the places where we try to access an unfinished buffer might turn out to be a problem then, but let's not borrow trouble - they might go away in the refactoring.
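
A sketch of what that .from_ir(seq) epicycle could look like - the replay plumbing below is assumed for illustration, not the actual PrimitiveProvider interface:

from typing import Any, Sequence

class ReplayProvider:
    """Replays a recorded sequence of primitive draws, in the same spirit
    as ConjectureData.from_buffer(buf) but over IR-level values."""

    @classmethod
    def from_ir(cls, seq: Sequence[Any]) -> "ReplayProvider":
        provider = cls()
        provider._replay = list(seq)
        provider._index = 0
        return provider

    def _next(self) -> Any:
        value = self._replay[self._index]
        self._index += 1
        return value

    # each draw_* method simply hands back the next recorded value
    def draw_integer(self, **kwargs):
        return self._next()

    def draw_boolean(self, **kwargs):
        return self._next()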

@pschanely
Contributor

Yup, Zac's on the ball; draw-time will be too early for CrossHair.

And @tybug thanks - it's not you. Looks like I already broke my plugin with mainline changes. I've got a suspicion about that crash; I'll be able to investigate this weekend and report back.

@pschanely
Contributor

@tybug I hooked up tybug/provider-plugins-2 to the latest versions of CrossHair and crosshair-hypothesis; and made some additional changes to both. (please pull latest)

Some of those updates will hopefully reduce some of the confusion around error reporting (though expect this to be challenging in general - whenever either Hypothesis or Pytest tries to do some exception handling involving symbolics, things will generally go haywire).

Going forward, I'd call out some likely kinds of exceptions:

  • CrossHairInternal @ statespace.py:243 - if this happens, we're likely attempting to use a symbolic value outside the per-run context manager.
  • NotDeterministic - CrossHair expects that, for each run, symbolics will be accessed in the same manner. When they aren't, you'll see this exception (which will then likely trigger hypothesis's similar Flaky error). An easy failure scenario here is hypothesis caching something that causes the symbolics to be used or accessed differently on a subsequent iteration. In my recent edits, I've special-cased my plugin to ignore runs where the symbolics aren't accessed at all, which seems to be somewhat common, at least in the current implementation.

In particular, right now, I see the following immediate issues, though I expect they might just go away with the additional changes you're planning:

  • Hypothesis attempts a run with a zero data buffer here, which uses the crosshair provider but which I believe is not covered by the context manager.
  • NotDeterministic gets raised when the TreeRecordingObserver attempts to look up the current symbolic in a dictionary, here.

At any rate, I'm happy to continue to diagnose issues as you encounter them. Don't be shy!

@tybug
Member

tybug commented Feb 21, 2024

Thanks @pschanely! Especially for the elaboration on the exceptions. Hypothesis' usage of DataTree is the cause of both of the current crosshair-hypothesis errors mentioned above: DataTree's fresh ConjectureData doesn't use the context manager, and it reifies prematurely, respectively.

I'm thinking about how this pull will play with other backends going forward. Crosshair seems like roughly the most complicated backend we would ever want to support, in terms of the requirements it places on how we interact with its provider. Crosshair values can't be reified until the end of a test run, so it seems we have to avoid using DataTree (which tracks what values hypothesis has already tried, in order to avoid duplication) at all here. In fact, crosshair may effectively reimplement DataTree with its NotDeterministic (flaky) and try-something-new (deduplication) logic.

But other backends may not have this restriction and would benefit from DataTree, lest they produce duplicate inputs. I'd think/hope this is the default mode of execution, and a backend can opt out of using DataTree? @Zac-HD do you have thoughts about how this opting-out might look from the backend's perspective? We could have a wants_datatree = True class-level attr on PrimitiveProvider which CrosshairProvider overrides to False.

I know the stated goal is to get the crosshair backend working, and I don't mean to create extra work here 🙂. I'm personally invested in this because I have ideas for other backends (TargetedProvider) which do benefit from the assurances provided by DataTree. And it's probably good to get at least the forward direction hashed out early, even if we defer some things to later.
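
As a purely illustrative version of the opt-out attribute proposed above (wants_datatree is the working title from this comment, not necessarily what ships):

class PrimitiveProvider:
    # default: the engine may use DataTree-based deduplication
    wants_datatree = True

class CrosshairProvider(PrimitiveProvider):
    # crosshair tracks its own paths, so it opts out of DataTree
    wants_datatree = False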

@Zac-HD
Member Author

Zac-HD commented Feb 21, 2024

I've called out crosshair in the PR title, but mostly because I figured if we had that working everything else would work too! Specifically, I think we should work out a design that can support each of crosshair, atheris, hypofuzz, targeted-PBT, and of course hypothesis' own default provider.

It seems to me that the basic split is between backends which do their own tracking of what's happened - whether fuzzing or solving - and those which don't. Having an attribute on PrimitiveProvider which determines whether we use DataTree seems pretty reasonable to me; I think we can get away with a boolean and naming it by function (though no specific suggestions on that yet) rather than e.g. a black/grey/white-box enum or something. wants_datatree= is OK but maybe a bit too specific?

Seems like we're pretty close to something which kinda works, though; if we thread through the context manager and manage to defer reification, I think that might actually work?

If you think of a subset that would make sense to ship sooner, I'd also be happy to release an explicitly-unstable feature with e.g. Atheris integration... although that involves arguing over who controls the main loop, so maybe not.

@tybug
Member

tybug commented Feb 21, 2024

I'm also hopeful we're almost there. I'll continue working here and try to get something which works for both crosshair and simpler backends. (thanks for your fixes above, by the way — merge gone wrong on my end).

@tybug

This comment was marked as outdated.

@tybug
Member

tybug commented Feb 23, 2024

OK, sorry, my print is at fault above; that's triggering reification outside the context manager. There is an issue here somewhere, because I only added that print to debug the buffer being empty when it shouldn't be, but what I posted above isn't it.

@tybug
Member

tybug commented Feb 23, 2024

Here's the real problem: crosshair raises NotDeterministic when converting from IR to bits. The flow looks something like this:

  • we're testing @given(integers()) def f(n): assert n != 12345
  • crosshair provides a symbolic integer n
  • eventually (or maybe even on the first iteration?) crosshair reifies n as 12345 at n != 12345
  • we've found a bug! we hand back to hypothesis, which has tracked the symbolic n inside ir_tree
  • hypothesis pushes the ir tree through PrimitiveProvider to convert to bits...
  • ...but it has the symbolic n. When it tries to use it via forced=n, it hits a different constraint than n != 12345, because we're not passing it to the property anymore — we're using it to produce a bitstream value.
    • in practice, this is something like using n.bit_count() while converting to bits.

I think crosshair is correct to raise NotDeterministic here, but I think we also explicitly want to use n in a nondeterministic way? Some bad ideas: (1) crosshair walks the IR tree and replaces stored symbolics with reified values (when? also very fragile), (2) we tell crosshair when a test is finished and it disables NotDeterministic checking, (3) some secret third thing.

By the way, reproducing this requires some work, because hypothesis swallows the NotDeterministic exception and produces its own Flaky. Details in case it's useful.

  • set a breakpoint at self.__stoppable_test_function(data)
  • at the breakpoint, enable "raised exceptions"
  • resume debugging

It will break at the first usage of the symbolic int.


@Zac-HD
Member Author

Zac-HD commented Feb 23, 2024

I think the problem is that we shouldn't track symbolic values in our DataTree - if we just appended the symbolic values to a list, then asked crosshair to reify them at the end of the test, and then added them to the tree... would that work?

@tybug
Member

tybug commented Feb 23, 2024

What would "ask crosshair to reify them" look like? We basically are appending the symbolic values to a list by tracking them in ir_tree, and we basically are asking crosshair to reify them by passing them as forced=n. That reification just happens to take what I assume is the form of "give me a new value for this symbolic, and, by the way, we're using it in a different context than past iterations."

If there was a magic function from crosshair we could call, where we give it a symbolic value and it returns the most-recently-reified (?) value for that symbolic, then I think that may work? But that's quite a tight coupling for what is supposedly a backend.
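
A toy model of that "magic function", just to pin down the shape being asked for - none of this is claimed to match crosshair's actual internals:

class ReificationCache:
    """Remember the concrete value chosen for each symbolic as soon as the
    solver decides it, and hand it back later without adding constraints."""

    def __init__(self):
        self._concrete = {}  # id(symbolic) -> concrete value

    def record(self, symbolic, concrete):
        self._concrete[id(symbolic)] = concrete

    def realize(self, symbolic):
        # "give me the value you already picked" - no new solver queries
        return self._concrete[id(symbolic)]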

@tybug tybug marked this pull request as ready for review March 5, 2024 20:06
@pschanely
Contributor

I tried making some local changes to crosshair to test it with the above changes, but it's Nondeterministic again. (same 12345 reproducer as before, though be sure to use database=None. breakpoint on space.detach_path()). This could either be my crosshair provider changes being incorrect or I've broken something in hypothesis itself. @pschanely ideas would be appreciated 🙏

I've finally got some time! Investigating now; please hold.

@pschanely
Contributor

I tried making some local changes to crosshair to test it with the above changes, but it's Nondeterministic again. (same 12345 reproducer as before, though be sure to use database=None. breakpoint on space.detach_path()). This could either be my crosshair provider changes being incorrect or I've broken something in hypothesis itself. @pschanely ideas would be appreciated 🙏

Ok! I made another change on top of @tybug's changes (there was some other state on the provider that needed to be reset now that it's persistent). And things are looking good. I've even gotten a little saucy and ventured into non-primitives. Lists work. It's ineffective at dictionaries - something about how hypothesis ensures uniqueness is confounding crosshair, but I'm confident this is resolvable.

@tybug
Member

tybug commented Mar 6, 2024

aha, nice!

@Zac-HD
Member Author

Zac-HD commented Mar 6, 2024

Oh wow, I am so excited right now. Does it sound plausible that we could squash/rebase for a clean history, write some more tests, and release our clearly-marked-experimental version over the weekend?

@tybug
Member

tybug commented Mar 7, 2024

That sounds good to me. I think if we want to explicitly test against CrosshairProvider in hypothesis, we'll need https://github.com/pschanely/hypothesis-crosshair to be pushed to pypi — unless it already is, and I've missed it.

@pschanely
Contributor

That sounds good to me. I think if we want to explicitly test against CrosshairProvider in hypothesis, we'll need https://github.com/pschanely/hypothesis-crosshair to be pushed to pypi — unless it already is, and I've missed it.

I hadn't, but I've just published something. All that said, I'm not sure we should run crosshair in hypothesis's tests yet. It still does some destructive things to the interpreter, and I'm worried they'll cause you trouble ... if not now, then sometime down the line. I don't know how feasible it would be to write a dummy provider that would attempt to ensure hypothesis continues to meet the expectations we've placed on it. Particularly tricky would be ensuring that hypothesis doesn't access values without using the reification hook.

At any rate, why don't I add the real integration tests on the plugin side ... and I'll set up a nightly build against hypothesis or something.
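
One possible shape for that dummy provider, sketched purely as an illustration (the draw_integer and realize names are assumptions, not the real plugin interface): hand out wrapper objects that blow up if hypothesis inspects them before going through the reification hook.

class Opaque:
    """Wraps a value and raises if it's inspected before reification."""

    def __init__(self, value):
        object.__setattr__(self, "_value", value)

    def __getattr__(self, name):
        raise AssertionError(f"accessed .{name} before reification")

    def __eq__(self, other):
        raise AssertionError("compared a value before reification")

    __hash__ = None  # hashing before reification is also an error

class TripwireProvider:
    """Hands out Opaque wrappers; realize() is the sanctioned way to unwrap them."""

    def draw_integer(self, **kwargs):
        return Opaque(0)

    def realize(self, value):
        return object.__getattribute__(value, "_value")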

@Zac-HD
Member Author

Zac-HD commented Mar 7, 2024

We can run your integration tests as a not-required-to-pass CI job too, so we'll find out before we ship a breaking change. To be clear, we might well do so anyway; I just dislike accidentally breaking things.

@pschanely
Contributor

Well, now I'm getting excited too. (though this weekend sounds ... ambitious to me!)

I just cut CrossHair v0.0.50, with which we can generate hypothesis sets and dictionaries ~reasonably well.

Though I am noticing an assertion failure for node.ir_type == ir_type (only with sets and dicts, seemingly) after crosshair finds the counterexample. Here is an example input+output. @tybug - I'm hoping you have a suggestion? I can see that hypothesis is executing the reification callback appropriately for each of the draws that the provider produced.

@tybug
Member

tybug commented Mar 8, 2024

Thanks - this also reproduces with:

from hypothesis import given, settings, strategies as st

@settings(backend="crosshair")
@given(st.sets(st.integers()))
def test(d):
    assert d != {42}

The cause is:

# This is to guard against the case where we consume no data.
# As long as we consume data, we'll eventually pass or raise.
# But if we don't this could be an infinite loop.
assume(data.index > start_index)

This is actually a bug in how we handle non-hypothesis backends, which is surfacing in the mentioned assertion. We're checking data.index, which is only written to by HypothesisProvider, when we should be checking something backend-agnostic like the IR node list. Non-hypothesis backends have been bailing on the first filtered draw until now.

Will attempt a fix tomorrow. (If you want a temporary hacky local fix, commenting out that line should work unless you try some truly weird strategies.)
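
A self-contained toy illustrating that backend-agnostic check - all names here (FakeData, ir_nodes) are assumptions for the sketch, not hypothesis internals:

class FakeData:
    def __init__(self):
        self.ir_nodes = []  # one entry per draw, regardless of backend
        self.index = 0      # buffer index: only advanced by the default backend

    def draw(self, value):
        self.ir_nodes.append(value)

data = FakeData()
start = len(data.ir_nodes)
data.draw(42)  # a draw served by a non-hypothesis backend
# progress is visible even though data.index never moved
assert len(data.ir_nodes) > start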

(commit message from the PR timeline: "I'm not sure this was doing anything since the buffer should be empty always")
@tybug
Member

tybug commented Mar 9, 2024

I don't have a clean resolution that preserves the condition, but I'm relatively confident we can safely remove it. Blame points to a revision where the loop was infinite, whereas we now limit it to 3 iterations.

Tested with:

from hypothesis import given, strategies as st

@st.composite
def nones(draw):
    return draw(st.just(None))

b = False

@given(nones().filter(lambda x: b))
def f(n):
    pass

f()

(intentionally convoluted to trick our various intelligent rewriting rules).

@Zac-HD
Member Author

Zac-HD commented Mar 9, 2024

Well... the implementation looks good, and I've just tried it out:

  1. pip install hypothesis[crosshair] to get all the dependencies
  2. test passes with backend="hypothesis"
  3. test fails with backend="crosshair" 🎉, shrinking works 🎉
  4. back to backend="hypothesis", test replays failing example from the database 🎉

So: anything left to do before we ship?

@tybug
Member

tybug commented Mar 9, 2024

Read through the diff again, and I'm happy with it! I'm afraid I'm not yet comfortable enough with squashing commits for that to be an efficient use of time, but no offense taken if someone else wants to rewrite history. I'll try to squash commits as I go in the future instead of pushing silly fixup commits.

Also, very excited! Setting crosshair loose on tests and watching it find the needle in the haystack counterexample is super satisfying.

@Zac-HD
Member Author

Zac-HD commented Mar 9, 2024

OK - let's ship!

I'm pretty sure the journey isn't over yet, but thanks so much to @tybug and @pschanely - I think this is one of the coolest things we've shipped in literally years, and I couldn't have done it without you 🥰

@Zac-HD Zac-HD merged commit a41036b into HypothesisWorks:master Mar 9, 2024
48 checks passed
@Zac-HD Zac-HD deleted the provider-plugins branch March 9, 2024 20:22
@pschanely
Contributor

I'm pretty sure the journey isn't over yet, but thanks so much to @tybug and @pschanely - I think this is one of the coolest things we've shipped in literally years, and I couldn't have done it without you 🥰

Well, even if I'm only starting the journey, I'm glad it's this one. Thank you both SO much.

@tybug tybug changed the title WIP: support for crosshair backend Add support for crosshair backend Mar 15, 2024