
Massive performance drop in test_blacklisted_characters #1864

Closed
Zalathar opened this issue Mar 12, 2019 · 6 comments · Fixed by #2030
Labels
performance (go faster! use less memory!)

Comments

@Zalathar
Contributor

While testing some other changes, I stumbled across the fact that test_blacklisted_characters now takes a very long time to run.

This can be seen in https://travis-ci.org/HypothesisWorks/hypothesis/jobs/504749908, and seems to be linked to #1846 in some way.

@Zalathar added the tests/build/CI (about testing or deployment *of* Hypothesis) and performance (go faster! use less memory!) labels Mar 12, 2019
@Zalathar
Contributor Author

Sadly I haven't had a chance to track this down more precisely, or to figure out whether it's a real bug or a test bug.

Zac-HD added a commit that referenced this issue Mar 13, 2019
See issue #1864; but I'm simply disabling it for now due to the huge slowdown.
@Zalathar
Contributor Author

I've determined that the slowdown is in the assert_no_examples part of the test.

Looking through the --hypothesis-verbosity=debug -s output, it is doing more or less what it was doing before: it quickly finds all of the unique output values, and then spends the rest of its exploration budget generating increasingly elaborate rejection-sampler sequences.

The difference is that instead of finishing in an instant, this process becomes slower and slower (inconsistently) as the test continues to run.

@Zalathar
Contributor Author

I traced the problem to generate_novel_prefix in DataTree:

    def generate_novel_prefix(self, random):
        """Generate a short random string that (after rewriting) is not
        a prefix of any buffer previously added to the tree.

        This is logically equivalent to generating the test case uniformly
        at random and returning the first point at which we hit unknown
        territory, but with an optimisation for the only common case where
        that would be inefficient.
        """
        assert not self.is_exhausted
        initial = self.find_necessary_prefix_for_novelty()
        while True:

            def draw_bytes(data, n):
                i = data.index
                if i < len(initial):
                    return initial[i : i + n]
                else:
                    return uniform(random, n)

            data = ConjectureData(draw_bytes=draw_bytes, max_length=float("inf"))
            try:
                self.simulate_test_function(data)
            except PreviouslyUnseenBehaviour:
                return hbytes(data.buffer)

As the test continues to run, the while True loop takes more and more iterations to complete on average. It often requires hundreds or thousands of retries before it can find a novel prefix by chance.

The underlying cause is that this novelty-generator has no way to detect and avoid exhausted parts of the tree. So when novel prefixes are rare, it spins over and over until it gets lucky enough to stumble upon one.
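To see why the spinning gets so bad, here is a minimal standalone model of the while True loop above (not Hypothesis code): if a fraction p of uniformly drawn prefixes is still novel, the number of iterations needed is geometrically distributed with mean 1/p, so retries explode as the tree fills up and p shrinks toward zero.

```python
import random

def mean_retries(p, trials=20_000, rng=random.Random(0)):
    """Empirically estimate the mean number of loop iterations needed
    before a uniform draw hits a novel prefix, where each independent
    draw is novel with probability p (a geometric distribution)."""
    total = 0
    for _ in range(trials):
        while True:
            total += 1
            if rng.random() < p:
                break
    return total / trials
```

For example, mean_retries(0.5) comes out close to the theoretical mean of 2, and as p falls to 0.001 the expected cost rises to roughly 1000 iterations per novel prefix, which matches the "hundreds or thousands of retries" observed.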

@Zalathar
Contributor Author

This effect isn't specific to characters. I can reproduce it with this:

def test_integers():
    assert_no_examples(st.integers(0, 5), lambda x: False)

Range sizes that are (2 ** n) - 2 are the most severely affected.
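One plausible way to see why sizes just below a power of two hurt (an illustrative model, not Hypothesis's exact drawing scheme): suppose a value in [0, size) is drawn by rejection-sampling n = ceil(log2(size)) bits. Once all size values have been seen, novelty can only come from a longer-than-ever chain of rejected draws, and each individual rejection has probability (2**n - size) / 2**n, which is tiny when size is close to 2**n.

```python
def rejection_probability(size):
    """Probability that a single n-bit draw is rejected, assuming a
    value in [0, size) is produced by rejection-sampling the smallest
    number of bits that can cover the range (illustrative model only)."""
    n = max(1, (size - 1).bit_length())
    return (2 ** n - size) / 2 ** n
```

For st.integers(0, 5) there are 6 values, i.e. size 2**3 - 2, so each rejection happens with probability only 2/8 = 0.25, and chaining several rare rejections together to reach unseen territory becomes correspondingly unlikely.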

@DRMacIver
Member

The underlying cause is that this novelty-generator has no way to detect and avoid exhausted parts of the tree. So when novel prefixes are rare, it spins over and over until it gets lucky enough to stumble upon one.

This isn't quite true. Note that it detects exhausted parts of the tree up until the point where the first possible branch occurs. This means that when there is a long forced prefix it finds that without spinning. I had (apparently mistakenly) assumed that that would be enough, and that cases where there were multiple possible branches to take but we still got high rejection rates would be relatively uncommon.

I think a fix that is probably sufficient for this case, and is equivalent in result to the current behaviour, is to always have the first non-forced block chosen uniformly at random from the set of available possibilities. Maybe something more sophisticated is needed in the general case.
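A hypothetical sketch of that idea (the names and node shape here are assumptions for illustration, not Hypothesis's actual DataTree API): instead of drawing the first non-forced block blindly and retrying on collision, pick it uniformly from the child branches that are not yet exhausted, so the loop cannot keep landing in dead subtrees.

```python
import random

def choose_first_branch(children, rng=random.Random()):
    """children: mapping from block values to node-like objects exposing
    an `is_exhausted` flag. Returns a value whose subtree may still
    contain novelty, chosen uniformly at random among the live ones."""
    live = [value for value, node in children.items() if not node.is_exhausted]
    if not live:
        raise ValueError("all branches exhausted; caller should not descend here")
    return rng.choice(live)
```

Under this scheme the expected number of draws at the first branch point is exactly one, regardless of how many siblings are already exhausted, which is what removes the geometric blowup for this case.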

@Zac-HD removed the tests/build/CI (about testing or deployment *of* Hypothesis) label Apr 19, 2019
@Zac-HD
Member

Zac-HD commented Apr 19, 2019

@pytest.mark.skip was added in 354258d, so this isn't killing our CI, but the underlying problem is still there.
