Migrate `DataTree` to the new IR #3818

tybug · 2023-12-19T02:49:52Z

Another step towards #3086.

depends on Various core and test changes #3835
depends on Test touchups in preparation for ir migration #3844
depends on Improvements to forced and forced testing #3846
depends on More small IR-related improvements #3854
follow up on investigating Various core and test changes #3835 (comment)

Potential follow-up work to this PR:

follow draw distributions only with some meta-probability (Migrate DataTree to the new IR #3818 (comment))
replace sub-ir examples with an example for each ir node type (this is potentially obsoleted by rebuilding the shrinker)
don't write single-valued pseudo choice nodes to the tree (Migrate DataTree to the new IR #3818 (comment))
don't write forced values to the tree (Migrate DataTree to the new IR #3818 (comment))
cache compute_max_children (Migrate DataTree to the new IR #3818 (comment))

Conversations of note:

overapproximating vs underapproximating in compute_max_children (Migrate DataTree to the new IR #3818 (comment))
TooHard (and alternatives) Migrate DataTree to the new IR #3818 (comment)
if observe and forced is None performance comment Migrate DataTree to the new IR #3818 (comment)
profiling results for slowdown on this branch Migrate DataTree to the new IR #3818 (comment)

hypothesis-python/src/hypothesis/internal/conjecture/data.py

hypothesis-python/src/hypothesis/internal/conjecture/datatree.py

hypothesis-python/src/hypothesis/internal/intervalsets.py

hypothesis-python/tests/conjecture/test_data_tree.py

in a previous iteration I did not call kill_branch when discarding. I've since addressed this in (I believe) a more principled way, by removing sub-ir examples and thus discards.

Zac-HD

The approach makes sense, and it looks like a good start!

Due to holiday season I'm going to unsubscribe from notifications on push, so please @-mention me to request review again when that would be useful (or at latest when CI is green) 🚀

hypothesis-python/src/hypothesis/internal/conjecture/data.py

hypothesis-python/src/hypothesis/internal/conjecture/datatree.py

hypothesis-python/src/hypothesis/internal/conjecture/utils.py

hypothesis-python/tests/conjecture/test_data_tree.py

a godsend of a function!

tybug

@Zac-HD round two! I still have failing tests to address and things to investigate, but I've left some comments I've run into since the last update. Tests are looking a fair bit greener than last time 🙂

hypothesis-python/src/hypothesis/internal/conjecture/datatree.py

hypothesis-python/tests/conjecture/test_float_encoding.py

hypothesis-python/tests/common/debug.py

tybug · 2024-02-01T03:38:23Z

There appear to be some infinite loops (or very slow tests) here 😕. See https://github.com/HypothesisWorks/hypothesis/actions/runs/7733984648/job/21087140454?pr=3818, which I cancelled after 3.5 hours. I'll try to narrow it down locally.

tybug · 2024-02-02T00:31:54Z

tracked the infinite loop down and fixed it: 5058f50
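
For context, commit 5058f50 is titled "fix float key representation in _draw_from_cache", and an earlier commit is titled "use bit representation of floats for keys". The sketch below shows, under stated assumptions, why raw floats make awkward dictionary keys and how a bit-level key sidesteps that. The helper name `float_key` is hypothetical and is not the function used in the PR; whether this exact issue caused the infinite loop is not stated in the thread.

```python
import math
import struct


def float_key(value: float) -> int:
    """Reinterpret the 64-bit IEEE 754 pattern of a float as an unsigned int."""
    return struct.unpack("!Q", struct.pack("!d", value))[0]


# 0.0 and -0.0 compare equal, so they collide as raw dict keys,
# but their bit patterns differ (only the sign bit is set for -0.0).
assert 0.0 == -0.0
assert float_key(0.0) != float_key(-0.0)

# NaN never compares equal to itself, yet its bit pattern is a stable key.
assert math.nan != math.nan
assert float_key(math.nan) == float_key(math.nan)
```

Keying on the integer bit pattern means two floats share an entry exactly when they are bitwise identical, which is usually what a cache of previously drawn values needs.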

tybug · 2024-02-03T23:12:04Z

Seems to be a remaining flake:

File "/home/runner/work/hypothesis/hypothesis/hypothesis-python/tests/cover/test_error_in_draw.py", line 28, in test_error_is_in_finally
    assert "[0, 1, -1]" in "\n".join(err.value.__notes__)
AssertionError: assert '[0, 1, -1]' in 'Falsifying example: test(\n    d=data(...),\n)\nDraw 1: [1, 0, -20]'
 +  where 'Falsifying example: test(\n    d=data(...),\n)\nDraw 1: [1, 0, -20]' = <built-in method join of str object at 0x7f6f9665fe98>(['Falsifying example: test(\n    d=data(...),\n)', 'Draw 1: [1, 0, -20]'])
 +    where <built-in method join of str object at 0x7f6f9665fe98> = '\n'.join
 +    and   ['Falsifying example: test(\n    d=data(...),\n)', 'Draw 1: [1, 0, -20]'] = ValueError().__notes__
 +      where ValueError() = <ExceptionInfo ValueError() tblen=4>.value

but I ran this locally, n ~= 10k, and it never failed. Maybe my transcription was unfaithful?

There's also this failure, which I don't know how to interpret. Is INTERNALERROR indicative of a pytest-level failure?

Zac-HD

(short review because it's late, forgive terseness - I thought you'd prefer this now than a more effusive review next week)

love it and looking forward to merging real soon now. Some style nitpicks in comments.
I think the INTERNAL ERRORs look like they're transient flakes? Suspicious that it's only on Windows, we might have gotten a bad worker or something. Hopefully vanishes on retry.
From checking CI times this might be a slight slowdown? I think it's acceptable for now to get this in, and we can profile + optimize once we finish the big port - the design is not intrinsically any slower than the status quo (quite the opposite)

hypothesis-python/src/hypothesis/internal/conjecture/datatree.py

tybug · 2024-02-04T23:47:09Z

I appreciate the early review here ❤️. I'll never complain about productive-but-direct reviews 🙂

> From checking CI times this might be a slight slowdown? I think it's acceptable for now to get this in, and we can profile + optimize once we finish the big port - the design is not intrinsically any slower than the status quo (quite the opposite)

I noticed this the other day as well. From some light profiling it seems that simulate_test_function is now almost twice as expensive in some cases, due to creating examples while simulating via draw_*. It looks like this is especially impactful in the targeting phase?

Here are some profiling results for test_targeting_increases_max_length. Apologies for the cropped screenshots, lack of reproducing code, etc.; this was all a bit ad hoc.

[Profiling screenshots comparing master and this PR; images not preserved]

Sub-ir examples go away when we migrate the shrinker, and this doesn't seem debilitating performance-wise, so I'm OK with just moving forward here if you are. We could try threading a "don't start or stop examples" variable through to draw_* if we wanted a stopgap measure, but I'm only moderately certain doing so wouldn't break things elsewhere (and it would be a bit messy to get right with all the places that need it).
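
Since the comparison above came without reproducing code, here is a minimal sketch of one way such a comparison could be gathered with the standard library's `cProfile` and `pstats`. The helper name `profile_call` is hypothetical, and this is not the method used to produce the results in this thread; running it against a test function on each branch would be one option.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so expensive call trees surface first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Comparing the cumulative time attributed to a function of interest (for example `simulate_test_function`) across two checkouts would then show where a slowdown is concentrated.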

Zac-HD · 2024-02-05T00:16:00Z

Yep, let's just move forward.

I just merged the Black 2024 update so that's the formatting issue, and I think I've worked out the pytest crash too - I'll push a fix for those once the current merge finishes. And then if it's green... 🤞

Zac-HD · 2024-02-05T01:31:13Z

🥳🥳🥳🥳🥳🥳

tybug · 2024-02-05T03:40:03Z

woo!

initial work on datatree ir migration

9b356da

tybug requested a review from Zac-HD as a code owner December 19, 2023 02:49

tybug commented Dec 19, 2023

remove outdated comment

cae9b3b

tybug changed the title ~~Migrate DataTree to new IR~~ Migrate DataTree to the new IR Dec 19, 2023

Zac-HD reviewed Dec 19, 2023

tybug added 21 commits December 27, 2023 12:26

move choice() method to ConjectureData

b083a4f

fix label name collision

b86ae5b

fix almost all shrinker tests

0095ec0

fix optimiser tests

4ab7c6e

remove test duplicated in test_pareto

9a01468

fix most engine tests

842b732

fix most pareto tests

5c335e1

fix wrong data.choice usage

833c4cd

use existing count_between_floats

23eafff

migrate draw_bits in test_test_data

1958a4c

migrate draw_bits in test_shrinking_dfas

ff0ffe6

remove ConjectureData#write in favor of draw_bytes(forced=...)

98947c4

remove test_draw_write_round_trip. This is better covered by test_forced_floats

d219cbf

fix test_float_encoding tests via buffers

720c750

more ConjectureData#write / draw_bytes fixes

4b0a103

add DRAW_FLOAT_INNER_LABEL to bring float shrinking back to normal

98c93b3

improve minimal readability with nonlocal

6ed83ae

add and use MAX_CHILDREN_EFFECTIVELY_INFINITE

56d78f1

add cached_property comment

8a5a163

use bit representation of floats for keys

dc95a63

use unweighted sampling if rejection sampling is not making progress

70f9c35

tybug commented Jan 6, 2024

tybug added 2 commits January 6, 2024 13:19

more draw_bits -> draw_integer migrations

57fc9ac

avoid 32 bit integers which draws more data

8ea0dae

tybug added 6 commits January 31, 2024 17:39

fix ordering in floats_between

77680cb

add more ir tests

c9eaf6c

no-branch on _draw_from_cache

bce60d5

account for 0 weight integers in children computation

797d67c

split asserts

c8c326a

format

85316f4

tybug added 3 commits February 1, 2024 01:01

deflake test_subtraction_of_intervals

575b53b

fix float key representation in _draw_from_cache

5058f50

add cover test for hard floats

d9b0e0f

deflake test_can_generate_hard_floats

3ff7e49

Zac-HD reviewed Feb 4, 2024

tybug and others added 6 commits February 4, 2024 17:45

more consistent conjecture float coverage

a1eb9a9

simpler Literal

db565f3

wording

081e32e

more idiomatic unhandled ir_type error

285bbf9

Merge branch 'master' into datatree-ir

2c16f46

formatting

4d4a32f

Zac-HD approved these changes Feb 5, 2024

Zac-HD merged commit 30ded43 into HypothesisWorks:master Feb 5, 2024
48 checks passed

tybug deleted the datatree-ir branch February 5, 2024 03:22

This was referenced Feb 5, 2024

Directly use ConjectureData.draw_boolean() in booleans() #3873

Merged

Random failures with hypothesis.errors.StopTest exception raised #3874

Closed

tybug mentioned this pull request Mar 15, 2024

Migrate our core representation to an IR layer #3921

Open

8 tasks
