
Performance degradation with 1.4.0 #166

Closed
tmohme opened this issue Feb 11, 2021 · 32 comments

tmohme (Contributor) commented Feb 11, 2021

With 1.4.0 we experience a severe performance degradation compared to 1.3.10 for some tests.

The time required to execute at least some tests seems to increase super-linearly with the number of tries.
A first quick glance suggests that edgeCases are heavily involved.
I'll try to investigate this further and/or come up with a reproducer in the coming days.

One of the affected test classes (containing 17 tests) shows the following behavior:

| tries | 1.3.10 | 1.4.0 |
|-------|--------|-------|
| 100   | 2.5s   | 4.0s  |
| 1000  | 8s     | 80s   |
| 2000  | 14s    | 228s  |

jlink (Collaborator) commented Feb 11, 2021

I've also seen a couple of performance issues today - sadly one day after release :-/

@tmohme Can you share one or two examples that make the problem obvious?

tmohme (Contributor, Author) commented Feb 12, 2021

I found a strong correlation between slowness and the usage of custom Arbitraries for heavy domain objects.
I'll try to provide one of these ...

jlink (Collaborator) commented Feb 12, 2021

@tmohme Do you use combinators in your domain arbitraries? That's what I found at first glance.

tmohme (Contributor, Author) commented Feb 12, 2021

Yes, extensively (at least for these heavy domain objects).
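
For illustration, a combinator-based arbitrary for a "heavy" domain object typically looks roughly like the sketch below. The Order record and its fields are invented for this example and are not taken from the affected project:

```java
import java.math.BigDecimal;
import net.jqwik.api.*;

// Hypothetical heavy domain object, standing in for the real ones discussed in this thread.
record Order(String id, int quantity, BigDecimal price) {}

class OrderProperties {

	@Provide
	Arbitrary<Order> orders() {
		Arbitrary<String> ids = Arbitraries.strings().alpha().ofLength(12);
		Arbitrary<Integer> quantities = Arbitraries.integers().between(1, 1_000);
		Arbitrary<BigDecimal> prices =
			Arbitraries.bigDecimals().between(BigDecimal.ZERO, new BigDecimal("9999.99"));
		// Combinators.combine(...).as(...) is the pattern under suspicion in this thread.
		return Combinators.combine(ids, quantities, prices).as(Order::new);
	}

	@Property(tries = 1000)
	void anOrderCanBeProcessed(@ForAll("orders") Order order) {
		// real assertions live in the affected project and are omitted here
	}
}
```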

jlink (Collaborator) commented Feb 12, 2021

@tmohme That gives me some guidance on where to look.

tmohme (Contributor, Author) commented Feb 12, 2021

@jlink Will you be OK with a Kotlin example?
There's lots of code involved and converting it to Java would take some time ...

jlink (Collaborator) commented Feb 12, 2021

@tmohme Sure.

tmohme (Contributor, Author) commented Feb 12, 2021

A small Gradle project containing a property test that uses an arbitrary provider for a heavy domain object:
jqwik-166.zip
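
The attached project is not reproduced here. As a rough illustration only, a registered arbitrary provider in jqwik usually takes the following shape; this sketch reuses the invented Order record from the earlier example and is not the code from the zip:

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.Set;
import net.jqwik.api.Arbitraries;
import net.jqwik.api.Arbitrary;
import net.jqwik.api.Combinators;
import net.jqwik.api.providers.ArbitraryProvider;
import net.jqwik.api.providers.TypeUsage;

// Hypothetical global provider so that a bare @ForAll Order parameter resolves without a
// @Provide method. Registered through a file named
// META-INF/services/net.jqwik.api.providers.ArbitraryProvider that lists this class.
public class OrderArbitraryProvider implements ArbitraryProvider {

	@Override
	public boolean canProvideFor(TypeUsage targetType) {
		return targetType.isOfType(Order.class);
	}

	@Override
	public Set<Arbitrary<?>> provideFor(TypeUsage targetType, SubtypeProvider subtypeProvider) {
		Arbitrary<Order> orders = Combinators.combine(
			Arbitraries.strings().alpha().ofLength(12),
			Arbitraries.integers().between(1, 1_000),
			Arbitraries.bigDecimals().between(BigDecimal.ZERO, new BigDecimal("9999.99"))
		).as(Order::new);
		return Collections.singleton(orders);
	}
}
```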

jlink (Collaborator) commented Feb 12, 2021

@tmohme Thanks!

jlink (Collaborator) commented Feb 13, 2021

@tmohme Can you try "1.4.1-SNAPSHOT"? It should get rid of the performance degradation; as a matter of fact it should speed up generation compared to 1.3.10 quite a bit. The overall test time of jqwik went down by almost 25%.

Here's the downside: The fix completely breaks the behaviour of Arbitrary.unique() and the @Unique annotation. Both are deprecated anyway but are supposed to stay till 1.5.0. Fixing their behaviour would require a major change in some parts - only to redo everything once they are gone.

It feels like a Catch-22: Either break compatibility despite all the promises I signed with my blood, or leave in the really bad generation performance for composite and flat-map arbitraries, or invest days if not weeks in deprecated and soon-to-be-removed functionality.
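
For context, a minimal sketch of the deprecated uniqueness API in question; the class name, values, and ranges are invented for illustration:

```java
import java.util.List;
import net.jqwik.api.*;
import net.jqwik.api.constraints.IntRange;
import net.jqwik.api.constraints.Size;
import net.jqwik.api.constraints.Unique;

class UniquenessExamples {

	// Annotation variant, deprecated since 1.4: all generated list elements are distinct.
	@Property
	void elementsAreUnique(@ForAll @Size(5) List<@Unique @IntRange(min = 0, max = 10) Integer> aList) {
		// aList contains five distinct values between 0 and 10
	}

	// Method variant, also deprecated: Arbitrary.unique() filters out previously generated values.
	@Provide
	Arbitrary<Integer> uniqueInts() {
		return Arbitraries.integers().between(1, 10_000).unique();
	}

	@Property
	void noValueRepeats(@ForAll("uniqueInts") int anInt) {
	}
}
```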

osi (Contributor) commented Feb 13, 2021

@jlink Random thought - what if you "faked" the implementation of how unique works in 1.4?

This is probably super-naive, but I wonder if it may work "well enough" to bridge until the feature's official removal?

jlink (Collaborator) commented Feb 14, 2021

@osi That’s actually a good idea. Let’s see what’s possible with limited effort...

jlink (Collaborator) commented Feb 14, 2021

@osi Turns out, even a "well enough" implementation requires more effort than I'm willing to invest.

I'll go ahead with a 1.5.0 soonish.

jlink (Collaborator) commented Feb 14, 2021

@tmohme I ran your example with 1.3.10 and 10000 tries: 6.9 sec. With 1.5.0-SNAPSHOT: 6.8 sec. Seems 1.5.0-SNAPSHOT should have fixed your problem.

So I'll probably go for a 1.5.0 release tomorrow.

tmohme (Contributor, Author) commented Feb 14, 2021

@jlink I think there is no need to rush the decision to drop backward compatibility so fast:

  • I don't know whether what we are doing with jqwik is representative or an outlier.
  • Everyone who is dissatisfied with 1.4.0 can still use 1.3.10 (as we do for now).
  • Also, I have no idea how many installations rely on the unique behavior (for us the adaptation was a very minor thing).
  • Although I guess that if your changes fix the problem with my example they will probably also fix our other tests, I can't confirm this for sure until I've tried it (tomorrow).

tmohme (Contributor, Author) commented Feb 15, 2021

Just ran some timing tests.
Before each set, I ran mvn test-compile, then multiple mvn surefire:test to get the times for only the actual test execution.
The tests are executed with four-way parallelism, but not all of the parallel VMs run for the same duration. The time noted in the table below is the time for all tests to complete, and is thus dominated by the VM that executed the slowest set of tests.
The cells contain the time to run the tests and the number of tests executed (reported by surefire).

| 1.3.10     | 1.4.0 | 1.4.1-SNAPSHOT | 1.5.0-SNAPSHOT (855a04a) |
|------------|-------|----------------|--------------------------|
| 1:15 / 479 | 3:52  | 3:12 / 476     | 2:34 / 464               |
| 1:18 / 475 | 2:56  | 5:17 / 462     | 2:42 / 456               |
| 1:02 / 467 | 4:30  | 4:55 / 468     | 2:41 / 456               |
| 1:21 / 456 | 5:19  |                |                          |
| 1:12 / 457 | 4:36  |                |                          |

I don't have the number of executed tests for 1.4.0 because I started the series with 1.4.0 and only noticed after a few runs that the number of executed tests varies.
This variation is unexpected and remarkable in itself, and I'll try to find the cause.

jlink (Collaborator) commented Feb 15, 2021

@tmohme Looks like an interesting riddle :-/ There’s obviously more going on...

tmohme (Contributor, Author) commented Feb 15, 2021

I'll try to take a broader look at our tests (with Gradle) to find out which tests are affected most.
Maybe I'll find some commonalities and thus another approach for tackling the regression.
Unfortunately I can't work on the problem without interruption.
@jlink Let me know when you want me to do specific tests / comparisons.

jlink (Collaborator) commented Feb 15, 2021

@tmohme Take your time. I decided not to rush 1.5.0.

tmohme (Contributor, Author) commented Feb 17, 2021

I repeated the timing series from above with Gradle and a slightly different source state:

  • All time values were taken from Gradle's test report.
  • All runs reported 476 tests executed :)
  • The first run (per configuration) was ./gradlew clean test; subsequent runs were ./gradlew test --rerun-tasks. Parallel execution and caching were enabled.

edit: new column "1.5.0-SNAPSHOT (81fb971)"
edit: new column "1.5.0-SNAPSHOT (c00d945)"

| 1.3.10 | 1.4.0 | 1.4.1-SNAPSHOT | 1.5.0-SNAPSHOT (855a04a) | 1.5.0-SNAPSHOT (81fb971) | 1.5.0-SNAPSHOT (c00d945) |
|--------|-------|----------------|--------------------------|--------------------------|--------------------------|
| 1:52   | 6:08  | 6:12           | 3:14                     | 3:17                     | 1:58                     |
| 1:53   | 6:15  | 6:02           | 3:24                     | 3:17                     | 1:51                     |
| 1:56   | 6:31  | 6:38           | 3:20                     | 3:21                     | 1:49                     |

Next I'll have a closer look into 1.5.0-SNAPSHOT.

jlink (Collaborator) commented Feb 17, 2021

@tmohme My guess is that most of the time is spent creating the generator with edge cases.

tmohme (Contributor, Author) commented Feb 17, 2021

Investigating all test durations with Gradle, I found that the slowest tests are (as far as I can see) not structurally different from the demo I provided, although they might be even heavier in terms of the number of combined fields.

Internally caching the arbitraries definitely helps a lot. The test group from which I derived the demo is even faster with 1.5.0-SNAPSHOT (~6s) than with 1.3.10 (~8s).
Unfortunately my "problem children" are still 2.5 times slower (~20s vs. ~8s).

tmohme (Contributor, Author) commented Feb 17, 2021

A CPU flame graph of the first 60s of execution of one of my slow tests (tries=10000) with 1.5.0-SNAPSHOT:

[flame graph for 1.5.0-SNAPSHOT]

I guess the column with the hover-note is where the actual test execution happens.
This in turn means that jqwik is busy 87.55% of the time with "ResolvingParametersGenerator.next" (43.96% of parent) or "GenericProperty.extractParams" (43.41% of parent).

tmohme (Contributor, Author) commented Feb 17, 2021

The same excerpt for 1.3.10:

[flame graph for 1.3.10]

Here the proportions are very different: "GenericProperty.testPredicate" is active for 52.36% of the parent's time, while "ResolvingParametersGenerator.next" takes up 28.05% and "GenericProperty.extractParams" uses 19.38%.

tmohme (Contributor, Author) commented Feb 17, 2021

Apart from the different proportions, it is conspicuous that the call stack seems to be quite a bit deeper with 1.5.0-SNAPSHOT, which could (at least partly) be one reason for the shift.

jlink (Collaborator) commented Feb 17, 2021

@tmohme Thanks for the detailed analysis. One last question: Is execution time still growing super-linearly?

tmohme (Contributor, Author) commented Feb 17, 2021

No, it seems to be linear with 1.5.0-SNAPSHOT.
My slow test (group) this afternoon:

| tries | duration [s] |
|-------|--------------|
| 250   | 5.8          |
| 500   | 10.8         |
| 1000  | 20           |
| 2000  | 39           |

edit:
The group consists of six tests.
The following table contains the absolute duration (in ms) of one try, computed as the average over tests 2..6, leaving out the first (and slowest) test of each run to discount cache-warming effects.

| tries | 1.3.10 | 1.5.0-SNAPSHOT |
|-------|--------|----------------|
| 250   | 0.99   | 2.87           |
| 500   | 1.0    | 3.2            |
| 1000  | 0.88   | 2.84           |
| 2000  | 0.84   | 3.04           |
| 4000  | 0.99   | 2.87           |
| 8000  | 1.04   | 3.03           |

Not all tests are exactly comparable (in terms of parameters and executed logic) and the test execution order varies between runs, but nevertheless it is safe to say that in this setup 1.5.0-SNAPSHOT is slower by a factor of ~3.

jlink (Collaborator) commented Feb 18, 2021

@tmohme You might want to try the latest 1.5.0-SNAPSHOT (make sure you refresh dependencies). Your demo runs twice as fast as with 1.3.10 here.

tmohme (Contributor, Author) commented Feb 19, 2021

I can confirm 1.5.0-SNAPSHOT (c00d945) is way faster than the previously tested versions.
For comparison I added a column to my comment from two days ago.

The execution time now seems to grow sub-linearly, converging to linear, probably because it starts slow and then gets significantly faster due to cache hits.

The CPU profiling confirms that the time spent actually executing the test functions now dominates the preparation part:

[screenshot of the profiling view, 2021-02-19]

edit:
Overall we're now back at the execution times we had with 1.3.10.
The demo I provided might be especially sensitive to your optimizations - due to the selection or due to its small size.

jlink (Collaborator) commented Feb 19, 2021

@tmohme I take it you'd be satisfied with the performance as it is?

tmohme (Contributor, Author) commented Feb 19, 2021

That's right 👍

jlink (Collaborator) commented Feb 19, 2021

Using memoization, the performance looks fine in 13e8ed7.
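
The change itself lives in jqwik's internals (commit 13e8ed7); the underlying technique is plain memoization of expensively built objects. As a generic sketch of the pattern only - not jqwik's actual code - it looks roughly like this:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Generic memoization helper: build a value once per key, then serve it from the cache.
// Applied to generator creation, this avoids rebuilding the same edge-case-heavy
// structures for every try.
class Memoizer<K, V> {

	private final Map<K, V> cache = new ConcurrentHashMap<>();

	V getOrCompute(K key, Function<K, V> builder) {
		return cache.computeIfAbsent(key, builder);
	}
}
```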
