
Performance degradation with 1.4.0 #166

Closed
tmohme opened this issue Feb 11, 2021 · 32 comments

tmohme (Contributor) commented Feb 11, 2021

With 1.4.0 we experience a severe performance degradation compared to 1.3.10 for some tests.

The time required to execute at least some tests seems to increase super-linearly with the number of tries.
A first quick glance suggests that edgeCases are heavily involved.
I'll try to investigate this further and/or come up with a reproducer in the coming days.

One of the affected test classes (containing 17 tests) shows the following behavior:

| tries | 1.3.10 | 1.4.0 |
|-------|--------|-------|
| 100   | 2.5s   | 4.0s  |
| 1000  | 8s     | 80s   |
| 2000  | 14s    | 228s  |

jlink (Collaborator) commented Feb 11, 2021

I've also seen a couple of performance issues today - sadly one day after release :-/

@tmohme Can you share one or two examples that make the problem obvious?

tmohme (Contributor, Author) commented Feb 12, 2021

I found a strong correlation between slowness and the usage of custom Arbitraries for heavy domain objects.
I'll try to provide one of these ...

jlink (Collaborator) commented Feb 12, 2021

@tmohme Do you use combinators in your domain arbitraries? That's what I found at first glance.

tmohme (Contributor, Author) commented Feb 12, 2021

Yes, extensively (at least for these heavy domain objects).
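
For illustration, a combinator-based arbitrary for a "heavy" domain object typically looks roughly like the sketch below. The Order record and its fields are invented for this example and are not taken from the affected project:

```java
import java.math.BigDecimal;
import net.jqwik.api.*;

// Hypothetical heavy domain object, standing in for the real ones discussed in this thread.
record Order(String id, int quantity, BigDecimal price) {}

class OrderProperties {

	@Provide
	Arbitrary<Order> orders() {
		Arbitrary<String> ids = Arbitraries.strings().alpha().ofLength(12);
		Arbitrary<Integer> quantities = Arbitraries.integers().between(1, 1_000);
		Arbitrary<BigDecimal> prices =
			Arbitraries.bigDecimals().between(BigDecimal.ZERO, new BigDecimal("9999.99"));
		// Combinators.combine(...).as(...) is the pattern under suspicion in this thread.
		return Combinators.combine(ids, quantities, prices).as(Order::new);
	}

	@Property(tries = 1000)
	void anOrderCanBeProcessed(@ForAll("orders") Order order) {
		// real assertions live in the affected project and are omitted here
	}
}
```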

jlink (Collaborator) commented Feb 12, 2021

@tmohme That gives me some guidance on where to look.

tmohme (Contributor, Author) commented Feb 12, 2021

@jlink Will you be OK with a Kotlin example?
There's lots of code involved and converting it to Java would take some time ...

jlink (Collaborator) commented Feb 12, 2021

@tmohme Sure.

tmohme (Contributor, Author) commented Feb 12, 2021

A small Gradle project containing a property test that uses an arbitrary provider for a heavy domain object:
jqwik-166.zip
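
The attached project is not reproduced here. As a rough illustration only, a registered arbitrary provider in jqwik usually takes the following shape; this sketch reuses the invented Order record from the earlier example and is not the code from the zip:

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.Set;
import net.jqwik.api.Arbitraries;
import net.jqwik.api.Arbitrary;
import net.jqwik.api.Combinators;
import net.jqwik.api.providers.ArbitraryProvider;
import net.jqwik.api.providers.TypeUsage;

// Hypothetical global provider so that a bare @ForAll Order parameter resolves without a
// @Provide method. Registered through a file named
// META-INF/services/net.jqwik.api.providers.ArbitraryProvider that lists this class.
public class OrderArbitraryProvider implements ArbitraryProvider {

	@Override
	public boolean canProvideFor(TypeUsage targetType) {
		return targetType.isOfType(Order.class);
	}

	@Override
	public Set<Arbitrary<?>> provideFor(TypeUsage targetType, SubtypeProvider subtypeProvider) {
		Arbitrary<Order> orders = Combinators.combine(
			Arbitraries.strings().alpha().ofLength(12),
			Arbitraries.integers().between(1, 1_000),
			Arbitraries.bigDecimals().between(BigDecimal.ZERO, new BigDecimal("9999.99"))
		).as(Order::new);
		return Collections.singleton(orders);
	}
}
```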

jlink (Collaborator) commented Feb 12, 2021

@tmohme Thanks!

jlink (Collaborator) commented Feb 13, 2021

@tmohme Can you try "1.4.1-SNAPSHOT"? It should get rid of the performance degradation; as a matter of fact it should speed up generation compared to 1.3.10 quite a bit. The overall test time of jqwik went down by almost 25%.

Here's the downside: The fix completely breaks the behaviour of Arbitrary.unique() and the @Unique annotation. Both are deprecated anyway but are supposed to stay till 1.5.0. Fixing their behaviour would require a major change in some parts - only to redo everything once they are gone.

It feels like a Catch-22: Either break compatibility despite all the promises I signed with my blood, or leave in the really bad generation performance for composite and flat-map arbitraries, or invest days if not weeks in deprecated and soon-to-be-removed functionality.
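
For context, a minimal sketch of the deprecated uniqueness API in question; the class name, values, and ranges are invented for illustration:

```java
import java.util.List;
import net.jqwik.api.*;
import net.jqwik.api.constraints.IntRange;
import net.jqwik.api.constraints.Size;
import net.jqwik.api.constraints.Unique;

class UniquenessExamples {

	// Annotation variant, deprecated since 1.4: all generated list elements are distinct.
	@Property
	void elementsAreUnique(@ForAll @Size(5) List<@Unique @IntRange(min = 0, max = 10) Integer> aList) {
		// aList contains five distinct values between 0 and 10
	}

	// Method variant, also deprecated: Arbitrary.unique() filters out previously generated values.
	@Provide
	Arbitrary<Integer> uniqueInts() {
		return Arbitraries.integers().between(1, 10_000).unique();
	}

	@Property
	void noValueRepeats(@ForAll("uniqueInts") int anInt) {
	}
}
```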

osi (Contributor) commented Feb 13, 2021

@jlink Random thought - what if you "faked" the implementation of how unique works in 1.4?

This is probably super-naive, but I wonder if it may work "well enough" to bridge until the feature's official removal?

jlink (Collaborator) commented Feb 14, 2021

@osi That’s actually a good idea. Let’s see what’s possible with limited effort...

jlink (Collaborator) commented Feb 14, 2021

@osi Turns out, even a "well enough" implementation requires more effort than I'm willing to invest.

I'll go ahead with a 1.5.0 soonish.

jlink (Collaborator) commented Feb 14, 2021

@tmohme I ran your example with 1.3.10 and 10000 tries: 6.9 sec. With 1.5.0-SNAPSHOT: 6.8 sec. Seems 1.5.0-SNAPSHOT should have fixed your problem.

So I'll probably go for a 1.5.0 release tomorrow.

tmohme (Contributor, Author) commented Feb 14, 2021

@jlink I think there is no need to rush the decision to drop backward compatibility so fast:

  • I don't know whether what we are doing with jqwik is representative or an outlier.
  • Everyone who is dissatisfied with 1.4.0 can still use 1.3.10 (as we do for now).
  • Also, I have no idea how many installations rely on the unique behavior (for us the adaptation was a very minor thing).
  • Although I guess that if your changes fix the problem with my example they will probably also fix our other tests, I can't confirm this for sure until I've tried it (tomorrow).

tmohme (Contributor, Author) commented Feb 15, 2021

Just ran some timing tests.
Before each set, I ran mvn test-compile, then multiple mvn surefire:test to get the times for only the actual test execution.
The tests are executed with four-way parallelism, but not all of the parallel VMs run for the same duration. The time noted in the table below is the time for all tests to complete, and is thus dominated by the VM that executed the slowest set of tests.
The cells contain the time to run the tests and the number of tests executed (reported by surefire).

| 1.3.10     | 1.4.0 | 1.4.1-SNAPSHOT | 1.5.0-SNAPSHOT (855a04a) |
|------------|-------|----------------|--------------------------|
| 1:15 / 479 | 3:52  | 3:12 / 476     | 2:34 / 464               |
| 1:18 / 475 | 2:56  | 5:17 / 462     | 2:42 / 456               |
| 1:02 / 467 | 4:30  | 4:55 / 468     | 2:41 / 456               |
| 1:21 / 456 | 5:19  |                |                          |
| 1:12 / 457 | 4:36  |                |                          |

I don't have the number of executed tests for 1.4.0 because I started the series with 1.4.0 and only noticed after a few runs that the number of executed tests varies.
This variation is unexpected and remarkable in itself, and I'll try to find the cause.

jlink (Collaborator) commented Feb 15, 2021

@tmohme Looks like an interesting riddle :-/ There’s obviously more going on...

tmohme (Contributor, Author) commented Feb 15, 2021

I'll try to take a broader look at our tests (with Gradle) to find out which tests are affected most.
Maybe I'll find some commonalities and thus another approach for tackling the regression.
Unfortunately I can't work on the problem without interruption.
@jlink Let me know when you want me to do specific tests / comparisons.

jlink (Collaborator) commented Feb 15, 2021

@tmohme Take your time. I decided not to rush 1.5.0.

tmohme (Contributor, Author) commented Feb 17, 2021

I repeated the timing series from above with Gradle and a slightly different source state:

  • All time values were taken from Gradle's test report.
  • All runs reported 476 tests executed :)
  • The first run (per configuration) was ./gradlew clean test; subsequent runs were ./gradlew test --rerun-tasks. Parallel execution and caching were enabled.

edit: new column "1.5.0-SNAPSHOT (81fb971)"
edit: new column "1.5.0-SNAPSHOT (c00d945)"

| 1.3.10 | 1.4.0 | 1.4.1-SNAPSHOT | 1.5.0-SNAPSHOT (855a04a) | 1.5.0-SNAPSHOT (81fb971) | 1.5.0-SNAPSHOT (c00d945) |
|--------|-------|----------------|--------------------------|--------------------------|--------------------------|
| 1:52   | 6:08  | 6:12           | 3:14                     | 3:17                     | 1:58                     |
| 1:53   | 6:15  | 6:02           | 3:24                     | 3:17                     | 1:51                     |
| 1:56   | 6:31  | 6:38           | 3:20                     | 3:21                     | 1:49                     |

Next I'll have a closer look into 1.5.0-SNAPSHOT.

jlink (Collaborator) commented Feb 17, 2021

@tmohme My guess is that most of the time is spent creating the generator with edge cases.

tmohme (Contributor, Author) commented Feb 17, 2021

Investigating all test durations with Gradle, I found that the slowest tests are (as far as I can see) not structurally different from the demo I provided, although they might be even heavier in terms of the number of combined fields.

Internally caching the arbitraries definitely helps a lot. The test group from which I derived the demo is even faster with 1.5.0-SNAPSHOT (~6s) than with 1.3.10 (~8s).
Unfortunately my "problem children" are still 2.5 times slower (~20s vs. ~8s).

tmohme (Contributor, Author) commented Feb 17, 2021

A CPU flame graph of the first 60s of execution of one of my slow tests (tries=10000) with 1.5.0-SNAPSHOT:

[flame graph for 1.5.0-SNAPSHOT]

I guess the column with the hover-note is where the actual test execution happens.
This in turn means that jqwik is busy 87.55% of the time with "ResolvingParametersGenerator.next" (43.96% of parent) or "GenericProperty.extractParams" (43.41% of parent).

tmohme (Contributor, Author) commented Feb 17, 2021

The same excerpt for 1.3.10:

[flame graph for 1.3.10]

Here the proportions are very different: "GenericProperty.testPredicate" is active for 52.36% of the parent's time, while "ResolvingParametersGenerator.next" takes up 28.05% and "GenericProperty.extractParams" uses 19.38%.

tmohme (Contributor, Author) commented Feb 17, 2021

Apart from the different proportions, it is conspicuous that the call stack seems to be quite a bit deeper with 1.5.0-SNAPSHOT, which could (at least partly) be one reason for the shift.

jlink (Collaborator) commented Feb 17, 2021

@tmohme Thanks for the detailed analysis. One last question: Is execution time still growing super-linearly?

tmohme (Contributor, Author) commented Feb 17, 2021

No, it seems to be linear with 1.5.0-SNAPSHOT.
My slow test (group) this afternoon:

| tries | duration [s] |
|-------|--------------|
| 250   | 5.8          |
| 500   | 10.8         |
| 1000  | 20           |
| 2000  | 39           |

edit:
The group consists of six tests.
The following table contains the absolute duration (in ms) of one try, computed as the average over tests 2..6, leaving out the first (and slowest) test of each run to discount cache-warming effects.

| tries | 1.3.10 | 1.5.0-SNAPSHOT |
|-------|--------|----------------|
| 250   | 0.99   | 2.87           |
| 500   | 1.0    | 3.2            |
| 1000  | 0.88   | 2.84           |
| 2000  | 0.84   | 3.04           |
| 4000  | 0.99   | 2.87           |
| 8000  | 1.04   | 3.03           |

Not all tests are exactly comparable (in terms of parameters and executed logic) and the test execution order varies between runs, but nevertheless it is safe to say that in this setup 1.5.0-SNAPSHOT is slower by a factor of ~3.

jlink (Collaborator) commented Feb 18, 2021

@tmohme You might want to try the latest 1.5.0-SNAPSHOT (make sure you refresh dependencies). Your demo runs twice as fast as with 1.3.10 here.

tmohme (Contributor, Author) commented Feb 19, 2021

I can confirm 1.5.0-SNAPSHOT (c00d945) is way faster than the previously tested versions.
For comparison I added a column to my comment from two days ago.

The execution time now seems to grow sub-linearly, converging to linear, probably because it starts slow and then gets significantly faster due to cache hits.

The CPU profiling confirms that the time spent actually executing the test functions now dominates the preparation part:

[screenshot of the profiling view, 2021-02-19]

edit:
Overall we're now back at the execution times we had with 1.3.10.
The demo I provided might be especially sensitive to your optimizations - due to the selection or due to its small size.

jlink (Collaborator) commented Feb 19, 2021

@tmohme I take it you'd be satisfied with the performance as it is?

tmohme (Contributor, Author) commented Feb 19, 2021

That's right 👍

jlink (Collaborator) commented Feb 19, 2021

Using memoization, the performance looks fine in 13e8ed7.
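
The change itself lives in jqwik's internals (commit 13e8ed7); the underlying technique is plain memoization of expensively built objects. As a generic sketch of the pattern only - not jqwik's actual code - it looks roughly like this:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Generic memoization helper: build a value once per key, then serve it from the cache.
// Applied to generator creation, this avoids rebuilding the same edge-case-heavy
// structures for every try.
class Memoizer<K, V> {

	private final Map<K, V> cache = new ConcurrentHashMap<>();

	V getOrCompute(K key, Function<K, V> builder) {
		return cache.computeIfAbsent(key, builder);
	}
}
```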
