
Improve performance of AllBeEquivalentTo #920

Conversation

krajek
Contributor

@krajek krajek commented Sep 19, 2018

Attempt to fix #914.

There are (at least) three bottlenecks:

  • AllBeEquivalentTo created a collection of items matching the subject's length and then used loose ordering, which results in O(n^2) complexity. A change to strict ordering is necessary to have any hope of reasonable performance (see the sketch after this list).
  • Building the collection of failure messages. The more inequivalent items there are, the more time it takes to create the assertion failure messages. I think some kind of early break is needed in EnumerableEquivalencyValidator after, say, 10 failures.
  • Extracting OrderingRules.IsOrderingStrictFor(context) in EnumerableEquivalencyValidator.AssertElementGraphEquivalency to before the loop. During profiling it stands out as an easy fix, with around 8-10% of execution time spent in this method.
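
To make the first two bullets concrete, here is a minimal, self-contained sketch of the difference; the class and method names and the 10-failure cut-off are illustrative assumptions, not the actual FluentAssertions internals:

```csharp
using System.Collections.Generic;

static class MatchingSketch
{
    // Loose ordering: every subject scans the remaining expectations for a
    // match, so n subjects times up to n candidates gives O(n^2) comparisons.
    public static int CountMismatchesLoose<T>(IReadOnlyList<T> subjects, List<T> expectations)
    {
        int mismatches = 0;
        foreach (T subject in subjects)
        {
            int match = expectations.FindIndex(e => EqualityComparer<T>.Default.Equals(e, subject));
            if (match >= 0)
            {
                expectations.RemoveAt(match); // each removal also costs O(n)
            }
            else
            {
                mismatches++;
            }
        }

        return mismatches;
    }

    // Strict ordering: elements are compared index by index, O(n) comparisons,
    // with an early break once enough failures have been collected.
    public static int CountMismatchesStrict<T>(IReadOnlyList<T> subjects, IReadOnlyList<T> expectations)
    {
        const int MaxFailures = 10; // assumed cut-off from the second bullet

        int mismatches = 0;
        for (int i = 0; i < subjects.Count && mismatches < MaxFailures; i++)
        {
            if (!EqualityComparer<T>.Default.Equals(subjects[i], expectations[i]))
            {
                mismatches++;
            }
        }

        return mismatches;
    }
}
```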

I checked that after fixing those three issues there are no more outstanding hot spots. Execution time is split among some CanHandle methods, creating objects like EquivalencyValidationContext, and so on.

Any ideas are warmly welcomed.

Improves algorithmic complexity from O(n^2) to O(n)
@krajek krajek changed the title "[WIPImprove performance of AllBeEquivalentTo" to "[WIP] Improve performance of AllBeEquivalentTo" Sep 19, 2018
@jnyrup
Member

jnyrup commented Sep 19, 2018

Out of curiosity, did you measure the wall time improvement of this PR for the new test?

@krajek
Contributor Author

krajek commented Sep 19, 2018

What does "wall time" mean? :-)

@jnyrup
Member

jnyrup commented Sep 19, 2018

Wall time is just another term for the runtime measured in, e.g., seconds, as opposed to algorithmic complexity.
For example, whether applying this PR improves the test from 100s to 10s.
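
For example, measured with a plain Stopwatch around the assertion (a minimal sketch; the item count and item shape are arbitrary assumptions, and top-level statements are used for brevity):

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using FluentAssertions;

// Time AllBeEquivalentTo on a large collection of identical items,
// the scenario from #914.
var items = Enumerable.Repeat(new { Name = "foo" }, 100_000).ToList();

var stopwatch = Stopwatch.StartNew();
items.Should().AllBeEquivalentTo(new { Name = "foo" });
stopwatch.Stop();

Console.WriteLine($"AllBeEquivalentTo took {stopwatch.Elapsed.TotalSeconds:F1}s");
```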

@krajek
Contributor Author

krajek commented Sep 19, 2018

I tried to measure the time for 10,000 items but gave up after a few minutes :-). Even for a few hundred items, times were on the order of minutes. With strict ordering and 100,000 items, I get around 13s.

EDIT: For the record (N=500):

  • loose ordering: 21s wall time
  • strict ordering (via WithStrictOrdering, sketched below): 0.23s
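
For reference, the strict-ordering variant corresponds to the WithStrictOrdering option; a minimal usage sketch (the element shape is an arbitrary assumption):

```csharp
using System.Linq;
using FluentAssertions;

var actual = Enumerable.Range(0, 500).Select(i => new { Id = i }).ToList();
var expected = Enumerable.Range(0, 500).Select(i => new { Id = i }).ToList();

// Compare element by element instead of searching the remaining candidates
// for each item; this is what enables the O(n) path.
actual.Should().BeEquivalentTo(expected, options => options.WithStrictOrdering());
```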

@krajek
Contributor Author

krajek commented Sep 23, 2018

FYI: the last commit (502977c) fixed an issue unrelated to the topic of this PR. When I set strict ordering for the top-level collection, the exception message for an existing test case changed. The reason was a discrepancy between LooselyMatchAgainst and StrictlyMatchAgainst:
LooselyMatchAgainst created a scope for each item, whereas StrictlyMatchAgainst didn't.
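
Roughly, the shape of that fix (an illustrative sketch based on the description above; the method signature is assumed, not the exact code of 502977c):

```csharp
using FluentAssertions.Execution;

internal class EnumerableEquivalencyValidatorSketch
{
    private void StrictlyMatchAgainst<T>(object[] subjects, T expectation, int expectationIndex)
    {
        // New: give each item its own scope, as LooselyMatchAgainst already
        // did, so failure messages report the index of the failing item.
        using (new AssertionScope($"[{expectationIndex}]"))
        {
            // ...recursively assert equivalency of subjects[expectationIndex]
            // against expectation, as before...
        }
    }
}
```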

Please review that particular scenario; I am not sure I understand the code deeply enough.

@krajek krajek changed the title [WIP] Improve performance of AllBeEquivalentTo Improve performance of AllBeEquivalentTo Sep 23, 2018
@dennisdoomen
Member

I'm losing track of what each PR is supposed to do. Normally, if a PR consists of multiple commits, I assume you have carefully squashed individual changes into a couple of well-focused commits. We also merge these PRs keeping all commits. So please either separate unrelated changes into separate PRs, or make sure you squash commits that change stuff from previous commits into the same one. I regularly use fixup commits and/or amend existing commits.
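
For anyone unfamiliar with fixup commits, a typical sequence looks like this (standard git commands; the sha placeholder stays as-is):

```bash
# Stage the correction, then record it as a fixup of the commit it amends.
git add -p
git commit --fixup=<sha-of-commit-to-fix>

# Before the merge, fold all fixup! commits into their targets.
git rebase -i --autosquash master
```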

Member

@dennisdoomen dennisdoomen left a comment


Only minor comments. See the legend for what the emojis mean.

@krajek
Contributor Author

krajek commented Sep 26, 2018

First of all, thanks for mentioning fixup commits; I had no idea such a thing existed.
Secondly, thanks for describing the merging policy. I am more used to no-constraints feature branches being squash-merged into the main branch, and I would be fine with such a final squash for this PR as well.

Regarding the topic of this PR: every commit but one directly tackles the performance problem stated in the PR title. The one outstanding commit (502977c) was necessary (at least at the time of writing) because, after I introduced the WithStrictOrdering usage, I accidentally reproduced issue #918. The fix in my case was to introduce a new AssertionScope for the strict-ordering path, just like the loose-ordering path was already doing.

Now we need to decide how to proceed. Possible scenarios:

A) After my explanation, you understand the scope of the feature and merge it with a squash.
B) I abandon this particular PR and split it into three, one for each "sub-task".

@dennisdoomen, would you be ok with A) this time?

In either case, from now on I will prepare my branches for new PRs

  1. to be as small as possible, and
  2. to be ready for a merge that keeps all commits, using fixup/amend/rebase, etc.

@dennisdoomen
Member

would you be ok with A) this time?

Absolutely. My intention was just to explain my preference for this repo.

@dennisdoomen dennisdoomen merged commit 6c72744 into fluentassertions:master Sep 26, 2018
@krajek krajek deleted the improve_performance_of_all_be_equivalent_to branch September 26, 2018 19:21