Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CA1827 (DoNotUseCountWhenAnyCanBeUsed) violations in Roslyn.sln #37959

Open
mavasani opened this issue Aug 13, 2019 · 17 comments
Open

Fix CA1827 (DoNotUseCountWhenAnyCanBeUsed) violations in Roslyn.sln #37959

mavasani opened this issue Aug 13, 2019 · 17 comments

Comments

@mavasani
Copy link
Member

Attempting to migrate Roslyn.sln to latest version (2.9.4) of FxCop analyzer packages leads to a bunch of CA1827 violations where the code is using Count() Linq invocation when Any() could be used. This CA rule was added in 2.9.4 analyzer package. A ruleset suppression would be added for this rule and this issue tracks fixing these violations and removing the ruleset entry.

@stephentoub
Copy link
Member

You'll want to be cautious with this one. In both .NET Framework and .NET Core, Enumerable.Count special-cases ICollection<T> and ICollection, such that using Count on an instance that implements those will avoid allocating, but Enumerable.Any doesn't check for such interfaces and will always end up allocating an enumerator. We could consider changing Any to do similar checks (with the pros and cons that come with that), but such a change would only impact .NET Core.

It calls into question the validity of the added rule.

@mavasani
Copy link
Member Author

Thanks @stephentoub. Tagging @paulomorgado who added this rule very recently.

@paulomorgado
Copy link

@stephentoub, I would say that Enumerable.Any should be changed. As a developer, when I receive an IEnumerable<T> from an API call, I don't know which it is. It might be coming from a LINQ query.

By the way, ImmutableArrayExtensions has an Any extension that use the internal array length. But not Count (corefx/#39007).

@stephentoub
Copy link
Member

I would say that Enumerable.Any should be changed

As I noted, that might be something to do for .NET Core. It will not be changed in .NET Framework. So the rule is problematic on .NET Framework at the very least.

There are also downsides to changing it, which is why it hasn't been done thus far, e.g. the extra interface checks add overhead, which is less impactful when the alternative is a potentially long iteration through every element, and more impactful when at most one element will be considered.

@paulomorgado
Copy link

@stephentoub, are your comments only related to Enumerable.Count<TSource>() vs. Enumerable.Any<TSource>(), or are they also related to Enumerable.Count<TSource>(Func<TSource, bool> predicate) vs. Enumerable.Any<TSource>(Func<TSource, bool> predicate)?

@stephentoub
Copy link
Member

Just the ones that don't take a predicate.

@paulomorgado
Copy link

So. this is still a valid analyzer/fixer for those that take the predicate, right?

@stephentoub
Copy link
Member

Yes

@paulomorgado
Copy link

Sorry, one more question: this also doesn't apply to Queryable.Count<TSource>(), right?

@mavasani
Copy link
Member Author

Tagging @sharwell for his views.

@sharwell
Copy link
Member

The analyzer attempts to avoid an O(n) amount of work during the switch from Count() to Any(). This is a more acute problem for general-purpose code than avoiding an O(1) allocation on a fast path. I would further argue that code sensitive enough to performance where the special-case behavior matters would be using more concrete types than IEnumerable<T>. I do not see any issues with the rule as-presented which would alter or special-case its behavior.

@paulomorgado
Copy link

And this will catch most of them using Enumerable.Count<TSource>(IEnumerable<TSource> source).

@mavasani
Copy link
Member Author

@paulomorgado - can we change the analyzer to do the following:

  1. Detect all uses of Count() on ICollection or ICollection<T> receivers and recommend using Count or Length as appropriate (your current PR seems to do this)
  2. Only recommend replacing Count() with Any() on receivers of type IEnumerable or IEnumerable<T> if it does not implement ICollection or ICollection<T>, given that the former rule will already cover this case.

This should address @stephentoub's concern while also avoiding both the above diagnostics on certain code patterns, when clearly accepting the first diagnostic/fix is preferred.

@stephentoub
Copy link
Member

FYI, I did submit dotnet/corefx#40377, however a) this will be post-.NET Core 3.0, and b) it will not change .NET Framework.

@mavasani
Copy link
Member Author

Given @stephentoub's fix and @sharwell's point about performance sensitive code using more concrete types then IEnumerable<T>, I think it should be fine to even keep @paulomorgado's analyzer as-is and ignore my suggestion above #37959 (comment). @stephentoub does that seem reasonable?

Going forward, we will involve the corefx team in the early design/triage phase for any new analyzers being added to Microsoft.NetCore.Analyzers and Microsoft.NetFramework.Analyzers

@stephentoub
Copy link
Member

It's up to you. I'm more conservative when it comes to code change recommendations, and so unless something is strictly better 99.9% of the time, I hesitate to suggest something that someone might blindly auto-fix. In this case, on .NET Framework and on .NET Core 3.0 and earlier, it is still the case that Any() will allocate in many situations where Count() won't, and it'll be impossible for the analyzer to always know the right answer, nor whether Count() will be O(N) or O(1), nor whether the cost of an allocation from Any() will be more or less impactful than iteration that might happen as part of Count(). It's fine if you want to keep the analyzer, but I suggest then that the associated text/description/docs make it clear that it's a trade-off.

@paulomorgado
Copy link

We can exclude from this analyzer/fixer the use cases covered by pull/2736 and add replacement of Any() alongside with Count() to pull/2736.

Performance wise, it should be the best compromise.

As for users, it might be a bit more confusing, but they might learn from it.

When the framework changes, we'll change the analyzers/fixers accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants