Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disregard parameter names in ExpressionEqualityComparer.GetHashCode #30755

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

aradalvand
Copy link

@aradalvand aradalvand commented Apr 24, 2023

Fixes #30697

Description:

The ExpressionEqualityComparer.GetHashCode method now bases the hash of each ParameterExpression on its index (position) in the parameter lists of the containing lambda(s), as opposed to its name — in addition, of course, to its type, which is true for any expression.

It's worth noting that the Equals method also already does essentially the same thing. Namely, it expects the parameter lists of the two given lambdas to be identical in number, type, and order; and then maps each parameter in lambda "a" to its counterpart with the same index/position in lambda "b" for later equality checks; meaning it's effectively doing something synonymous with what is now being done in GetHashCode:

private bool CompareLambda(LambdaExpression a, LambdaExpression b)
{
var n = a.Parameters.Count;
if (b.Parameters.Count != n)
{
return false;
}
_parameterScope ??= new Dictionary<ParameterExpression, ParameterExpression>();
for (var i = 0; i < n; i++)
{
var (p1, p2) = (a.Parameters[i], b.Parameters[i]);
if (p1.Type != p2.Type)
{
for (var j = 0; j < i; j++)
{
_parameterScope.Remove(a.Parameters[j]);
}
return false;
}
if (!_parameterScope.TryAdd(p1, p2))
{
throw new InvalidOperationException(CoreStrings.SameParameterInstanceUsedInMultipleLambdas(p1.Name));
}
}

That said, we still do fall back to the parameter name in cases where the expression passed to GetHashCode is not a full lambda (i.e. say only someLambda.Body), because we have to. This is, once again, precisely what happens in Equals as well; specifically here:

This is my very first time contributing to EF so feel free to let me know if I've made any mistakes.

A few things:

  • I noticed that (at least) in the ExpressionEqualityComparer, for loops seem to have been preferred to foreach loops even when the latter could've also been used. I'm not sure why. I went with foreach in this newly-added code because the logic of the loop is dead simple. But let me know if this somehow matters and you want a for instead.
  • I'm not sure if I should add a test for this. It doesn't seem like a substantial enough change in my estimation. But let me know if I should.
  • One small thing I noticed (which is functionally inconsequential as far as I can tell but semantically weird) is the fact that in the CompareParameter method shown below, you're doing mapped.Name == b, as opposed to just mapped == b, while the latter would work too and is what actually makes sense — should I fix this too?
    && _parameterScope.TryGetValue(a, out var mapped)
    ? mapped.Name == b.Name

  • I've read the guidelines for contributing and seen the walkthrough
  • I've posted a comment on an issue with a detailed description of how I am planning to contribute and got approval from a member of the team
  • The code builds and tests pass locally (also verified by our automated build checks)
  • Commit messages follow this format:
        Summary of the changes
        - Detail 1
        - Detail 2

        Fixes #bugnumber
  • Tests for the changes have been added (for bug fixes / features)
  • Code follows the same patterns and style as existing code in this repo

The `GetHashCode` now bases the hash code of each parameter expression
on its position in the containing lambda(s) parameter list, thereby
making it so that otherwise identical lambda expressions whose only
difference is the parameter names to yield the same hash code.

Fixes dotnet#30697
src/EFCore/Query/ExpressionEqualityComparer.cs Outdated Show resolved Hide resolved
src/EFCore/Query/ExpressionEqualityComparer.cs Outdated Show resolved Hide resolved
src/EFCore/Query/ExpressionEqualityComparer.cs Outdated Show resolved Hide resolved
src/EFCore/Query/ExpressionEqualityComparer.cs Outdated Show resolved Hide resolved
@roji
Copy link
Member

roji commented Apr 25, 2023

@aradalvand note the test failures.

@roji
Copy link
Member

roji commented Apr 25, 2023

I'm not sure if I should add a test for this. It doesn't seem like a substantial enough change in my estimation. But let me know if I should.

Yeah, a test would be good - precisely the case you gave in the issue (i.e. assert that both equality and hashcode are the same).

@roji
Copy link
Member

roji commented Apr 25, 2023

One small thing I noticed (which is functionally inconsequential as far as I can tell but semantically weird) is the fact that in the CompareParameter method shown below, you're doing mapped.Name == b, as opposed to just mapped == b, while the latter would work too and is what actually makes sense — should I fix this too?

Can you provide more details? CompareParameter currently looks like this:

private bool CompareParameter(ParameterExpression a, ParameterExpression b)
    => _parameterScope != null
        && _parameterScope.TryGetValue(a, out var mapped)
            ? mapped.Name == b.Name
            : a.Name == b.Name;

What exact problem hare you seeing here?

@roji roji self-assigned this May 25, 2023
@roji
Copy link
Member

roji commented May 25, 2023

@aradalvand any response to my comment above?

Also, you need to agree to the CLA as per the message above.

@aradalvand
Copy link
Author

aradalvand commented May 25, 2023

Not sure it's worth it. Nevermind.

@aradalvand aradalvand closed this May 25, 2023
@roji
Copy link
Member

roji commented May 26, 2023

@aradalvand it's a bit of a shame to abandon this after the effort we already put into this...

@aradalvand aradalvand reopened this May 26, 2023
@aradalvand
Copy link
Author

aradalvand commented May 26, 2023

To respond to this:
This check is basically: Is b the corresponding parameter for a?
The value coming out of TryGetValue(a, out var mapped) (that is, mapped) will be b itself, so it makes more sense to just do a simple reference equality check, as opposed to comparing the names, even though the latter works, it is nonetheless a weird thing to do, when we know mapped is going to be the same object as b.
It could just be:

private bool CompareParameter(ParameterExpression a, ParameterExpression b)
    => _parameterScope != null
        && _parameterScope.TryGetValue(a, out var mapped)
            ? mapped == b
            : a.Name == b.Name;

@aradalvand
Copy link
Author

@dotnet-policy-service agree

@roji
Copy link
Member

roji commented May 30, 2023

This check is basically: Is b the corresponding parameter for a?

In principle you are right - within the lambda, it's expected to see the same instance of ParameterExpression as the one in the lambda's parameter list, wherever that parameter is used (IIRC that's what the compiler produces, in any case). However, there's nothing technically preventing people from constructing expression trees where there are two different ParameterExpression instances with the same name. If we make your proposed change, the comparer would start to return false, leading to re-compilation and perf degradation.

One important review comment that's much more important than all the others: your PR currently introduces state on ExpressionEqualityComparer (_lambdaParameters), which has one instance (ExpressionEqualityComparer.Instance) shared concurrently by multiple threads. This is why the equality logic (as opposed to the hashing logic) is encapsulated in the ExpressionComparer struct, which is instantiated for each invocation of Equals - this allows a private instance of _parameterScope which doesn't conflict across multiple concurrent invocations. If we introduce similar state for hash code calculation, we must do the same and move the entire code into the ExpressionComparer struct.

But looking over this again... I'm simply not sure about the added value of introducing all the parameter names/indexes into the hash code; we're doing extra calculations, lookups and an instantiation (of _lambdaParameters), although there's no requirement for the hash code to contain the new information... I think the main point here is for the hash code to disregard the lambda parameter names; rather than doing that by hashing their indexes instead, we should consider simply disregarding all parameter names, regardless of whether they're lambda or not. Remember again, that there's no requirement for the hash code to be different for differing expression trees - only for it to be equal for equal ones. So disregarding all names would allow queries with different lambda parameter to be cached as the same query, with a much simpler implementation, less runtime work, and without any real disadvantage (except that query trees whose only difference is parameter names get the same hash code, and therefore get bucketized together in the cache).

How does that sound?

@aradalvand
Copy link
Author

aradalvand commented May 31, 2023

In principle you are right - within the lambda, it's expected to see the same instance of ParameterExpression as the one in the lambda's parameter list, wherever that parameter is used (IIRC that's what the compiler produces, in any case). However, there's nothing technically preventing people from constructing expression trees where there are two different ParameterExpression instances with the same name. If we make your proposed change, the comparer would start to return false, leading to re-compilation and perf degradation.

When you say "there's nothing technically preventing people from constructing expression trees where there are two different ParameterExpression instances with the same name", you mean in the same expression tree, right? Just want to make sure I understand your point correctly.
And in those cases, would those two ParameterExpression instances technically be pointing to the same parameter? I actually don't know this, I'm curious; if so, then I guess you'd be right.

But looking over this again... I'm simply not sure about the added value of introducing all the parameter names/indexes into the hash code; we're doing extra calculations

Sure, we won't do it. You explained to me here that it's not necessary, I was initially under the impression that it is. I'll update the code accordingly once I have the time to get to this again, sorry, I've just been really busy!

we should consider simply disregarding all parameter names, regardless of whether they're lambda or not

Yeah I think that makes sense, that seems to be the most reasonable approach. Given that GetHashCode isn't conclusive — which is the key point I didn't know before — there's not really any reason at all for this additional ceremony. You're spot on.

@roji
Copy link
Member

roji commented Jul 7, 2023

@aradalvand any plans on continuing work on this?

@aradalvand
Copy link
Author

aradalvand commented Jul 8, 2023

Okay, I made the changes we talked about, could you please take a look and confirm if it's all good?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ExpressionEqualityComparer.GetHashCode not consistent with ExpressionEqualityComparer.Equals
3 participants