New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workaround hash collision between literals 'null' and 'false' in DFA #6876
base: main
Are you sure you want to change the base?
Conversation
ff945db
to
d8499b3
Compare
if (LiteralValues.Count != other.LiteralValues.Count) | ||
{ | ||
return false; | ||
} | ||
|
||
var hashCode1 = new RoslynHashCode(); | ||
int nullCount1 = 0; | ||
foreach (var value1 in LiteralValues) | ||
{ | ||
hashCode1.Add(value1); | ||
if (value1 is null) | ||
nullCount1++; | ||
} | ||
|
||
var hashCode2 = new RoslynHashCode(); | ||
int nullCount2 = 0; | ||
foreach (var value2 in other.LiteralValues) | ||
{ | ||
hashCode2.Add(value2); | ||
if (value2 is null) | ||
nullCount2++; | ||
} | ||
|
||
// null and false can easily cause hash collisions and cause incorrect behaviors. | ||
// We count the nulls to workaround this collision. | ||
return nullCount1 == nullCount2 && hashCode1.ToHashCode() == hashCode2.ToHashCode() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit hacky. Not sure if it's to an acceptable extent or not.
@mavasani Can you advise please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mavasani Can this implementation change completely to really compute "Equals" instead of relying on hash code? ie, rename ComputeEqualsByHashCodeParts
to ComputeEquals
and adjust the implementation accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A second thought on my fix. There could still be collisions between values:
0
andfalse
1
andtrue
So I think we either need proper equality or include the type in hash code calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed that collisions are still possible. Here is a test:
var code = """
public class Test
{
public static void Method(string parameter)
{
var flag = GetFlag();
if (flag is false || flag is 0)
{
if (flag is 0)
{
}
}
}
public static object GetFlag() => null;
}
""";
await VerifyCS.VerifyCodeFixAsync(code, code);
We need to either compute equals properly without hash code, or include the Type in our hash code calculation (but not sure if including Type
will have a negative performance impact)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the collection containing the values could be modified to store both the common type of all values, and the set of the values themselves. If the common type is null, then the type of each element needs to be included in the hashing.
- A collection with no elements would have no common type
- A collection with one element would have the common type be the concrete type of the first element
- Whenever a new element is added to the collection, it either has the same type as the existing common type, or the common type of the union is set to null
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #6876 +/- ##
=======================================
Coverage 96.39% 96.39%
=======================================
Files 1403 1403
Lines 330977 331018 +41
Branches 10890 10895 +5
=======================================
+ Hits 319055 319098 +43
+ Misses 9188 9186 -2
Partials 2734 2734 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marked request changes pending further review.
@@ -154,7 +154,32 @@ protected override void ComputeHashCodeParts(ref RoslynHashCode hashCode) | |||
protected override bool ComputeEqualsByHashCodeParts(CacheBasedEquatable<ValueContentAbstractValue> obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Can you review the complexity of changing the internal hash code of null
to a non-zero constant? I see two potential ways to handle this:
- Make a
NullSentinel
type with a singleton instance that returns a known hash code, and use that to represent null - Change the way RoslynHashCode computes the hash code for null instances to default to a constant that is not zero
Option (2) would be simpler, but relies on all calls to GetHashCode()
(and all uses of EqualityComparer<T>
) to have the same understanding of the hash code for null. Option (1) is more complex, but forces the correct result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the options are viable as I found it's easy to still get collisions even when null isn't involved. See #6876 (comment)
@@ -154,7 +154,32 @@ protected override void ComputeHashCodeParts(ref RoslynHashCode hashCode) | |||
protected override bool ComputeEqualsByHashCodeParts(CacheBasedEquatable<ValueContentAbstractValue> obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Can you review the complexity of changing the ComputeEqualByHashCodeParts
methods to consider both HasValue
and Value
when the input is a Nullable<T>
instance? This will force (bool?)null
to be different from (bool?)false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sharwell This is not viable as I found it's easy to still get collisions even when null isn't involved. See #6876 (comment)
Fixes #6868