Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround hash collision between literals 'null' and 'false' in DFA #6876

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Youssef1313
Copy link
Member

Fixes #6868

@Youssef1313 Youssef1313 marked this pull request as ready for review August 20, 2023 13:37
@Youssef1313 Youssef1313 requested a review from a team as a code owner August 20, 2023 13:37
Comment on lines +157 to +182
if (LiteralValues.Count != other.LiteralValues.Count)
{
return false;
}

var hashCode1 = new RoslynHashCode();
int nullCount1 = 0;
foreach (var value1 in LiteralValues)
{
hashCode1.Add(value1);
if (value1 is null)
nullCount1++;
}

var hashCode2 = new RoslynHashCode();
int nullCount2 = 0;
foreach (var value2 in other.LiteralValues)
{
hashCode2.Add(value2);
if (value2 is null)
nullCount2++;
}

// null and false can easily cause hash collisions and cause incorrect behaviors.
// We count the nulls to workaround this collision.
return nullCount1 == nullCount2 && hashCode1.ToHashCode() == hashCode2.ToHashCode()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit hacky. Not sure if it's to an acceptable extent or not.

@mavasani Can you advise please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mavasani Can this implementation change completely to really compute "Equals" instead of relying on hash code? ie, rename ComputeEqualsByHashCodeParts to ComputeEquals and adjust the implementation accordingly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A second thought on my fix. There could still be collisions between values:

  • 0 and false
  • 1 and true

So I think we either need proper equality or include the type in hash code calculation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that collisions are still possible. Here is a test:

            var code = """
                public class Test
                {
                	public static void Method(string parameter)
                	{
                		var flag = GetFlag();

                		if (flag is false || flag is 0)
                		{
                			if (flag is 0)
                			{
                			}
                		}
                	}

                	public static object GetFlag() => null;
                }
                
                """;
            await VerifyCS.VerifyCodeFixAsync(code, code);

We need to either compute equals properly without hash code, or include the Type in our hash code calculation (but not sure if including Type will have a negative performance impact)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the collection containing the values could be modified to store both the common type of all values, and the set of the values themselves. If the common type is null, then the type of each element needs to be included in the hashing.

  • A collection with no elements would have no common type
  • A collection with one element would have the common type be the concrete type of the first element
  • Whenever a new element is added to the collection, it either has the same type as the existing common type, or the common type of the union is set to null

@codecov
Copy link

codecov bot commented Aug 20, 2023

Codecov Report

Merging #6876 (d8499b3) into main (3587540) will increase coverage by 0.00%.
Report is 3 commits behind head on main.
The diff coverage is 92.85%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6876   +/-   ##
=======================================
  Coverage   96.39%   96.39%           
=======================================
  Files        1403     1403           
  Lines      330977   331018   +41     
  Branches    10890    10895    +5     
=======================================
+ Hits       319055   319098   +43     
+ Misses       9188     9186    -2     
  Partials     2734     2734           

Copy link
Member

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked request changes pending further review.

@@ -154,7 +154,32 @@ protected override void ComputeHashCodeParts(ref RoslynHashCode hashCode)
protected override bool ComputeEqualsByHashCodeParts(CacheBasedEquatable<ValueContentAbstractValue> obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Can you review the complexity of changing the internal hash code of null to a non-zero constant? I see two potential ways to handle this:

  1. Make a NullSentinel type with a singleton instance that returns a known hash code, and use that to represent null
  2. Change the way RoslynHashCode computes the hash code for null instances to default to a constant that is not zero

Option (2) would be simpler, but relies on all calls to GetHashCode() (and all uses of EqualityComparer<T>) to have the same understanding of the hash code for null. Option (1) is more complex, but forces the correct result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the options are viable as I found it's easy to still get collisions even when null isn't involved. See #6876 (comment)

@@ -154,7 +154,32 @@ protected override void ComputeHashCodeParts(ref RoslynHashCode hashCode)
protected override bool ComputeEqualsByHashCodeParts(CacheBasedEquatable<ValueContentAbstractValue> obj)
Copy link
Member

@sharwell sharwell Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Can you review the complexity of changing the ComputeEqualByHashCodeParts methods to consider both HasValue and Value when the input is a Nullable<T> instance? This will force (bool?)null to be different from (bool?)false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharwell This is not viable as I found it's easy to still get collisions even when null isn't involved. See #6876 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CA1508: false positive on sub-sequential more strict condition
2 participants