Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround hash collision between literals 'null' and 'false' in DFA #6876

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -7763,6 +7763,31 @@ internal void Open()
}.RunAsync();
}

[Fact, WorkItem(6868, "https://github.com/dotnet/roslyn-analyzers/issues/6868")]
public async Task NullableBooleanComparisonWithNullAndFalse()
{
var code = """
public class Test
{
public static void Method(string parameter)
{
var flag = GetFlag();

if (flag == null || flag == false)
{
if (flag == false)
{
}
}
}

public static bool? GetFlag() => false;
}

""";
await VerifyCS.VerifyCodeFixAsync(code, code);
}

[Trait(Traits.DataflowAnalysis, Traits.Dataflow.PointsToAnalysis)]
[Trait(Traits.DataflowAnalysis, Traits.Dataflow.NullAnalysis)]
[Fact, WorkItem(4984, "https://github.com/dotnet/roslyn-analyzers/issues/4984")]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,32 @@ protected override void ComputeHashCodeParts(ref RoslynHashCode hashCode)
protected override bool ComputeEqualsByHashCodeParts(CacheBasedEquatable<ValueContentAbstractValue> obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Can you review the complexity of changing the internal hash code of null to a non-zero constant? I see two potential ways to handle this:

  1. Make a NullSentinel type with a singleton instance that returns a known hash code, and use that to represent null
  2. Change the way RoslynHashCode computes the hash code for null instances to default to a constant that is not zero

Option (2) would be simpler, but relies on all calls to GetHashCode() (and all uses of EqualityComparer<T>) to have the same understanding of the hash code for null. Option (1) is more complex, but forces the correct result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the options are viable as I found it's easy to still get collisions even when null isn't involved. See #6876 (comment)

Copy link
Member

@sharwell sharwell Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Can you review the complexity of changing the ComputeEqualByHashCodeParts methods to consider both HasValue and Value when the input is a Nullable<T> instance? This will force (bool?)null to be different from (bool?)false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharwell This is not viable as I found it's easy to still get collisions even when null isn't involved. See #6876 (comment)

{
var other = (ValueContentAbstractValue)obj;
return HashUtilities.Combine(LiteralValues) == HashUtilities.Combine(other.LiteralValues)
if (LiteralValues.Count != other.LiteralValues.Count)
{
return false;
}

var hashCode1 = new RoslynHashCode();
int nullCount1 = 0;
foreach (var value1 in LiteralValues)
{
hashCode1.Add(value1);
if (value1 is null)
nullCount1++;
}

var hashCode2 = new RoslynHashCode();
int nullCount2 = 0;
foreach (var value2 in other.LiteralValues)
{
hashCode2.Add(value2);
if (value2 is null)
nullCount2++;
}

// null and false can easily cause hash collisions and cause incorrect behaviors.
// We count the nulls to workaround this collision.
return nullCount1 == nullCount2 && hashCode1.ToHashCode() == hashCode2.ToHashCode()
Comment on lines +157 to +182
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit hacky. Not sure if it's to an acceptable extent or not.

@mavasani Can you advise please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mavasani Can this implementation change completely to really compute "Equals" instead of relying on hash code? ie, rename ComputeEqualsByHashCodeParts to ComputeEquals and adjust the implementation accordingly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A second thought on my fix. There could still be collisions between values:

  • 0 and false
  • 1 and true

So I think we either need proper equality or include the type in hash code calculation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that collisions are still possible. Here is a test:

            var code = """
                public class Test
                {
                	public static void Method(string parameter)
                	{
                		var flag = GetFlag();

                		if (flag is false || flag is 0)
                		{
                			if (flag is 0)
                			{
                			}
                		}
                	}

                	public static object GetFlag() => null;
                }
                
                """;
            await VerifyCS.VerifyCodeFixAsync(code, code);

We need to either compute equals properly without hash code, or include the Type in our hash code calculation (but not sure if including Type will have a negative performance impact)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the collection containing the values could be modified to store both the common type of all values, and the set of the values themselves. If the common type is null, then the type of each element needs to be included in the hashing.

  • A collection with no elements would have no common type
  • A collection with one element would have the common type be the concrete type of the first element
  • Whenever a new element is added to the collection, it either has the same type as the existing common type, or the common type of the union is set to null

&& NonLiteralState.GetHashCode() == other.NonLiteralState.GetHashCode();
}

Expand Down