Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed fix for #1183 - multiple regex within short time might give identical strings #1187

Merged
merged 5 commits into from
Jul 8, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
18 changes: 15 additions & 3 deletions Src/AutoFixture/RegularExpressionGenerator.cs
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,16 @@ namespace AutoFixture
/// </summary>
public class RegularExpressionGenerator : ISpecimenBuilder
{
private readonly Random random;

/// <summary>
/// Initializes a new instance of the <see cref="RegularExpressionGenerator"/> class.
/// </summary>
public RegularExpressionGenerator()
{
this.random = new Random();
moodmosaic marked this conversation as resolved.
Show resolved Hide resolved
}

/// <summary>
/// Creates a string that is guaranteed to match a RegularExpressionRequest.
/// </summary>
Expand All @@ -28,16 +38,18 @@ public object Create(object request, ISpecimenContext context)
return new NoSpecimen();
}

return GenerateRegularExpression(regularExpressionRequest);
return this.GenerateRegularExpression(regularExpressionRequest);
}

private static object GenerateRegularExpression(RegularExpressionRequest request)
private object GenerateRegularExpression(RegularExpressionRequest request)
{
string pattern = request.Pattern;

try
{
string regex = new Xeger(pattern).Generate();
// Use the Xeger constructor overload that that takes an instance of Random.
// Otherwise identically strings can be generated, if regex are generated within short time.
string regex = new Xeger(pattern, this.random).Generate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my knowledge Random is not thread safe. See e.g. here. Previously we created a random instance per request (probably internally within Xeger), so we didn't have that problem.

Fixture by design can be used from multiple threads simultaneously, so we should take care of multi-threading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll look into it.
Should we use the code from https://codeblog.jonskeet.uk/2009/11/04/revisiting-randomness/ ?

Basically doing Xeger(pattern, ThreadLocalRandom.Instance)

This way we could abstract threadsafety concerns out of RegularExpressionGenerator... 🤔, and maybe reuse it elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MagnusMikkelsen you can just follow the pattern used in all the other generators, e.g.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moodmosaic @MagnusMikkelsen If there is any way to do that without locks, please prefer that way. Introducing locking makes it hard to optimize performance later. When you have a bit suite, AutoFixture already takes substantial amount of time due to uncached reflection is uses everywhere - we shouldn't degrade it even more 😟

In this particular case, it might be better to abstract out random generation and create generator, which uses e.g. Interlocked. Or, for instance, create thread-local random, but use interlocked to generate seed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interlocked.CompareExchange or (IIRC) the Interlocked-extras by Jeffrey Richter in CLR via C# can be candidates.

Though right now, I’d suggest we stick with lock as done in all other places and be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to use Interlocked.CompareExchange for this?

I did a couple of measurements, running the test below.
The times are measured using dotTrace, and are the total time and own time for RegularExpressionGenerator.GenerateRegularExpression.
The last test (ThreadLocal) took 12s. Seems to make sense, given the GenerateRegularExpression used some 85 seconds in total on 8 threads.

Total Time [ms] Own [ms]
Current: shared Random 83387 0
Original: No injected Random 88926 10
New Random each time
locking on generate seed
100843 0
Sharing ThreadLocal
locking when generating ThreadLocal
84848 27

(It is beside the point of this PR - but the Xeger.ctor seems to take most of this time - 81000ms total time. Generate is about 550ms total time. Might be an idea to cache Xeger? Fx dictionary[pattern]=Xeger)

        [Fact]
        public void CreateManyAnonymousInParallelWithRegularExpressionValidatedTypeReturnsCorrectResult()
        {
            // Arrange
            var fixture = new Fixture();
            var result = Enumerable
                .Range(0, 10000)
                .AsParallel()
                .WithDegreeOfParallelism(8)
                .Select(_ => fixture.Create<RegularExpressionValidatedType>())
                .ToArray();

            // Assert

            Assert.All(result, r => Assert.Matches(RegularExpressionValidatedType.Pattern, r.Property));
            Assert.Equal(result.Distinct(), result);
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interlocked-extras by Jeffrey Richter in CLR via C# can be candidates.

Haven't got that book - so I don't know it, do you have any links with examples? I'd like to learn new tricks 😁

if (Regex.IsMatch(regex, pattern))
{
return regex;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ namespace AutoFixtureUnitTest.DataAnnotations
{
public class RegularExpressionValidatedType
{
public const string Pattern = @"^[a-zA-Z''-'\s]{1,40}$";
public const string Pattern = @"^[a-zA-Z''-'\s]{20,40}$";

[RegularExpression(Pattern)]
public string Property { get; set; }
Expand Down
15 changes: 15 additions & 0 deletions Src/AutoFixtureUnitTest/FixtureTest.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1636,6 +1636,21 @@ public void CreateAnonymousWithRegularExpressionValidatedTypeReturnsCorrectResul
Assert.Matches(RegularExpressionValidatedType.Pattern, result.Property);
}

[Fact]
public void CreateManyAnonymousWithRegularExpressionValidatedTypeReturnsDifferentResults()
{
// This test exposes an issue with Xeger/Random.
// Xeger(pattern) internally creates an instance of Random with the default seed.
// This means that the RegularExpressionGenerator might create identical strings
// if called multiple times within a short time.

// Arrange
var fixture = new Fixture();
var result = fixture.CreateMany<RegularExpressionValidatedType>(10).Select(x => x.Property).ToArray();
// Assert
Assert.Equal(result.Distinct(), result);
}

[Fact]
public void CreateAnonymousWithStringLengthValidatedTypeReturnsCorrectResult()
{
Expand Down
24 changes: 23 additions & 1 deletion Src/AutoFixtureUnitTest/RegularExpressionGeneratorTest.cs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
using System.Text.RegularExpressions;
using System.Linq;
using System.Text.RegularExpressions;
using AutoFixture;
using AutoFixture.Kernel;
using AutoFixtureUnitTest.Kernel;
Expand Down Expand Up @@ -103,6 +104,27 @@ public void CreateWithRegularExpressionRequestReturnsCorrectResult(string patter
Assert.True(Regex.IsMatch(result.ToString(), pattern), string.Format("result: {0}", result));
}

[Theory]
[InlineData(@"^[A-Z]{27}$")]
public void CreateMultipleWithRegularExpressionRequestReturnsDifferentResults(string pattern)
{
// This test exposes an issue with Xeger/Random.
// Xeger(pattern) internally creates an instance of Random with the default seed.
// This means that the RegularExpressionGenerator might create identical strings
// if called multiple times within a short time.

// Arrange
var sut = new RegularExpressionGenerator();
var request = new RegularExpressionRequest(pattern);
var dummyContext = new DelegatingSpecimenContext();

// Repeat a few times to make the test more robust.
// Use ToArray to iterate the IEnumerable at this point.
var result = Enumerable.Range(0, 10).Select(_ => sut.Create(request, dummyContext)).ToArray();

Assert.Equal(result.Distinct(), result);
}

[Theory]
[InlineData("[")]
[InlineData(@"(?\[Test\]|\[Foo\]|\[Bar\])?(?:-)?(?\[[()a-zA-Z0-9_\s]+\])?(?:-)?(?\[[a-zA-Z0-9_\s]+\])?(?:-)?(?\[[a-zA-Z0-9_\s]+\])?(?:-)?(?\[[a-zA-Z0-9_\s]+\])?")]
Expand Down