Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It generates the same string being called in a sequence #26

Open
vasily-kirichenko opened this issue Dec 17, 2017 · 11 comments
Open

It generates the same string being called in a sequence #26

vasily-kirichenko opened this issue Dec 17, 2017 · 11 comments
Labels

Comments

@vasily-kirichenko
Copy link

vasily-kirichenko commented Dec 17, 2017

for _ in 1..10 do
        (Xeger @"^\[\<([A-Z][a-zA-Z0-9]*)*\>\]$").Generate() |> printfn "%s"

run 1

[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]

run 2

[<V5N42Eu>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]

run 3

[<Fl>]
[<>]
[<>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
@moodmosaic
Copy link
Owner

This looks like a bug in Xeger.cs which has been ported from here. It's been a long-long time since I did all this, and I currently don't have Java installed on my local to compare with the Java version.

I'd be curious to see if the problem persists if one could port the new Xeger stuff from this repo instead.

@moodmosaic
Copy link
Owner

Also, I can't remember if Xeger guarantees that the generated strings should be unique, but from what I see in the source code, the issue might be here, and I'd start by comparing it with the this one.

(These are just notes for me, so I can remember where to start looking at. I'm not asking you to do the work―pull requests are always welcome, though.)

@moodmosaic
Copy link
Owner

For one, this should be the only available constructor for Xeger―this one should be deprecated as it always creates a new Random instance.

@moodmosaic
Copy link
Owner

Based on what discussed in #27, I'm not 100% sure there's much we can do to generate totally different results. At least now we can generate some more distinct elements.

I think it boils down to #28, and perhaps we can close this one, but I'll let you decide. 🥇

@vasily-kirichenko
Copy link
Author

Thanks you, it's ok for now.

@MagnusMikkelsen
Copy link
Contributor

This issue remains in .NET framework. The underlying cause for this is the behaviour of new Random(). In .Net framework it uses tick count - which remains the same in 1/10000 ms.

Because of this, if you rapidly do new Xeger() you might get the same underlying random sequence, and in turns the same random strings.

new Random() in .Net Core internally generates a unique seed, and therefore we don't see the issue in .Net Core.

I've created an issue in the Autofixture Repo: AutoFixture/AutoFixture#1183

@moodmosaic
Copy link
Owner

Perhaps we can fix this in .NET Framework by specifying the seed value.

@MagnusMikkelsen
Copy link
Contributor

Perhaps we can fix this in .NET Framework by specifying the seed value.

Maybe by shamelessly using the implementation in System.Random in .net core?

The implementation can be seen here: Random.cs

... and the PR with the improvement and considerations here: dotnet/coreclr#2192

In short it uses an internal Random object to generate seed. The seed for this internal Random object comes from Interop.GetRandomBytes. In the original PR it was generated by generating a Guid.

Would this implementation be overkill? It even ensures that different appdomains do not get the same sequences.

@moodmosaic
Copy link
Owner

Thank you for investigating 👍 💯

Random.cs is licensed under MIT license, so I think it should be OK to copy some parts from that file and use them here.

Options

→ One option would be to change new Random() with new Random(GenerateGlobalSeed()) where GenerateGlobalSeed is defined as

private static unsafe int GenerateGlobalSeed()
{
    int result;
    Interop.GetRandomBytes((byte*)&result, sizeof(int));
    return result;
}

(Note that GetRandomBytes uses the RNGCryptoServiceProvider internally.)

→ Another option would be to do it exactly as they're doing it, having new Random(GenerateSeed()) where GenerateSeed is defined here. This one looks like the thread-safe version.

(Even if making the seed creation thread-safe, I'm not entirely sure that all the other Xeger/Fare parts are thread-safe...)


Feel free to pick any option you want though. Happy to look into your PR 🚀

@MagnusMikkelsen
Copy link
Contributor

Feel free to pick any option you want though. Happy to look into your PR 🚀

Will do. I've started off with the failing test, and will push an implementation to my fork later.

moodmosaic added a commit that referenced this issue Jun 30, 2020
Document workaround for issue #26 It generates the same string being called in a sequence
@moodmosaic
Copy link
Owner

One improvement is to use the following (already released) overload:

public Xeger(string regex, Random random)

It's then up to the caller to supply a Random instance, but even a default (shared) instance should be good enough for the test provided by @vasily-kirichenko, and also for the AutoFixture internals in AutoFixture/AutoFixture#1183.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants