Non uniform character distribution #416
Comments
First impulse: Changing the existing default behaviour would make a lot of existing properties much weaker, without people being aware of that. You're probably aware that Maybe what's missing is an |
Would a |
Regex-based string generation has been on the list for a long time: #68. I haven't had a use case for it myself, and the implementation is not trivial, so its priority has been low.
My point is not to use only ASCII, but to change the random distribution of chars so that ASCII chars are about as likely to appear as the rest. I wouldn't mind a global configuration option if changing the default would break things for others.
@lmartelli Did you try solving it with a provider? I think you should be able to meet your requirements with a custom |
I guess Arbitraries.frequencyOf(
Tuple.of(1, Arbitraries.strings()),
Tuple.of(3, Arbitraries.strings().ascii())
);
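To get a feel for what a 1:3 weighting does to the generated strings, here is a minimal plain-Java sketch that mimics the weighted choice with `java.util.Random`. It does not use the library itself: the class and method names are hypothetical, and the "unrestricted" generator simply draws from all UTF-16 code units, which is an assumption made for illustration and not necessarily the library's actual default range.

```java
import java.util.Random;

public class WeightedStringSketch {

    // Hypothetical stand-in for an unrestricted string generator:
    // each char is drawn uniformly from the full UTF-16 code unit range
    // (assumption; may produce unpaired surrogates, which is fine for a sketch).
    public static String anyString(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) random.nextInt(Character.MAX_VALUE + 1));
        }
        return sb.toString();
    }

    // Hypothetical stand-in for an ASCII-only generator (code units 0..127).
    public static String asciiString(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) random.nextInt(128));
        }
        return sb.toString();
    }

    // Weighted choice: weight 1 for the unrestricted generator,
    // weight 3 for the ASCII-only generator.
    public static String weightedString(Random random, int length) {
        return random.nextInt(4) == 0
                ? anyString(random, length)
                : asciiString(random, length);
    }

    // True if every char of the string is in the ASCII range.
    public static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int samples = 10_000;
        int asciiOnly = 0;
        for (int i = 0; i < samples; i++) {
            if (isAscii(weightedString(random, 10))) {
                asciiOnly++;
            }
        }
        System.out.printf("ASCII-only strings: %d of %d%n", asciiOnly, samples);
    }
}
```

With these weights roughly three out of four strings should come from the ASCII-only branch, while the remaining quarter still exercises the wider character range.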
That could be a solution.
Closing since the above suggestion seems to solve the problem.
Testing Problem
Arbitrary strings, by default, consist mostly of Asian characters, because those are the most numerous and the probability distribution for choosing a random character is uniform.
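A rough back-of-the-envelope check of this claim can be done in plain Java. The sketch below assumes that chars are drawn uniformly from all 65,536 UTF-16 code units; that range is an assumption for illustration, and the library's real default may exclude some values. The class and method names are hypothetical.

```java
public class AsciiOdds {

    // Assumed size of the pool a random char is drawn from (all UTF-16 code units).
    static final int POOL_SIZE = Character.MAX_VALUE + 1; // 65,536
    static final int ASCII_SIZE = 128;

    // Probability that a single uniformly drawn char is ASCII: 128 / 65,536 = 1 / 512.
    public static double singleCharAsciiProbability() {
        return (double) ASCII_SIZE / POOL_SIZE;
    }

    // Probability that a string of the given length contains at least one ASCII char:
    // 1 - (1 - p)^length.
    public static double atLeastOneAscii(int length) {
        return 1.0 - Math.pow(1.0 - singleCharAsciiProbability(), length);
    }

    // Counts the chars in the assumed pool that belong to the given Unicode script.
    public static int countScript(Character.UnicodeScript script) {
        int count = 0;
        for (int cp = 0; cp < POOL_SIZE; cp++) {
            if (Character.UnicodeScript.of(cp) == script) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("P(single char is ASCII)     = " + singleCharAsciiProbability());
        System.out.println("P(ASCII in 10-char string)  = " + atLeastOneAscii(10));
        // The script counts depend on the Unicode version of the running JDK.
        System.out.println("Han chars in pool    = " + countScript(Character.UnicodeScript.HAN));
        System.out.println("Hangul chars in pool = " + countScript(Character.UnicodeScript.HANGUL));
    }
}
```

Under this assumption a single char is ASCII with probability 1/512, so short strings rarely contain any ASCII char at all.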
Suggested Solution
Given the history of character encoding on computers, I think a better default would be to emphasize the ASCII charset, so that an arbitrary string has a higher chance of containing ASCII chars and you would not have to try 10K times to have a chance of getting an arbitrary string that contains an ASCII char. Maybe all ASCII chars could be considered edge cases of chars?
Discussion
Discuss advantages and disadvantages of your solution. Compare it to alternative
suggestions if there are any.