Non uniform character distribution #416

Closed
lmartelli opened this issue Nov 25, 2022 · 8 comments

@lmartelli

Testing Problem

By default, arbitrary strings consist mostly of Asian characters, because those are the most numerous and the probability distribution for choosing a random character is uniform.

Suggested Solution

Given the history of character encoding on computers, I think it would be a better default to emphasize the ASCII charset, so that an arbitrary string is more likely to contain ASCII chars and you would not have to try 10K times to have a chance of getting an arbitrary string that contains an ASCII char. Maybe all ASCII chars could be considered edge cases of chars?
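To put rough numbers on the uniform-distribution argument, here is a small back-of-the-envelope sketch. It assumes every UTF-16 code unit (U+0000 to U+FFFF) is equally likely; the exact character range jqwik draws from by default may differ, and the class and method names are made up for illustration.

```java
public class AsciiOdds {
    // Assumption: all 65,536 UTF-16 code units are equally likely,
    // and the 128 ASCII characters are among them.
    public static final double P_ASCII = 128.0 / 65536.0;

    /** Probability that a string of the given length contains at least one ASCII char. */
    public static double atLeastOneAscii(int length) {
        // Complement of "no ASCII char in any of the `length` positions".
        return 1.0 - Math.pow(1.0 - P_ASCII, length);
    }

    public static void main(String[] args) {
        for (int length : new int[] {1, 10, 100}) {
            System.out.printf("length %3d: %.4f%n", length, atLeastOneAscii(length));
        }
    }
}
```

Under that assumption a single character is ASCII about 0.2% of the time, so short strings rarely contain any ASCII at all.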


@jlink
Collaborator

jlink commented Nov 25, 2022

First impulse: Changing the existing default behaviour would make a lot of existing properties much weaker, without people being aware of that.

You're probably aware that StringArbitrary.ascii() configures string generators to use only ASCII characters.

Maybe what's missing is an @AsciiChars constraint annotation to make it almost frictionless to start with ASCII?

@adam-waldenberg

Would a @Chars(regexp = "") or something along those lines be possible?

@jlink
Collaborator

jlink commented Nov 26, 2022

Regex-based string generation has been on the list for a long time: #68. I haven't had a use case for it myself, and the implementation is not trivial, so priority has been low.

@lmartelli
Author

My point is not to use only ASCII, but to change the random distribution of chars so that ASCII chars are about as likely to appear as the rest. I wouldn't mind a global configuration option if changing the default would break things for others.

@adam-waldenberg

adam-waldenberg commented Nov 28, 2022

@lmartelli Did you try solving it with a provider? I think you should be able to meet your requirements with a custom @Provide'r and using an arbitrary with a custom .withDistribution().

@jlink
Collaborator

jlink commented Nov 28, 2022

> @lmartelli Did you try solving it with a provider? I think you should be able to meet your requirements with a custom @Provide'r and using an arbitrary with a custom .withDistribution().

I guess Arbitraries.frequencyOf(..) is easier than muddling with distributions, e.g.

Arbitraries.frequencyOf(
    // weight 1: strings over the full default character range
    Tuple.of(1, Arbitraries.strings()),
    // weight 3: ASCII-only strings
    Tuple.of(3, Arbitraries.strings().ascii())
);
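To illustrate what the 1:3 weights mean in practice, here is a stand-alone simulation using only java.util.Random. It is a sketch of the weighting idea, not jqwik's implementation: the class and method names are hypothetical, and "unrestricted" is assumed to mean any UTF-16 code unit.

```java
import java.util.Random;

public class WeightedStringMix {
    /** Random string whose chars are drawn uniformly from [0, maxCodeUnit]. */
    public static String randomString(Random random, int length, int maxCodeUnit) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) random.nextInt(maxCodeUnit + 1));
        }
        return sb.toString();
    }

    /** Picks the unrestricted generator with weight 1 and the ASCII-only one with weight 3. */
    public static String next(Random random, int length) {
        int ticket = random.nextInt(1 + 3); // 0 -> unrestricted, 1..3 -> ASCII only
        int maxCodeUnit = (ticket == 0) ? 0xFFFF : 0x7F;
        return randomString(random, length, maxCodeUnit);
    }

    public static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c <= 0x7F);
    }

    /** Fraction of generated strings that contain only ASCII chars. */
    public static double asciiOnlyShare(long seed, int samples, int length) {
        Random random = new Random(seed);
        int asciiOnly = 0;
        for (int i = 0; i < samples; i++) {
            if (isAscii(next(random, length))) {
                asciiOnly++;
            }
        }
        return (double) asciiOnly / samples;
    }

    public static void main(String[] args) {
        System.out.println("ASCII-only share: " + asciiOnlyShare(42L, 10_000, 5));
    }
}
```

With these weights, roughly three out of four generated strings come from the ASCII-only branch. In jqwik itself, the frequencyOf(..) expression would typically be returned from a @Provide method and referenced from a property parameter via @ForAll with the provider's name.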

@lmartelli
Author

That could be a solution.

@jlink
Collaborator

jlink commented May 14, 2024

Closing since the above suggestion seems to solve the problem.
@lmartelli Feel free to re-open.

@jlink jlink closed this as completed May 14, 2024