Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmallRng seed is too big, and seedability #1054

Closed
SamiPerttu opened this issue Sep 29, 2020 · 12 comments
Closed

SmallRng seed is too big, and seedability #1054

SamiPerttu opened this issue Sep 29, 2020 · 12 comments

Comments

@SamiPerttu
Copy link

Hi, I noticed the new SmallRng, and while a welcome change I'm sure, I believe there's still room for improvements. I'm thinking about seedability issues specifically and in general also.

I think it's a bad idea to require, well, pretty much anything from users regarding seed distributions, especially if it's an RNG recommended for general use. Non-specialist RNGs should handle such pesky issues themselves. Users are going to pick seeds 0, 1, 2, ... To expect anything more is madness. And we should absolutely deal with that reality.

My specific suggestion at this point is to make the default seeding procedure of SmallRng idiot proof: reduce the seed size to 192 or 128 bits and hash it, like with a 64-bit seed. (The 64-bit initialization looks very good now after SplitMix64.) Because the state space is not full 256 bits, this problem can't be fixed without reducing the seed size. It depends on your perspective whether it's a real problem, but I think we have all the tools at hand to deal with seedability and improve the user experience even more in this regard.

@dhardy
Copy link
Member

dhardy commented Sep 29, 2020

This is exactly what seed_from_u64 is for.

Did you see the discussion in #1038?

@SamiPerttu
Copy link
Author

I think it's not ideal that generators are advertising a larger state space than they can provide, and then expect the user to somehow prevent bad initializations. seed_from_u64 is nice to have but not perfect. We could easily modify most generators to have safe seeds only. But I grant that if this is not felt to be a problem, then seed_from_u64 will suffice.

@vks
Copy link
Collaborator

vks commented Sep 29, 2020

I think it's not ideal that generators are advertising a larger state space than they can provide, and then expect the user to somehow prevent bad initializations.

Were is this advertised? The documentation of SeedableRng::from_seed says:

PRNG implementations are allowed to assume that bits in the seed are well distributed. That means usually that the number of one and zero bits are roughly equal, and values like 0, 1 and (size - 1) are unlikely. Note that many non-cryptographic PRNGs will show poor quality output if this is not adhered to. If you wish to seed from simple numbers, use seed_from_u64 instead.

I don't think we can completely absolve users from the possibility of choosing bad seeds. Even for a CSPRNG, a predictable seed makes the generated random numbers predictable. This is similar to choosing a weak password instead of a randomly generated one.

Almost all non-crypto PRNGS have some seeds that are problematic.

We could declare SeedableRng::from_seed as a low-level API that should not be used directly and recommend rand_seeder as the "default" for reproducibility.

@SamiPerttu
Copy link
Author

For crypto, sure, if there are security considerations then the user has to think about seeding.

But for non-crypto, I don't see how it's the user's responsibility at all to consider. How would a user even be able to create a seeding algorithm on their own? I have no idea. Isn't that something the author should provide.

Specifically for SmallRng, if the seed is 256 bits but the state space is 2**256 minus one, then you have a seeding procedure that is by definition dangerous and impossible to make safe.

To emphasize SeedableRng::from_seed as a low-level API, that sounds like a good idea. Perhaps it's simply that the scenarios of crypto and non-crypto are a bit far from each other and that is causing some inevitable friction in the interface: crypto requires safety but all other users want convenience.

@dhardy
Copy link
Member

dhardy commented Sep 30, 2020

To emphasize SeedableRng::from_seed as a low-level API, that sounds like a good idea.

For non-crypto it's not supposed to by any more than a low-level API. For higher level there's seed_from_u64, from_rng and seeder.

So I think this is mostly a doc issue?

@SamiPerttu
Copy link
Author

My specific issue was unsafe seeding procedures. That's the reason why you just replaced SmallRng, if you remember.

@vks
Copy link
Collaborator

vks commented Sep 30, 2020

How would a user even be able to create a seeding algorithm on their own?

Most of the time, users can just generate a random seed. We could enforce this by hiding the seeds behind an opaque type that can only be randomly initialized, but this breaks some use cases where access to the full seed is required.

Specifically for SmallRng, if the seed is 256 bits but the state space is 2**256 minus one, then you have a seeding procedure that is by definition dangerous and impossible to make safe.

Why is it dangerous? It is impossible to enumerate all states, and it's almost impossible that two random seeds collide.

crypto requires safety but all other users want convenience.

SeedableRng::from_seed is already inconvenient: You would have to write SmallRng::from_seed([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) to get a "bad" seed. ("Bad" only means that you will get slightly more 0 bits than expected for about 10 samples.) Compare that to SmallRng::seed_from_u64(1) or SmallRng::from_entropy(), neither of which suffer from this problem.

So I think this is mostly a doc issue?

I think we should just add a paragraph to SeedableRng::from_seed that in most cases the seed should either be randomly generated or that seed_from_u64 or rand_seeder should be preferred.

@SamiPerttu
Copy link
Author

That's fine.

There are philosophical dangers in believing you can fit more pegs than there are holes. What if some horrible villain finds the collision. If it's possible to make seeding collision free, then that would be preferable, I would think.

I misspoke when I said you made the change because of seeding, although that was also an issue.

@dhardy
Copy link
Member

dhardy commented Sep 30, 2020

I think horrible villains are not an issue for non-crypto PRNGs. In fact, basically anywhere that users enter keys, I'd recommend using a CSPRNG — they're not very expensive, provided you don't need masses in parallel (and in that case, seed your "small" PRNGs off a master CSPRNG).

@SamiPerttu
Copy link
Author

Right. Okay, so to rehash, I advocated cutting the seed to 128 bits to make sure users couldn't misuse it even in theory. There were arguments why that's not necessary, which were okay. In general, it would be best if non-crypto RNGs handled seeding themselves, but part of the problem is that that may not be a practical requirement, therefore continue to cover the bases by providing seed_from_u64 and rand_seeder as the recommended procedures and advising users to stay away from from_seed.

@dhardy
Copy link
Member

dhardy commented Oct 1, 2020

Sorry, but your arguments just don't add up. Sure, there are some "bad seeds" which should be avoided, but using a hash function doesn't necessarily avoid these, nor does ensuring the hash function is not surjective by restricting the seed size, and tweaking results to avoid these "bad seeds" is not a viable option (there are many and they're not all obvious).

What does it mean for RNGs to handle seeding themselves? From their binary seed value, they do. From a u64, they can override seed_from_u64 if needed.

@SamiPerttu
Copy link
Author

You're right. I'm not prepared to say anything more substantive because I fear getting involved in RNG politics. After discussion, I think the situation is fine as it is and we can close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants