Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To provide (or not) specific PRNGs? #58

Open
arthurprs opened this issue Nov 23, 2017 · 11 comments
Open

To provide (or not) specific PRNGs? #58

arthurprs opened this issue Nov 23, 2017 · 11 comments

Comments

@arthurprs
Copy link

arthurprs commented Nov 23, 2017

On rust-lang/rfcs#2106 (comment) I argued that rand should not provide any specific/named rngs (so no ChaChaRng, XoroShiro, etc.) but instead provide a very small set of generators (OsRng, StdRng, etc.) in addition to the trait machinery, etc..

Consider just WeakRng for example (StdRng might be enough moving forward), there were almost half a dozen proposals to improve it. One can argue that the initial choice wasn't the best but it's bound to happen again no matter the upfront effort.

In the event that rand keeps providing specific Rngs, proposals with merits will eventually come and the crate will be tied to the older variant until the next major version (w/ a higher friction upgrade). Until that happens the maintainers have a few choices:

  1. Reject the PR (for reason X, even if its a win) and ask the author to push it to crates.io instead
  2. Accept the PR adding generator G to rand
  3. Accept the PR adding generator G to rand and mark the other(s) deprecated

Note that 2 and 3 are likely duplicating something from crates.io, but with a blessing.

Advantages

  • rand provide easy to reason generator(s) to satisfy the bulk of the users
  • rand provides low-friction upgrades and keeps the optimal-ish algorithms readily available
  • rand will not accumulate and/or deprecate specific algorithms throughout time
  • Specific rand Rng names won't bleed out to the world, like in forum posts, stackoverflow, etc..

Disadvantages

  • There will be no blessed specific Rng implementations
  • Major versions bumps are required, whereas providing specific implementations can get away with new Rngs and give or take a deprecation warning.

Non concerns

  • Reproducibility/_Seed_ability can still be guaranteed across major versions.
  • If the user needs a specific Rng it should reach out to a crate (crates.io raison d'être)
@dhardy
Copy link
Owner

dhardy commented Nov 23, 2017

On the whole I agree, but I think in this case it would be worth providing official companion crate(s) with named implementations (e.g. per RNG or family like rand-chacha or collections like rand-small-rng). This is mostly about quality assurance (code review and trust that updates are unlikely to cause major breakage).

@burdges
Copy link

burdges commented Nov 23, 2017

I'm old fashioned but imho if an implementation gets used, then it should be usable to avoid increasing code size. I've voiced doubts about the crate separation but maybe rand-chacha, rand-isaac, etc. could all exist in the rand repository and depend on rand-core, and rand depends on all these crates.

@arthurprs
Copy link
Author

If all backing implementations get their own blessed crate it has the advantages of both approaches with the minor drawback of having more crates to maintain (they will require little to no maintenance though).

@pitdicker
Copy link

After diving into small RNG's there turn out to be many different kinds. I only explored a couple of them, that seem to be the current 'generation'. Not many have the combination of good period, state size, statistical quality, and performance you would expect for a generally usable RNG.

For users it may be hard to pick the right one without some guidance. Especially what was common wisdom 20 years ago is far from the best choice today.

For cryptographic RNG's the situation is similar, although I have not dived all that deep into that. I have found quite a few RNG's where some people, and sometimes even the author, claim it is cryptographically secure. But they are unable to supply some sort of proof, because they don't know the tricks people working in cryptography use. And then there are cryptographic RNG's that were consider good, but are not really trusted anymore.

So it seems important to have some sort of guidance. I think one crate collecting good RNG's could provide that.

And this is not nice of me to say, but I am not all that impressed by the quality of several implementations on crates.io... That is because of all the discussion we have had here around periods, initialization, bad seeds, error handling, traits to implement (or not to implement) etc. and what I expect around documentation.

A third reason why a seperate crate that focuses on RNG's can be an advantage, is that it can explore some options to confirm/improve quality. Like:

  • (comparative) benchmarks;
  • PractRand;
  • BigCrush;
  • testing against reference implementations;
  • calculating jump polynomals for LFSR-like RNG's;
  • functions to recover an RNG's state from the output.

So I see an advantage in one crate containing multiple good RNG's, outside of rand.
How I currently see what such a crate could include:

good statistical quality, predictable for adversaries

  • PCG (one or two variants)
  • Xoroshiro with widening multiply and truncate
  • SFC (32- and 64-bit variants)

good statistical quality, unpredictable (CryptoRng)

  • HC-128
  • ChaCha20
  • maybe others, like AES-CTR DRBG, Rabbit and SOSEMANUK

And maybe other categories of interest:

entropy source

  • OsRng
  • JitterRng
  • RDRAND

huge period RNG's

  • Xorshift can be used with huge states and an output function
  • PCG with extension scheme
  • Mersenne twister?

low(er) quality but very fast or very small

  • Xorshift 32/32 (very small)
  • Xoroshiro128+ (pretty fast)
  • maybe others

rand would keep the wrappers StdRng and WeakRng to select a standard CryptoRng and small, fast RNG. But I don't like the names, especially I would like to see WeakRng be renamed to SmallRng.

Some alternative approaches, and why I think they would work less good:

Include all RNG's in rand
@arthurprs argued against this well.

Have one crate per RNG, or multiple collection crates
You still have the guidance / documentation problem.

@dhardy
Copy link
Owner

dhardy commented Dec 29, 2017

Sounds like good advice @pitdicker. It leaves two related questions:

  1. If SmallRng and CryptoRng are to be exposed in rand, should rand depend on the rand-rngs crate (or whatever it's called) or copy the code?
  2. Do users have to depend on the rand-rngs crate directly if they want a reproducible RNG (something which won't be changed in the future)? (Probably yes.)

Also, I personally think the external RNGs OsRng and JitterRng should either stay in rand, or move to another crate (e.g. rand-ext or rand-entropy) depending on rand-core only. I've already seen argument that they stay in rand unless there's really a demand for them to move.

@pitdicker
Copy link

For (1) I don't think it matters much. Even when the code is copied, with a little discipline it should not be hard to keep in sync. But if we make sure a rand-rngs crate does not use extra dependencies, I don't see a problem in depending on it. Seems cleaner to me.
For (2) I would say yes. That is the promise of the original post.

Also, I personally think the external RNGs OsRng and JitterRng should either stay in rand, or move to another crate (e.g. rand-ext or rand-entropy) depending on rand-core only.

Yes, they don't really fall under this discussion. We are not going to swap OsRng out for something else I think 😄. Okay, creating a crate with purely PRNG's seems better to me.

@pitdicker
Copy link

pitdicker commented Jan 2, 2018

@dhardy Can we set up a plan or something? cc LinearZoetrope

Currently we have 5 things that touch RNG's going on:

If we have a separate crate, it seems to me it could work both on Rand 0.3 and this repository with a feature flag. Does this order make sense?

  • make the split first
  • add better small RNG's
  • implement BlockRng
  • only add serialisation to the rand-rngs crate
  • make upstream Rand depend on it

I don't really have time do do much for the next week though.

@dhardy dhardy added the question label Jan 2, 2018
@pitdicker
Copy link

And what would be a good place for the repository? Also a sub-directory here, like rand-core?

@dhardy
Copy link
Owner

dhardy commented Jan 2, 2018

I understood BlockRng has nothing to do with small PRNGs since it's only for some CSPRNGs which make a whole block of numbers at once?

When you think we're ready I suggest opening a new PR against upstream to add a new small RNG (as something like SmallRng) and deprecate whatever will go (XorShiftRng and weak_rng I suppose). I think better not to make rand depend on your crate now (we can still discuss this later).

The small RNGs crate: I think the best strategy would be to add it inside the rand repo as something like rand-small and publish from there. Is there much point making it compatible with rand 0.3? I don't know personally. We'll probably have 0.5 soon...

rust-random#209 I was about to look at; probably we can merge now.

@LinearZoetrope's rust-random#189 can be rebased after that. If you still want to discuss which PRNGs should be serializable you can add more comments there or open an issue somewhere, but it's mostly independent (other than knowing what else may change).

@pitdicker
Copy link

I understood BlockRng has nothing to do with small PRNGs since it's only for some CSPRNGs which make a whole block of numbers at once?

Yes. But it will change serialisation of ISAAC (and if we add it ChaCha)

I think better not to make rand depend on your crate now (we can still discuss this later).

Ok. Depending on a crate or copying the code is just an implementation detail when we only expose wrappers.

The small RNGs crate: I think the best strategy would be to add it inside the rand repo as something like rand-small and publish from there.

I don't know if we are completely speaking about the same thing. The small-rngs crate I made is something I wouldn't mind to throw away after #60 was done, except that it is useful to see the code of the tested RNG's.

When splitting out RNG's as discussed here I would start fresh and only copy a few good ones from there and from rand.

Is there much point making it compatible with rand 0.3? I don't know personally. We'll probably have 0.5 soon...

If we want 0.3 to be the stable branch for a while, than I think it makes sense so support it, with something like a rand_0.3_compatibility feature. I don't think it is much work.

rust-random#209 I was about to look at; probably we can merge now.

It looks good and you put quite some effort into it. But would it not also be duplicate work? It is breaking backward compatibility (seems good and necessary), but that would break again when we only expose wrapper types.

@LinearZoetrope's rust-random#189 can be rebased after that. If you still want to discuss which PRNGs should be serializable you can add more comments there or open an issue somewhere, but it's mostly independent (other than knowing what else may change).

It would mean adding serialisation to rand, only to later remove it when it is no longer possible to use specific RNG's because only wrappers are exposed.

@dhardy
Copy link
Owner

dhardy commented Jan 2, 2018

When splitting out RNG's as discussed here I would start fresh and only copy a few good ones from there and from rand.

Okay, that's fine. (I'm not sure about including outdated classics like Mersenne Twister; possibly unnecessary since that already has its own crate.)

If we want 0.3 to be the stable branch for a while

Do we? I mean, 0.4.1 is a stable release (as much as pre-1.0 releases ever are); breaking changes must wait for 0.5 now. And where will we be when this rand-small crate is ready anyway?

It is breaking backward compatibility (seems good and necessary), but that would break again when we only expose wrapper types.

I've tried to keep breaking changes minimal; only people depending on actual values output or misusing features (e.g. Copy support) will be affected now.

It would mean adding serialisation to rand, only to later remove it when it is no longer possible to use specific RNG's because only wrappers are exposed.

True, but it gives people who need serialization an option sooner, and without too much breakage later (presumably only importing from a different repo instead, unless they decide to switch PRNG).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants