feat: Add rng.gen_range support. #519

Merged · 3 commits · Jun 10, 2022

Conversation

@lukesneeringer (Contributor):

It appears that the previous work done in #517 only added partial rand support. In particular, it added support for rng.gen() but not rng.gen_range(). The latter lets you provide a low and high value and get a random number in between them:

let d = rng.gen_range(dec!(1.00)..=dec!(3.00));
assert!(d <= dec!(3.00));
assert!(d >= dec!(1.00));

This PR adds such support. It approaches the problem by taking the low and high values, rescaling them to match (with the higher of the two scales) if necessary, and then farming out the actual randomness by shipping off the mantissa values to rand::distributions::uniform::UniformInt<i128>. The UniformInt struct then returns the random mantissa, which is re-decimalized and returned.

Scale is relevant here: the example above is distinct from rng.gen_range(dec!(1.0)..=dec!(3.0)), because the original example has 201 valid values (1.00, 1.01, 1.02, etc.) whereas this latter one has only 21 (1.0, 1.1, 1.2, etc.).
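
In outline, the approach looks something like this minimal sketch (illustrative glue only, not the PR's actual code; it assumes rand 0.8's UniformSampler API and covers just the inclusive-range case):

use rand::distributions::uniform::{UniformInt, UniformSampler};
use rand::Rng;
use rust_decimal::Decimal;

// Illustrative sketch: rescale both bounds to the higher scale, sample a
// mantissa uniformly, then re-decimalize at that shared scale.
fn sample_inclusive<R: Rng + ?Sized>(rng: &mut R, mut low: Decimal, mut high: Decimal) -> Decimal {
    let scale = low.scale().max(high.scale());
    low.rescale(scale);
    high.rescale(scale);
    // Farm the actual randomness out to the integer sampler.
    let sampler = UniformInt::<i128>::new_inclusive(low.mantissa(), high.mantissa());
    Decimal::from_i128_with_scale(sampler.sample(rng), scale)
}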

@paupino (Owner) left a comment:

Overall, I like the feature and think it'll be a useful addition. I think there may be a few gotchas with the current approach - we could either guard against these (e.g. check scale after rescale and error) or change to an alternative approach.


macro_rules! dec {
    ($e:expr) => {
        Decimal::from_str_exact(stringify!($e)).unwrap()
    };
}
@paupino (Owner):

Just FYI: in tests you should be able to use the dec! macro directly (see https://github.com/paupino/rust-decimal/blob/master/Cargo.toml#L46).
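
Presumably the linked line is a dev-dependency along these lines (a hypothetical reconstruction, not verified against the repository):

# Hypothetical: the dev-dependency that would let unit tests use the macro crate.
[dev-dependencies]
rust_decimal_macros = { path = "./macros" }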

@lukesneeringer (Contributor, Author):

I tried that and it didn't work. I think it only works for doctests and benchmarks and the like (where the rust_decimal_macros crate is external). I'll try again just in case, because I agree this is goofy.

@lukesneeringer (Contributor, Author):

I just tried again.

error: failed to resolve: could not find `rust_decimal` in the list of imported crates

I think it works in doctests and the like but not unit tests. :-(

Contributor:

Perhaps something like use crate as rust_decimal; could be possible if rust_decimal_macros didn't expand to absolute paths like ::rust_decimal.

On second thought, I am not sure that would be possible either, because rust_decimal_macros itself imports rust_decimal.

@lukesneeringer (Contributor, Author):

Yup, and therein lies the problem.

    assert!(random == dec!(1.00) || random == dec!(1.01));
    values.insert(random);
}
// Somewhat flaky, will fail 1 out of every 2^255 times this is run.
@paupino (Owner):

Why is this? Is it because it could generate only 1.00 (or only 1.01) 255 times in a row?

@lukesneeringer (Contributor, Author):

Correct. It feels like a smell. Any thoughts on how to improve it?
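
A hedged reconstruction of the test's shape (the loop count and final assertion are assumptions; dec! is the local test macro shown earlier): with only two representable values, all 256 fair draws landing on the same value happens with probability 2 · (1/2)^256 = 2^-255.

use std::collections::HashSet;
use rand::Rng;
use rust_decimal::Decimal;

#[test]
fn generates_both_representable_values() {
    let mut rng = rand::thread_rng();
    let mut values: HashSet<Decimal> = HashSet::new();
    for _ in 0..256 {
        // Only 1.00 and 1.01 are representable at scale 2 in this range.
        let random = rng.gen_range(dec!(1.00)..=dec!(1.01));
        assert!(random == dec!(1.00) || random == dec!(1.01));
        values.insert(random);
    }
    // Somewhat flaky: all draws coincide 1 out of every 2^255 runs.
    assert_eq!(values.len(), 2);
}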

src/rand.rs Outdated

// Set scales to match one another, because we are relying on the mantissas
// being comparable in order to outsource the actual sampling implementation.
low.rescale(low.scale().max(high.scale()));
@paupino (Owner):

This perhaps works for the common cases; however, just to note that it could cause some weird behavior when using one high-scale value and one large number with a low scale.

e.g.
1.000_000_000_000_000_000_01 and 100_000_000_000_000_000_001. In this case, the high value would try to rescale but would be unable to. At the moment, the function does not fail and instead returns the number scaled to the highest precision that it can represent.

This would of course mean that the mantissas end up at different scales and potentially cause odd behavior (i.e. it would try to generate random numbers only within the high scale). Consequently, this would not give the range one might expect. (See the sketch after this comment.)

This is a tricky one to circumvent with the current method. One feasible approach is to leverage trunc and/or frac to generate the two values and then squeeze them in. This also has a few complexities attached and won't give a clean 1.00 / 1.01 solution as per below.

Another approach is to instead scale down to the lowest scale and go that route; however, you would also need to be careful since scaling down will round.

I don't have a great answer, but it is something to think about. If this instead returned a Result you could potentially handle the mismatched-scale scenario as an error; however, I'm not sure whether this is simply a trait implementation or not.
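
A minimal sketch illustrating the saturating rescale described above (assuming rust_decimal's in-place rescale semantics; values taken from the example):

use std::str::FromStr;
use rust_decimal::Decimal;

let low = Decimal::from_str("1.00000000000000000001").unwrap(); // scale 20
let mut high = Decimal::from_str("100000000000000000001").unwrap(); // scale 0, 21 digits
// Reaching scale 20 would need 41 significant digits, far beyond the 96-bit
// mantissa, so rescale saturates at the largest representable scale instead
// of failing, and the two scales end up unequal.
high.rescale(20);
assert_ne!(low.scale(), high.scale());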

@lukesneeringer (Contributor, Author):

> In this case, the high value would try to rescale but would be unable to. At the moment, the function does not fail and instead returns the number scaled to the highest precision that it can represent.

Thanks, this is a great point that I had not considered.

I assert the goals should be as follows:

  • In the common case, the user should get expected behavior, which I assert is "the higher scale".
    • Evidence for my intuition on keeping the higher scale: if one wanted a random sample of dec!(1.0)..=dec!(3.14159), it would be expected that values between 3.0 and 3.14159 could actually be yielded.
    • Worse, on the other end: dec!(1.25)..=dec!(3) could yield the value 1, which is obviously wrong.
    • Also, if the user wants the lower scale, rounding your more precise value seems more intuitive than rescaling your less precise value.
  • Similarly, precision should be preserved wherever possible.

Your point is still valid though; we'll get wild, undefined behavior in the edge case you point out. Here's how I propose we solve it.

  1. Rescale both values to the higher scale (as we do now).
  2. Check whether the scales are equal; if they are not, rescale both to the lower of the two scale values. In other words, we lose precision in this case, but only in this case.

This strikes me as a minor "breach of contract" from the goals described above, but it seems to be the best middle ground. It's basically graceful degradation (I also assert that the appetite for this particular use case probably does not exist, although I agree it is right to consider it and try to degrade gracefully. :-))

The code for this is trivial; it just requires adding (immediately after the logic under discussion):

if low.scale() != high.scale() {
    let scale = low.scale().min(high.scale());
    low.rescale(scale);
    high.rescale(scale);
}

What do you think?

> Another approach is to instead scale down to the lowest scale and go that route; however, you would also need to be careful since scaling down will round.

I'd like to avoid doing this as the common case. I could see it being acceptable in the edge case, though.

@lukesneeringer (Contributor, Author):

The alternative is to check that the scales are equal and just straight up panic if they aren't.

@lukesneeringer (Contributor, Author):

Fixed (using the graceful degradation approach) in 5bbe252.

src/lib.rs (resolved discussion, collapsed)
src/rand.rs Outdated
    B1: SampleBorrow<Self::X> + Sized,
    B2: SampleBorrow<Self::X> + Sized,
{
    let high = *high.borrow();
@paupino (Owner):

Could you not just subtract Decimal::ONE? It should be faster since it'll take an optimal path.

@paupino (Owner):

I guess the only thing to be careful of is if high is Decimal::MIN. You could perhaps avoid it though with a checked_sub and only use the value if successfully calculated.

@paupino (Owner):

Actually, I was a bit naive here - you're only removing one from the raw mantissa as opposed to the integral portion. This is an interesting challenge for a decimal number since scale is a relevant factor.

@lukesneeringer (Contributor, Author):

As you determined, I cannot and should not subtract Decimal::ONE, because what I'm doing here is "subtract one from the least significant digit", as opposed to "subtract 1".

I assert that we can be confident that the value is not Decimal::MIN because of the check that requires low < high. There's no way for high to be Decimal::MIN without triggering that error.

> This is an interesting challenge for a decimal number since scale is a relevant factor.

Oh, rats, the scales aren't guaranteed to be equal yet. The scale-equivalence logic probably needs to be done in new also.
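
A hedged sketch of the "subtract one from the least significant digit" idea; decrement_ulp is a hypothetical name, not the PR's code, and it assumes the scales have already been matched:

use rust_decimal::Decimal;

// Hypothetical helper: step an exclusive upper bound down by one unit in
// the last place at `high`'s scale. Subtracting Decimal::ONE would instead
// subtract 1 from the integral portion, which is not what we want.
// Assumes low < high has been checked, so the mantissa can safely decrement.
fn decrement_ulp(high: Decimal) -> Decimal {
    Decimal::from_i128_with_scale(high.mantissa() - 1, high.scale())
}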

@lukesneeringer (Contributor, Author):

Fixed in 5bbe252.

@paupino (Owner) commented Jun 6, 2022:

This will work; however, two potential issues I see are:

  1. The random pool is discrete depending on the scale provided. e.g. 1 through 2 would only produce 2 values, whereas 1.0 to 2 would produce 10. The behavior is a little obscure.
  2. In some cases the discrete pool can be arbitrarily decreased due to scale rounding, sometimes by a fairly large magnitude. This makes the random pool much smaller than it need be.

Overall, my main concern is that the behavior feels a little ambiguous. We could get around that with documentation, however it got me thinking about a better way. After a bit of thought, I think that the ideal approach would be to do the following:

  1. Calculate the delta between the two numbers. This needs to be done using some internals since MAX - MIN is greater than 96 bits.
  2. Scale the delta to a precision of 28. This would also require going beyond 128 bits in the worst scenario, meaning we may need to be flexible with the UniformInt usage.
  3. Calculate a random number using 0 as the minimum and the unsigned delta as the maximum. If the above number were an array of u32 we could comfortably use random u32s for all but the high-order word; for that word we'd have to use the existing value as the exclusive max.
  4. Calculate a random scale and scale down to fit it into 96 bits, similar to the multiplication logic now.
  5. Add this to the minimum using existing addition logic. Technically, this could be done before step 4 and rounded against underflow there.

Anyway: this allows you to do something like rng.gen_range(dec!(1)..=dec!(2)) and for it to give a larger array of possibilities between these numbers.

Of course, it is also a bit more complicated since it requires using some internals to get this level of precision, although it can leverage some existing internal operations with perhaps some modification to prevent premature rounding. Consequently, I'm more than happy to provide assistance and/or contribute code to this PR to help achieve this.

Thoughts?
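
A rough, simplified sketch of this proposal; sample_dense is a hypothetical name, step 4's random-scale trick is skipped, and it assumes the rescaled delta still fits in 96 bits, which (per step 2) the real implementation could not rely on:

use rand::Rng;
use rust_decimal::Decimal;

// Hypothetical sketch of the delta-based approach: sample a mantissa offset
// in [0, delta] at maximum precision, then add it back onto the minimum.
fn sample_dense<R: Rng + ?Sized>(rng: &mut R, low: Decimal, high: Decimal) -> Decimal {
    // 1. Delta between the two bounds (assumed to fit; MAX - MIN would not).
    let mut delta = high - low;
    // 2. Scale the delta up toward a precision of 28; rescale saturates if
    //    the mantissa would overflow, so read back the scale it achieved.
    delta.rescale(28);
    // 3. Random mantissa between 0 and the delta's mantissa, inclusive.
    let offset = rng.gen_range(0..=delta.mantissa());
    // 5. Add the offset to the minimum using existing addition logic.
    low + Decimal::from_i128_with_scale(offset, delta.scale())
}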

@lukesneeringer (Contributor, Author):

> The random pool is discrete depending on the scale provided. e.g. 1 through 2 would only produce 2 values, whereas 1.0 to 2 would produce 10. The behavior is a little obscure.

I actually view this as a feature, but I might be wrong.

The reason I view it as a feature is because lots of use cases for decimals involve being interested in values at a particular scale. For example, let's say that I want a random price between $1.00 and $2.00; I really do want there to be 101 options, not infinite options, and prices like $1.419386746 do not really make sense. Now, in fairness, I could take the result and just round it (answer.round_dp(2)). However, I assert that wanting a random value at scale N between a low and high value that are both scale N is a common case. But maybe my intuition is wrong there -- I might be over-anticipating simply because it's my use case. :-)
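
For instance, a usage sketch of the scale-respecting behavior (the assertions express the intent, assuming this PR's implementation):

use rand::Rng;
use rust_decimal_macros::dec;

let mut rng = rand::thread_rng();
// One of exactly 101 candidate prices: 1.00, 1.01, ..., 2.00.
let price = rng.gen_range(dec!(1.00)..=dec!(2.00));
assert_eq!(price.scale(), 2);
assert!(price >= dec!(1.00) && price <= dec!(2.00));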

However, I take your point that when different scales are used for min and max, the behavior could be surprising. You're certainly right about that. The question is: is it less surprising than the alternative?

> In some cases the discrete pool can be arbitrarily decreased due to scale rounding, sometimes by a fairly large magnitude. This makes the random pool much smaller than it need be.

Is this the case we discussed earlier (dec!(1.000_000_001)..dec!(100_000_000.01))? You're right about this, but I wonder...will people really do that in real life? If that's a concern, perhaps this should just panic?

Thoughts?

My personal preference is for my existing approach, because I want to be able to use a same-scale value as the low and high and receive values with that scale. I can absolutely see an argument for making the issue you describe in (2) panic, though, and I'm happy to change to that if you prefer.

With that said, it's not my library! :-) What do you think?

@paupino (Owner) commented Jun 7, 2022:

> I actually view this as a feature, but I might be wrong.

This is good feedback! I do see what you're saying - it'd give a much better distribution when you do want fixed scales than having to round the result.

The reason I started thinking about the full range of values was more based on how floats are implemented, etc. I wonder if there is an argument to make this an explicit feature implementation... e.g. rand-discrete-range or something? In that case we could also call out the limitations as part of the feature documentation too.

I'm also curious about any thoughts @c410-f3r might have, considering they may also be using random number generation.

@paupino (Owner) commented Jun 7, 2022:

The more I think about it, the more I think both worlds exist - however, for now, one world existing is fine so long as it is documented. Perhaps appending a description of the approach and its limitations to the feature documentation would suffice for now?
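
For reference, the opt-in from a consumer's side would look something like this (a sketch; the version number is illustrative):

# Downstream Cargo.toml: enable rust_decimal's rand support.
[dependencies]
rust_decimal = { version = "1.24", features = ["rand"] }
rand = "0.8"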

@c410-f3r (Contributor) commented Jun 7, 2022:

>> I actually view this as a feature, but I might be wrong.
>
> This is good feedback! I do see what you're saying - it'd give a much better distribution when you do want fixed scales than having to round the result.
>
> The reason I started thinking about the full range of values was more based on how floats are implemented, etc. I wonder if there is an argument to make this an explicit feature implementation... e.g. rand-discrete-range or something? In that case we could also call out the limitations as part of the feature documentation too.
>
> I'm also curious about any thoughts @c410-f3r might have, considering they may also be using random number generation.

Hmm, I am not familiar enough with the internals of rand to tell whether this PR is on par with the built-in UniformSampler implementations for primitives. A quick look at the unit tests suggests everything is OK-ish, but a deeper opinion about the applied strategy would take more time on my side.

@lukesneeringer (Contributor, Author) commented Jun 7, 2022:

> The more I think about it, the more I think both worlds exist - however, for now, one world existing is fine so long as it is documented. Perhaps appending a description of the approach and its limitations to the feature documentation would suffice for now?

I can get behind this. Would you like a discrete feature gate to allow the possibility for someone to implement an alternative approach in the future? (Or we could just add a new feature gate when that happens...)

@paupino (Owner) commented Jun 7, 2022:

>> The more I think about it, the more I think both worlds exist - however, for now, one world existing is fine so long as it is documented. Perhaps appending a description of the approach and its limitations to the feature documentation would suffice for now?
>
> I can get behind this. Would you like a discrete feature gate to allow the possibility for someone to implement an alternative approach in the future? (Or we could just add a new feature gate when that happens...)

I'm in two minds: on one hand, I like consciously opting in to behavior (i.e. having it as an explicit feature). On the other hand, it is perhaps overkill given that it's new and a fairly lightweight feature that may not require other use cases. Perhaps we just document the behavior for now and feature gate it as required in the future.

Sorry that such a simple feature has become such a long-winded thread!

@lukesneeringer (Contributor, Author):

No no, that's okay. I admire your commitment to excellence, and the feedback did improve the PR in some important ways.

I'll add the documentation shortly.

@lukesneeringer (Contributor, Author):

Documentation added, sorry it took so long.

@paupino (Owner) commented Jun 10, 2022:

> Documentation added, sorry it took so long.

No apologies necessary! Thank you for all your help landing this feature 😄

@paupino enabled auto-merge (squash) June 10, 2022 20:06
@paupino merged commit 7508015 into paupino:master Jun 10, 2022
@lukesneeringer deleted the rand-support-sample branch June 11, 2022 05:33
@lukesneeringer (Contributor, Author):

Thanks for merging!
If it's not too much trouble, I'd appreciate a 1.24.1 release as well, so I can try to get my PR to ta-rs through. :-)
