feat: Add rng.gen_range support #519
Conversation
Overall, I like the feature and think it'll be a useful addition. I think there may be a few gotchas with the current approach - we could either guard against these (e.g. check scale after rescale and error) or change to an alternative approach.
```rust
macro_rules! dec {
    ($e:expr) => {
        Decimal::from_str_exact(stringify!($e)).unwrap()
    };
}
```
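As a side note on why this test-local shim behaves like the real macro: `stringify!` captures the literal's source text verbatim (trailing zeros included), which is what lets `from_str_exact` recover the intended scale. A minimal standalone check (plain Rust, no rust_decimal dependency):

```rust
fn main() {
    // stringify! preserves the token text of the literal, so the scale
    // implied by trailing zeros survives into the string.
    assert_eq!(stringify!(1.00), "1.00");
    assert_eq!(stringify!(1.01), "1.01");
    assert_eq!(stringify!(3.14159), "3.14159");
}
```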
Just FYI: while in tests, you should be able to use the `dec` macro directly (see https://github.com/paupino/rust-decimal/blob/master/Cargo.toml#L46).
I tried that and it didn't work. I think it only works for doctests, benchmarks, and the like (where the `rust_decimal_macros` crate is external). I'll try again just in case, because I agree this is goofy.
I just tried again.

```
error: failed to resolve: could not find `rust_decimal` in the list of imported crates
```

I think it works in doctests and the like, but not in unit tests. :-(
Perhaps something like `use crate as rust_decimal;` could be possible if `rust_decimal_macros` didn't expand to absolute paths like `::rust_decimal`. On second thought, I am not sure that would be possible either, because `rust_decimal_macros` imports `rust_decimal`.
Yup, and therein lies the problem.
```rust
    assert!(random == dec!(1.00) || random == dec!(1.01));
    values.insert(random);
}
// Somewhat flaky; will fail 1 out of every 2^255 times this is run.
```
Why is this? Is it because it could only generate `1.00` or `1.01` 255 times in a row?
Correct. It feels like a smell. Thoughts to improve?
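For intuition on where 2^255 comes from (a back-of-the-envelope sketch, assuming the test loop runs 256 times, a detail not shown above): if each of n independent samples lands on one of two equally likely values, the chance that all n agree (i.e. that one of the two values never appears) is 2 * (1/2)^n = 2^(1-n):

```rust
fn main() {
    // With n = 256 draws from {1.00, 1.01}, "all draws identical" happens
    // with probability 2 * (1/2)^256 = 2^-255. Powers of two are exact in
    // f64, so the equality below holds exactly.
    let p_all_same = 2.0 * 0.5f64.powi(256);
    assert_eq!(p_all_same, 0.5f64.powi(255));
    println!("failure probability = {:e}", p_all_same);
}
```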
src/rand.rs
Outdated
```rust
// Set scales to match one another, because we are relying on the mantissas'
// being comparable in order to outsource the actual sampling implementation.
low.rescale(low.scale().max(high.scale()));
```
This works for the common cases; however, note that it could cause some weird behavior when combining one value with a high scale and one large number with a low scale, e.g. `1.000_000_000_000_000_000_01` and `100_000_000_000_000_000_001`. In this case, the high value would try to rescale but would be unable to. At the moment, the function does not fail; instead it returns the number scaled to the highest possible precision that it can.
This would, of course, mean the mantissas are at different scales, potentially causing odd behavior (i.e., it would try to generate random numbers only within the high value's scale). Consequently, this would not give the range one might expect.
This is a tricky one to circumvent with the current method. One feasible approach is to leverage `trunc` and/or `frac` to generate the two values and then squeeze them in; this also has a few complexities attached and won't give a clean `1.00`/`1.01` solution as per below.
Another approach is to instead scale down to the lowest scale and go that route; however, you would also need to be careful, since scaling down will round.
I don't have a great answer, but it is something to think about. If this instead returned a `Result`, you could potentially handle the mismatched-scale scenario as an error; however, I'm not sure whether this is simply a trait implementation or not.
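To make that failure mode concrete, here is a self-contained sketch using a simplified mantissa/scale model (a stand-in for rust_decimal's internals, which cap the mantissa at 96 bits; `try_rescale` is a hypothetical helper, not a library API):

```rust
/// Simplified model of a decimal: value = mantissa / 10^scale.
/// rust_decimal caps the mantissa at 96 bits; we mimic that here.
const MAX_MANTISSA: i128 = (1i128 << 96) - 1;

fn try_rescale(mantissa: i128, scale: u32, new_scale: u32) -> Option<(i128, u32)> {
    if new_scale < scale {
        return None; // scaling down would round; out of scope for this sketch
    }
    let mut m = mantissa;
    for _ in scale..new_scale {
        m = m.checked_mul(10)?;
        if m > MAX_MANTISSA {
            return None; // would overflow the 96-bit mantissa
        }
    }
    Some((m, new_scale))
}

fn main() {
    // 1.000_000_000_000_000_000_01 is a 21-digit mantissa at scale 20.
    let low = (100_000_000_000_000_000_001i128, 20u32);
    // 100_000_000_000_000_000_001 is the same 21 digits at scale 0.
    let high = (100_000_000_000_000_000_001i128, 0u32);

    // The low value is already at scale 20.
    assert!(try_rescale(low.0, low.1, 20).is_some());
    // The high value cannot be rescaled to 20: 21 digits times 10^20
    // vastly exceeds 96 bits, so a real rescale would stop short.
    assert!(try_rescale(high.0, high.1, 20).is_none());
}
```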
> In this case, the high would try to rescale however will be unable to. At the moment, the function will not fail and instead return the number scaled to the highest possible precision that it can.
Thanks, this is a great point that I had not considered.
I assert the goals should be as follows:
- In the common case, the user should get expected behavior, which I assert is "the higher scale".
  - Evidence for my intuition on keeping the higher scale: if one wanted a random sample of `dec!(1.0)..=dec!(3.14159)`, it would be expected that values between `3.0` and `3.14159` could actually be yielded.
  - Worse, on the other end: `dec!(1.25)..=dec!(3)` could yield the value `1`, which is obviously wrong.
  - Also, if the user wants the lower scale, rounding your more precise value seems more intuitive than rescaling your less precise value.
- Similarly, precision should not be lost wherever possible.
Your point is still valid, though; we'd get wild, undefined behavior in the edge case you point out. Here's how I propose we solve it:
- Rescale both values to the higher of the two scales (as we do now).
- Check whether the scales are equal; if they are not, rescale both to the lower of the two scales. In other words, we lose precision in this case, but only in this case.

This strikes me as a minor "breach of contract" against the goals described above, but it seems to be the best middle ground. It's basically graceful degradation. (I also assert that the appetite for this particular use case probably does not exist, although I agree it is right to consider it and try to degrade gracefully. :-))
The code for this is trivial; it just requires adding (immediately after the logic under discussion):

```rust
if low.scale() != high.scale() {
    // Graceful degradation: fall back to the lower of the two scales.
    let scale = low.scale().min(high.scale());
    low.rescale(scale);
    high.rescale(scale);
}
```
What do you think?
> Another approach is to instead scale it down to the lowest scale and go that route, however you would also need to be careful since scaling down will round.
I'd like to avoid doing this as the common case. I could see it being acceptable in the edge case, though.
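As a sanity check on the proposed two-step reconciliation, here is a self-contained sketch using a simplified mantissa/scale model rather than rust_decimal itself; `Dec` and `rescale_pair` are hypothetical stand-ins, with truncation standing in for the library's real rounding:

```rust
/// Simplified decimal: value = mantissa / 10^scale (mantissa capped at
/// 96 bits, mimicking rust_decimal). Not the library's real implementation.
const MAX_MANTISSA: i128 = (1i128 << 96) - 1;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Dec {
    mantissa: i128,
    scale: u32,
}

impl Dec {
    /// Rescale toward `target`, stopping early if the mantissa would
    /// overflow (as the real `rescale` does) and truncating when
    /// scaling down (a simplification of its rounding).
    fn rescale(&mut self, target: u32) {
        while self.scale < target {
            match self.mantissa.checked_mul(10) {
                Some(m) if m <= MAX_MANTISSA => {
                    self.mantissa = m;
                    self.scale += 1;
                }
                _ => return, // stop at the highest achievable precision
            }
        }
        while self.scale > target {
            self.mantissa /= 10;
            self.scale -= 1;
        }
    }
}

/// Hypothetical helper: the proposed two-step scale reconciliation.
fn rescale_pair(low: &mut Dec, high: &mut Dec) {
    // Step 1: rescale both to the higher scale (the common case).
    let up = low.scale.max(high.scale);
    low.rescale(up);
    high.rescale(up);
    // Step 2: on mismatch (one value could not reach `up`), degrade
    // gracefully to the lower of the two resulting scales.
    if low.scale != high.scale {
        let down = low.scale.min(high.scale);
        low.rescale(down);
        high.rescale(down);
    }
}

fn main() {
    // Common case: 1.25 and 3 both end up at scale 2.
    let (mut low, mut high) = (Dec { mantissa: 125, scale: 2 }, Dec { mantissa: 3, scale: 0 });
    rescale_pair(&mut low, &mut high);
    assert_eq!((low.scale, high.scale), (2, 2));
    assert_eq!(high.mantissa, 300);

    // Edge case: a 21-digit integer can never reach scale 20, so both
    // values degrade to the scale it could actually reach.
    let mut low = Dec { mantissa: 100_000_000_000_000_000_001, scale: 20 };
    let mut high = Dec { mantissa: 100_000_000_000_000_000_001, scale: 0 };
    rescale_pair(&mut low, &mut high);
    assert_eq!(low.scale, high.scale);
}
```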
The alternative is to check that the scales are equal and just straight up panic if they aren't.
Fixed (using the graceful degradation approach) in 5bbe252.
src/rand.rs
Outdated
```rust
    B1: SampleBorrow<Self::X> + Sized,
    B2: SampleBorrow<Self::X> + Sized,
{
    let high = *high.borrow();
```
Could you not just subtract `Decimal::ONE`? It should be faster, since it'll take an optimal path.
I guess the only thing to be careful of is if `high` is `Decimal::MIN`. You could perhaps avoid that with a `checked_sub`, only using the value if it is successfully calculated.
Actually, I was a bit naive here - you're only removing one from the raw mantissa as opposed to the integral portion. This is an interesting challenge for a decimal number since scale is a relevant factor.
As you determined, I cannot and should not subtract `Decimal::ONE`, because what I'm doing here is "subtract one from the least significant digit", as opposed to "subtract 1".
I assert that we can be confident the value is not `Decimal::MIN` because of the check that requires `low < high`; there's no way for `high` to be `Decimal::MIN` without triggering that error.
> This is an interesting challenge for a decimal number since scale is a relevant factor.

Oh, rats: the scales aren't guaranteed to be equal yet. The scale-equivalence logic probably needs to be done in `new` as well.
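For illustration, here is a sketch with a simplified mantissa/scale pair (a stand-in, not rust_decimal internals): subtracting one from the mantissa removes one unit in the last place at the current scale, which is quite different from subtracting `Decimal::ONE`:

```rust
/// Simplified decimal: value = mantissa / 10^scale.
#[derive(Debug, PartialEq)]
struct Dec {
    mantissa: i128,
    scale: u32,
}

fn main() {
    // 3.00 represented as mantissa 300 at scale 2.
    let high = Dec { mantissa: 300, scale: 2 };

    // "Subtract one from the least significant digit" (one ULP):
    // 3.00 -> 2.99. This is what turning an inclusive upper bound
    // into an exclusive one needs.
    let one_ulp_less = Dec { mantissa: high.mantissa - 1, scale: high.scale };
    assert_eq!(one_ulp_less, Dec { mantissa: 299, scale: 2 });

    // Subtracting Decimal::ONE instead: 3.00 -> 2.00. Note that 1 at
    // scale 2 has mantissa 100, so the mantissa drops by 100, not by 1.
    let one_less = Dec { mantissa: high.mantissa - 100, scale: high.scale };
    assert_eq!(one_less, Dec { mantissa: 200, scale: 2 });
}
```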
Fixed in 5bbe252.
This will work; however, two potential issues I see are:
Overall, my main concern is that the behavior feels a little ambiguous. We could get around that with documentation; however, it got me thinking about a better way. After a bit of thought, I think the ideal approach would be to do the following:
Anyway, this approach would allow that kind of usage. Of course, it is also a bit more complicated, since it requires using some internals to get this level of precision; albeit, it can leverage some existing internal operations, perhaps with some modification to prevent premature rounding. Consequently, I'm more than happy to provide assistance and/or contribute code to this PR to help achieve this. Thoughts?
I actually view this as a feature, but I might be wrong. The reason I view it as a feature is that lots of use cases for decimals involve being interested in values at a particular scale. For example, let's say I want a random price between $1.00 and $2.00; I really do want there to be 101 options, not infinite options, and prices like $1.419386746 do not really make sense. Now, in fairness, I could take the result and just round it. However, I take your point that when different scales are used for min and max, the behavior could be surprising. You're certainly right about that. The question is: is it less surprising than the alternative?
Is this the case we discussed earlier?
My personal preference is for my existing approach, because I want to be able to use same-scale values as the low and high and receive values with that scale. I can absolutely see an argument for making the issue you describe in (2) panic, though, and I'm happy to change to that if you prefer. With that said, it's not my library! :-) What do you think?
This is good feedback! I do see what you're saying: it'd give a much better distribution when you do want fixed scales than having to round the result. The reason I started thinking about the full range of values was based more on how floats are implemented, etc. I wonder if there is an argument for making this an explicit feature implementation. I'm curious also about any thoughts @c410-f3r might have, considering they may also be using random number generation.
The more I think about it, the more I think both worlds exist; however, for now, one world existing is fine so long as it is documented. Perhaps appending to the feature documentation, describing the approach and its limitations, would suffice for now?
Hmm, I am not used to the internals involved here.
I can get behind this. Would you like a discrete feature gate, to allow for the possibility that someone implements an alternative approach in the future? (Or we could just add a new feature gate when that happens...)
I'm in two minds. On one hand, I like consciously opting in to behavior (i.e., having it as an explicit feature). On the other hand, it is perhaps overkill, given that it's new and a fairly lightweight feature that may not require other use cases. Perhaps we just document the behavior for now and feature-gate it as required in the future. Sorry that such a simple feature has become such a long-winded thread!
No, no, that's okay. I admire your commitment to excellence, and the feedback did improve the PR in some important ways. I'll add the documentation shortly.
Documentation added; sorry it took so long.
No apologies necessary! Thank you for all your help landing this feature 😄
Thanks for merging!
It appears that the previous work done in #517 only added partial `rand` support. In particular, it added support for `rng.gen()` but not `rng.gen_range()`. The latter lets you provide a low and a high value and get a random number between them.

This PR adds such support. It approaches the problem by taking the low and high values, rescaling them to match (to the higher of the two scales) if necessary, and then farming out the actual randomness by shipping the mantissa values off to `rand::distributions::uniform::UniformInt<i128>`. The `UniformInt` struct then returns a random mantissa, which is re-decimalized and returned.

Scale is relevant here: the example above is distinct from `rng.gen_range(dec!(1.0)..=dec!(3.0))`, because the original example has 201 valid values (1.00, 1.01, 1.02, etc.) whereas this latter one has only 21 potential values (1.0, 1.1, ..., 3.0).
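To illustrate the scale-dependence concretely (a sketch using a plain mantissa/scale model, not the PR's code): once both bounds share a scale, the number of representable values in an inclusive range is simply the difference of the mantissas plus one.

```rust
/// Number of distinct values an inclusive decimal range contains once
/// both bounds share a scale: value = mantissa / 10^scale.
fn range_size(low_mantissa: i128, high_mantissa: i128) -> i128 {
    high_mantissa - low_mantissa + 1
}

fn main() {
    // 1.00..=3.00 at scale 2: mantissas 100..=300 -> 201 values.
    assert_eq!(range_size(100, 300), 201);
    // 1.0..=3.0 at scale 1: mantissas 10..=30 -> 21 values.
    assert_eq!(range_size(10, 30), 21);
    // 1..=3 at scale 0: mantissas 1..=3 -> 3 values.
    assert_eq!(range_size(1, 3), 3);
}
```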