feat: Add rng.gen_range support #519
Conversation
Overall, I like the feature and think it'll be a useful addition. I think there may be a few gotchas with the current approach - we could either guard against these (e.g. check scale after rescale and error) or change to an alternative approach.
```rust
macro_rules! dec {
    ($e:expr) => {
        Decimal::from_str_exact(stringify!($e)).unwrap()
    };
}
```
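As a side note on why this test-local shim behaves like the real macro: `stringify!` captures the literal's source text verbatim (trailing zeros included), which is what lets `from_str_exact` recover the intended scale. A minimal standalone check (plain Rust, no rust_decimal dependency):

```rust
fn main() {
    // stringify! preserves the token text of the literal, so the scale
    // implied by trailing zeros survives into the string.
    assert_eq!(stringify!(1.00), "1.00");
    assert_eq!(stringify!(1.01), "1.01");
    assert_eq!(stringify!(3.14159), "3.14159");
}
```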
Just FYI: while in tests, you should be able to use the `dec` macro directly (see https://github.com/paupino/rust-decimal/blob/master/Cargo.toml#L46).
I tried that and it didn't work. I think it only works for doctests, benchmarks, and the like (where the `rust_decimal_macros` crate is external). I'll try again just in case, because I agree this is goofy.
I just tried again.

```
error: failed to resolve: could not find `rust_decimal` in the list of imported crates
```

I think it works in doctests and the like, but not in unit tests. :-(
Perhaps something like `use crate as rust_decimal;` could be possible if `rust_decimal_macros` didn't expand to absolute paths like `::rust_decimal`. On second thought, I am not sure that would be possible either, because `rust_decimal_macros` imports `rust_decimal`.
Yup, and therein lies the problem.
```rust
    assert!(random == dec!(1.00) || random == dec!(1.01));
    values.insert(random);
}
// Somewhat flaky; will fail 1 out of every 2^255 times this is run.
```
Why is this? Is it because it could only generate `1.00` or `1.01` 255 times in a row?
Correct. It feels like a smell. Thoughts to improve?
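For intuition on where 2^255 comes from (a back-of-the-envelope sketch, assuming the test loop runs 256 times, a detail not shown above): if each of n independent samples lands on one of two equally likely values, the chance that all n agree (i.e. that one of the two values never appears) is 2 * (1/2)^n = 2^(1-n):

```rust
fn main() {
    // With n = 256 draws from {1.00, 1.01}, "all draws identical" happens
    // with probability 2 * (1/2)^256 = 2^-255. Powers of two are exact in
    // f64, so the equality below holds exactly.
    let p_all_same = 2.0 * 0.5f64.powi(256);
    assert_eq!(p_all_same, 0.5f64.powi(255));
    println!("failure probability = {:e}", p_all_same);
}
```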
src/rand.rs
Outdated
```rust
// Set scales to match one another, because we are relying on the mantissas'
// being comparable in order to outsource the actual sampling implementation.
low.rescale(low.scale().max(high.scale()));
```
This works for the common cases; however, note that it could cause some weird behavior when combining one value with a high scale and one large number with a low scale, e.g. `1.000_000_000_000_000_000_01` and `100_000_000_000_000_000_001`. In this case, the high value would try to rescale but would be unable to. At the moment, the function does not fail; instead it returns the number scaled to the highest possible precision that it can.
This would, of course, mean the mantissas are at different scales, potentially causing odd behavior (i.e., it would try to generate random numbers only within the high value's scale). Consequently, this would not give the range one might expect.
This is a tricky one to circumvent with the current method. One feasible approach is to leverage `trunc` and/or `frac` to generate the two values and then squeeze them in; this also has a few complexities attached and won't give a clean `1.00`/`1.01` solution as per below.
Another approach is to instead scale down to the lowest scale and go that route; however, you would also need to be careful, since scaling down will round.
I don't have a great answer, but it is something to think about. If this instead returned a `Result`, you could potentially handle the mismatched-scale scenario as an error; however, I'm not sure whether this is simply a trait implementation or not.
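To make that failure mode concrete, here is a self-contained sketch using a simplified mantissa/scale model (a stand-in for rust_decimal's internals, which cap the mantissa at 96 bits; `try_rescale` is a hypothetical helper, not a library API):

```rust
/// Simplified model of a decimal: value = mantissa / 10^scale.
/// rust_decimal caps the mantissa at 96 bits; we mimic that here.
const MAX_MANTISSA: i128 = (1i128 << 96) - 1;

fn try_rescale(mantissa: i128, scale: u32, new_scale: u32) -> Option<(i128, u32)> {
    if new_scale < scale {
        return None; // scaling down would round; out of scope for this sketch
    }
    let mut m = mantissa;
    for _ in scale..new_scale {
        m = m.checked_mul(10)?;
        if m > MAX_MANTISSA {
            return None; // would overflow the 96-bit mantissa
        }
    }
    Some((m, new_scale))
}

fn main() {
    // 1.000_000_000_000_000_000_01 is a 21-digit mantissa at scale 20.
    let low = (100_000_000_000_000_000_001i128, 20u32);
    // 100_000_000_000_000_000_001 is the same 21 digits at scale 0.
    let high = (100_000_000_000_000_000_001i128, 0u32);

    // The low value is already at scale 20.
    assert!(try_rescale(low.0, low.1, 20).is_some());
    // The high value cannot be rescaled to 20: 21 digits times 10^20
    // vastly exceeds 96 bits, so a real rescale would stop short.
    assert!(try_rescale(high.0, high.1, 20).is_none());
}
```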
> In this case, the high would try to rescale however will be unable to. At the moment, the function will not fail and instead return the number scaled to the highest possible precision that it can.
Thanks, this is a great point that I had not considered.
I assert the goals should be as follows:
- In the common case, the user should get expected behavior, which I assert is "the higher scale".
  - Evidence for my intuition on keeping the higher scale: if one wanted a random sample of `dec!(1.0)..=dec!(3.14159)`, it would be expected that values between `3.0` and `3.14159` could actually be yielded.
  - Worse, on the other end: `dec!(1.25)..=dec!(3)` could yield the value `1`, which is obviously wrong.
  - Also, if the user wants the lower scale, rounding your more precise value seems more intuitive than rescaling your less precise value.
- Similarly, precision should not be lost wherever possible.
Your point is still valid, though; we'd get wild, undefined behavior in the edge case you point out. Here's how I propose we solve it:
- Rescale both values to the higher of the two scales (as we do now).
- Check whether the scales are equal; if they are not, rescale both to the lower of the two scales. In other words, we lose precision in this case, but only in this case.

This strikes me as a minor "breach of contract" against the goals described above, but it seems to be the best middle ground. It's basically graceful degradation. (I also assert that the appetite for this particular use case probably does not exist, although I agree it is right to consider it and try to degrade gracefully. :-))
The code for this is trivial; it just requires adding (immediately after the logic under discussion):

```rust
if low.scale() != high.scale() {
    // Graceful degradation: fall back to the lower of the two scales.
    let scale = low.scale().min(high.scale());
    low.rescale(scale);
    high.rescale(scale);
}
```
What do you think?
> Another approach is to instead scale it down to the lowest scale and go that route, however you would also need to be careful since scaling down will round.
I'd like to avoid doing this as the common case. I could see it being acceptable in the edge case, though.
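As a sanity check on the proposed two-step reconciliation, here is a self-contained sketch using a simplified mantissa/scale model rather than rust_decimal itself; `Dec` and `rescale_pair` are hypothetical stand-ins, with truncation standing in for the library's real rounding:

```rust
/// Simplified decimal: value = mantissa / 10^scale (mantissa capped at
/// 96 bits, mimicking rust_decimal). Not the library's real implementation.
const MAX_MANTISSA: i128 = (1i128 << 96) - 1;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Dec {
    mantissa: i128,
    scale: u32,
}

impl Dec {
    /// Rescale toward `target`, stopping early if the mantissa would
    /// overflow (as the real `rescale` does) and truncating when
    /// scaling down (a simplification of its rounding).
    fn rescale(&mut self, target: u32) {
        while self.scale < target {
            match self.mantissa.checked_mul(10) {
                Some(m) if m <= MAX_MANTISSA => {
                    self.mantissa = m;
                    self.scale += 1;
                }
                _ => return, // stop at the highest achievable precision
            }
        }
        while self.scale > target {
            self.mantissa /= 10;
            self.scale -= 1;
        }
    }
}

/// Hypothetical helper: the proposed two-step scale reconciliation.
fn rescale_pair(low: &mut Dec, high: &mut Dec) {
    // Step 1: rescale both to the higher scale (the common case).
    let up = low.scale.max(high.scale);
    low.rescale(up);
    high.rescale(up);
    // Step 2: on mismatch (one value could not reach `up`), degrade
    // gracefully to the lower of the two resulting scales.
    if low.scale != high.scale {
        let down = low.scale.min(high.scale);
        low.rescale(down);
        high.rescale(down);
    }
}

fn main() {
    // Common case: 1.25 and 3 both end up at scale 2.
    let (mut low, mut high) = (Dec { mantissa: 125, scale: 2 }, Dec { mantissa: 3, scale: 0 });
    rescale_pair(&mut low, &mut high);
    assert_eq!((low.scale, high.scale), (2, 2));
    assert_eq!(high.mantissa, 300);

    // Edge case: a 21-digit integer can never reach scale 20, so both
    // values degrade to the scale it could actually reach.
    let mut low = Dec { mantissa: 100_000_000_000_000_000_001, scale: 20 };
    let mut high = Dec { mantissa: 100_000_000_000_000_000_001, scale: 0 };
    rescale_pair(&mut low, &mut high);
    assert_eq!(low.scale, high.scale);
}
```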
The alternative is to check that the scales are equal and just straight up panic if they aren't.
Fixed (using the graceful degradation approach) in 5bbe252.
src/rand.rs
Outdated
```rust
    B1: SampleBorrow<Self::X> + Sized,
    B2: SampleBorrow<Self::X> + Sized,
{
    let high = *high.borrow();
```
Could you not just subtract `Decimal::ONE`? It should be faster, since it'll take an optimal path.
I guess the only thing to be careful of is if `high` is `Decimal::MIN`. You could perhaps avoid that with a `checked_sub`, only using the value if it is successfully calculated.
Actually, I was a bit naive here - you're only removing one from the raw mantissa as opposed to the integral portion. This is an interesting challenge for a decimal number since scale is a relevant factor.
As you determined, I cannot and should not subtract `Decimal::ONE`, because what I'm doing here is "subtract one from the least significant digit", as opposed to "subtract 1".
I assert that we can be confident the value is not `Decimal::MIN` because of the check that requires `low < high`; there's no way for `high` to be `Decimal::MIN` without triggering that error.
> This is an interesting challenge for a decimal number since scale is a relevant factor.

Oh, rats: the scales aren't guaranteed to be equal yet. The scale-equivalence logic probably needs to be done in `new` as well.
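For illustration, here is a sketch with a simplified mantissa/scale pair (a stand-in, not rust_decimal internals): subtracting one from the mantissa removes one unit in the last place at the current scale, which is quite different from subtracting `Decimal::ONE`:

```rust
/// Simplified decimal: value = mantissa / 10^scale.
#[derive(Debug, PartialEq)]
struct Dec {
    mantissa: i128,
    scale: u32,
}

fn main() {
    // 3.00 represented as mantissa 300 at scale 2.
    let high = Dec { mantissa: 300, scale: 2 };

    // "Subtract one from the least significant digit" (one ULP):
    // 3.00 -> 2.99. This is what turning an inclusive upper bound
    // into an exclusive one needs.
    let one_ulp_less = Dec { mantissa: high.mantissa - 1, scale: high.scale };
    assert_eq!(one_ulp_less, Dec { mantissa: 299, scale: 2 });

    // Subtracting Decimal::ONE instead: 3.00 -> 2.00. Note that 1 at
    // scale 2 has mantissa 100, so the mantissa drops by 100, not by 1.
    let one_less = Dec { mantissa: high.mantissa - 100, scale: high.scale };
    assert_eq!(one_less, Dec { mantissa: 200, scale: 2 });
}
```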
Fixed in 5bbe252.
This will work; however, two potential issues I see are:
Overall, my main concern is that the behavior feels a little ambiguous. We could get around that with documentation; however, it got me thinking about a better way. After a bit of thought, I think the ideal approach would be to do the following:
Anyway, this approach would allow that kind of usage. Of course, it is also a bit more complicated, since it requires using some internals to get this level of precision; albeit, it can leverage some existing internal operations, perhaps with some modification to prevent premature rounding. Consequently, I'm more than happy to provide assistance and/or contribute code to this PR to help achieve this. Thoughts?
I actually view this as a feature, but I might be wrong. The reason I view it as a feature is that lots of use cases for decimals involve being interested in values at a particular scale. For example, let's say I want a random price between $1.00 and $2.00; I really do want there to be 101 options, not infinite options, and prices like $1.419386746 do not really make sense. Now, in fairness, I could take the result and just round it. However, I take your point that when different scales are used for min and max, the behavior could be surprising. You're certainly right about that. The question is: is it less surprising than the alternative?
Is this the case we discussed earlier?
My personal preference is for my existing approach, because I want to be able to use same-scale values as the low and high and receive values with that scale. I can absolutely see an argument for making the issue you describe in (2) panic, though, and I'm happy to change to that if you prefer. With that said, it's not my library! :-) What do you think?
This is good feedback! I do see what you're saying: it'd give a much better distribution when you do want fixed scales than having to round the result. The reason I started thinking about the full range of values was based more on how floats are implemented, etc. I wonder if there is an argument for making this an explicit feature implementation. I'm curious also about any thoughts @c410-f3r might have, considering they may also be using random number generation.
The more I think about it, the more I think both worlds exist; however, for now, one world existing is fine so long as it is documented. Perhaps appending to the feature documentation, describing the approach and its limitations, would suffice for now?
Hmm, I am not used to the internals involved here.
I can get behind this. Would you like a discrete feature gate, to allow for the possibility that someone implements an alternative approach in the future? (Or we could just add a new feature gate when that happens...)
I'm in two minds. On one hand, I like consciously opting in to behavior (i.e., having it as an explicit feature). On the other hand, it is perhaps overkill, given that it's new and a fairly lightweight feature that may not require other use cases. Perhaps we just document the behavior for now and feature-gate it as required in the future. Sorry that such a simple feature has become such a long-winded thread!
No, no, that's okay. I admire your commitment to excellence, and the feedback did improve the PR in some important ways. I'll add the documentation shortly.
Documentation added; sorry it took so long.
No apologies necessary! Thank you for all your help landing this feature 😄
Thanks for merging!
It appears that the previous work done in #517 only added partial `rand` support. In particular, it added support for `rng.gen()` but not `rng.gen_range()`. The latter lets you provide a low and a high value and get a random number between them.

This PR adds such support. It approaches the problem by taking the low and high values, rescaling them to match (to the higher of the two scales) if necessary, and then farming out the actual randomness by shipping the mantissa values off to `rand::distributions::uniform::UniformInt<i128>`. The `UniformInt` struct then returns a random mantissa, which is re-decimalized and returned.

Scale is relevant here: the example above is distinct from `rng.gen_range(dec!(1.0)..=dec!(3.0))`, because the original example has 201 valid values (1.00, 1.01, 1.02, etc.) whereas this latter one has only 21 potential values (1.0, 1.1, ..., 3.0).
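To illustrate the scale-dependence concretely (a sketch using a plain mantissa/scale model, not the PR's code): once both bounds share a scale, the number of representable values in an inclusive range is simply the difference of the mantissas plus one.

```rust
/// Number of distinct values an inclusive decimal range contains once
/// both bounds share a scale: value = mantissa / 10^scale.
fn range_size(low_mantissa: i128, high_mantissa: i128) -> i128 {
    high_mantissa - low_mantissa + 1
}

fn main() {
    // 1.00..=3.00 at scale 2: mantissas 100..=300 -> 201 values.
    assert_eq!(range_size(100, 300), 201);
    // 1.0..=3.0 at scale 1: mantissas 10..=30 -> 21 values.
    assert_eq!(range_size(10, 30), 21);
    // 1..=3 at scale 0: mantissas 1..=3 -> 3 values.
    assert_eq!(range_size(1, 3), 3);
}
```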