
Implement HighPrecision01 distribution #372

Closed
wants to merge 2 commits

Conversation

pitdicker
Contributor

Re-opening #320.

pitdicker and others added 2 commits April 4, 2018 20:07
@dhardy
Member

dhardy commented Apr 16, 2018

So if I recall correctly we want to try implementing HighPrecisionUniform (or some shorter name) based on this idea plus @pitdicker's previous split-around-zero trick to produce high-precision floats over arbitrary ranges.

This is thus a good addition but doesn't have to be done before 0.5.
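A minimal sketch of the idea, for reference (my illustration, not the PR's exact code; it rounds the subnormal range to zero for brevity): draw random words until a set bit fixes the binade, then fill the mantissa with fresh random bits.

```rust
use rand::Rng;

// Sketch only: a full-precision sample in [0, 1) for f64.
fn high_precision_01<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    // Biased exponent of the [0.5, 1) binade in IEEE 754 binary64.
    let mut exponent: i32 = 1022;
    // Each leading zero bit halves the binade: [2^-(k+1), 2^-k) is
    // selected with probability 2^-(k+1), exactly its width, as a
    // uniform sample requires.
    loop {
        let bits = rng.gen::<u64>();
        if bits != 0 {
            exponent -= bits.leading_zeros() as i32;
            break;
        }
        exponent -= 64;
    }
    if exponent < 1 {
        // Reaching this needs ~1022 zero bits in a row; a real
        // implementation would continue into the subnormal range.
        return 0.0;
    }
    // 52 fresh bits pick uniformly among the values in the binade.
    let mantissa = rng.gen::<u64>() >> 12;
    f64::from_bits(((exponent as u64) << 52) | mantissa)
}
```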

@dhardy added labels X-enhancement (Type: proposed enhancement), F-new-int (Functionality: new, within Rand), T-distributions (Topic: distributions), P-medium (Priority: Medium) on Apr 16, 2018
@vks
Collaborator

vks commented Apr 23, 2018

Instead of testing the average, I think it makes more sense to fill a histogram. With a sufficient number of samples, it should be possible to distinguish the biased Standard from the unbiased HighPrecision01 by looking at the lowest bins. However, it might take too many samples to be feasible as a test. In any case, even with fewer samples a histogram would be preferable to just calculating the average.
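A rough sketch of such a test (hypothetical; the helper below is not part of this PR) could bucket samples by binade and inspect the lowest bins:

```rust
use rand::distributions::Distribution;

// Hypothetical test helper: bins[0] counts samples in [0.5, 1),
// bins[1] counts [0.25, 0.5), and so on; zero and everything below
// 2^-63 collects in the last bin.
fn binade_histogram<D: Distribution<f64>>(dist: &D, n: usize) -> [u64; 64] {
    let mut rng = rand::thread_rng();
    let mut bins = [0u64; 64];
    for _ in 0..n {
        let x = dist.sample(&mut rng);
        if x <= 0.0 {
            bins[63] += 1;
            continue;
        }
        let idx = (-x.log2().floor()) as usize - 1;
        bins[idx.min(63)] += 1;
    }
    bins
}
```

An unbiased sampler should put about n·2^-(k+1) samples in bin k, while Standard (which only emits multiples of a fixed step around 2^-52) leaves the deepest binades empty, so telling the two apart this way needs on the order of 2^52 samples, matching the feasibility concern above.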

I think it might make sense to expose LowPrecision01 in case we ever want to change Standard. Maybe Biased01 and Unbiased01 are better names?

@vks
Collaborator

vks commented Apr 23, 2018

One caveat (from http://xoroshiro.di.unimi.it/) that should probably be addressed in the documentation:

Interestingly, these are not the only notions of “uniformity” you can come up with. Another possibility is that of generating 1074-bit integers, normalize and return the nearest value representable as a 64-bit double (this is the theory—in practice, you will almost never use more than two integers per double as the remaining bits would not be representable). This approach guarantees that all representable doubles could be in principle generated, albeit not every returned double will appear with the same probability. A reference implementation can be found here. Note that unless your generator has at least 1074 bits of state and suitable equidistribution properties, the code above will not do what you expect (e.g., it might never return zero).

@dhardy mentioned this pull request May 24, 2018
@sicking
Contributor

sicking commented Jun 4, 2018

Is there still interest in this?

I wrote a generator a while back for values in the [0, 1) range using maximum precision here.

What the algorithm effectively does is pick a random, but perfectly unbiased, point on the continuous line between 0 and 1, and then round it down to the nearest point that can be represented as an f32/f64.

This means that all values in the [0, 1) range that can be represented by an f32/f64 can be returned by the algorithm. However, not all values are equally likely: an f32/f64 has many more representable values close to 0 than close to 1, so the likelihood of rounding to any particular value close to 0 is smaller than that of rounding to a particular value close to 1.
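A standalone snippet (not from the linked code) makes the spacing difference concrete:

```rust
fn main() {
    // Gap between 1.0 and the next representable f64 below it: 2^-53.
    let below_one = f64::from_bits(1.0f64.to_bits() - 1);
    println!("gap just below 1.0: {:e}", 1.0 - below_one);
    // Gap between 0.0 and the smallest positive f64 (subnormal): 2^-1074.
    println!("gap just above 0.0: {:e}", f64::from_bits(1));
}
```

Rounding down an unbiased real point therefore hits each value near 0 with probability proportional to a gap on the 2^-1074 scale, versus 2^-53 near 1.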

Happy to adapt this to rand if there's interest?

There's also code in the same file which uses the same approach to generate values between two arbitrary, finite f32/f64 values. I.e. it picks a random, unbiased point on the continuous line between the start and end and then rounds down to the closest f32/f64 below the picked point. However, it relies on the num_bigint crate, so it would require more work to port.

@sicking
Contributor

sicking commented Jun 4, 2018

I should also mention that the code for high-precision sampling for an arbitrary range is quite slow. I mainly wrote it for funsies to see what it'd look like.

However the code for high-precision sampling in [0, 1) has more reasonable performance. Though obviously slower than the Standard implementation.

@dhardy
Member

dhardy commented Jun 5, 2018

IIRC this implementation works fine and has reasonable performance, but we were considering going with arbitrary ranges. On the other hand, if that's not easy to do, it might not be a good option.

We were planning on adding distributions::HighPrecision, which is just the full-precision equivalent of Uniform.

@sicking
Contributor

sicking commented Jun 5, 2018

Cool, the existing implementation here does seem faster than the one I wrote, so I think we should go with this one.

Happy to port the arbitrary-range full-precision implementation that I wrote if there's interest? The current code is here.

Are there any perf goals in mind?

@sicking
Contributor

sicking commented Jun 12, 2018

FWIW, I benchmarked my full-precision arbitrary-range implementation and it's about 50-100x slower than the low-precision version (which is similar to what's in Uniform for f32/f64).

It hasn't been perf-optimized much, so I'm sure that can be lowered some. But it's pretty darn slow.

@sicking
Contributor

sicking commented Jun 21, 2018

I'm working on a more performant implementation here. It still doesn't work, so I can't get perf numbers yet.

@sicking
Contributor

sicking commented Jun 22, 2018

The implementation over here is now working and benchmarked. It looks to be about 6-9x slower than the current Uniform implementation, which seems viable? I'm sure some more performance can be squeezed out as well.

@dhardy
Member

dhardy commented Jun 23, 2018

Good work. I think with that performance it is probably worth including this somehow, though obviously not as the default option. I guess we may also want a different implementation for high-precision Standard?

I would say open a PR, but it probably makes sense to resolve #494 first.

@sicking mentioned this pull request Jun 27, 2018
@pitdicker
Contributor Author

Closing in favor of #531

@pitdicker closed this Jul 18, 2018
@dhardy
Member

dhardy commented Jul 18, 2018

Why? As I understand it, this has better performance but only works over [0, 1), so the two don't seem to be mutually exclusive.

@dhardy reopened this Jul 18, 2018
@sicking
Contributor

sicking commented Jul 19, 2018

The PR in #531 includes the commits here, but updated to compile on master. #531 contains both HighPrecision01 and HighPrecision distributions.
