Cauchy distribution: optimise sampling #486

Merged: 1 commit, Jun 1, 2018
Conversation

@dhardy (Member) commented May 30, 2018

@MaximoB this presumably uses 1 bit less precision, but is quite a bit faster:

# previously
test distr_cauchy                ... bench:      48,639 ns/iter (+/- 1,411) = 164 MB/s                                                 
# now
test distr_cauchy                ... bench:      36,224 ns/iter (+/- 654) = 220 MB/s

Optimisers are weird, but as I guessed, it seems we get a free subtraction operation when using Open01.

I considered checking that the result is finite with a loop, but I'm fairly sure it must be anyway since π is irrational. Do you agree?
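For reference, a minimal sketch of the inverse-CDF approach under discussion, assuming rand's Open01 distribution; this is a paraphrase rather than the PR's exact code, and the function name and parameterisation are illustrative.

```rust
// A minimal sketch of inverse-CDF Cauchy sampling via tan; the exact
// expression in the PR may differ slightly.
use rand::distributions::Open01;
use rand::Rng;

fn sample_cauchy<R: Rng + ?Sized>(rng: &mut R, median: f64, scale: f64) -> f64 {
    // x lies in the open interval (0, 1), so (x - 0.5) * PI never lands
    // exactly on ±π/2 and tan stays finite.
    let x: f64 = rng.sample(Open01);
    let theta = (x - 0.5) * std::f64::consts::PI;
    median + scale * theta.tan()
}
```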

@vks (Collaborator) commented May 30, 2018

I considered checking that the result is finite with a loop, but I'm fairly sure it must be anyway since π is irrational.

I think you are right that tan produces finite results for finite input. The only thing I can think of that could go wrong is that r * pi / 2. rounds such that you end up on the other side of the singularity, changing the sign.

@dhardy (Member, Author) commented May 30, 2018

I realised that might be possible, but frankly I don't think it would have much effect on the distribution: tan has period π, so it's valid to sample from any π-length interval excluding the singularities. It would probably round towards zero anyway, though.

@pitdicker (Contributor)

Optimisers are weird, but as I guessed, it seems we get a free subtraction operation when using Open01.

More than 75% of the time is spent in tan. LLVM can't combine both subtractions, because it doesn't know there will be no rounding error. If you use the range code, or use IntoFloat directly, that one instruction can be removed, but it doesn't matter in practice. The performance difference comes from using Open01 and removing the loop.
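For illustration, a hedged sketch of the bit-level idea behind Open01/IntoFloat; this mirrors the general technique, not rand's exact implementation.

```rust
// Build a float in [1, 2) from 52 random mantissa bits, then shift it into (0, 1).
fn u64_to_open01(bits: u64) -> f64 {
    let fraction = bits >> (64 - 52);            // keep 52 random mantissa bits
    let exponent_of_one = 1023u64 << 52;         // exponent field of 1.0
    let in_one_two = f64::from_bits(exponent_of_one | fraction); // uniform in [1, 2)
    // Subtracting (1 - ε/2) shifts [1, 2) into (0, 1), excluding both endpoints.
    // A later `- 0.5` for the Cauchy transform is the second subtraction that
    // LLVM cannot legally fold into this one.
    in_one_two - (1.0 - f64::EPSILON / 2.0)
}
```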

@vks (Collaborator) commented May 30, 2018

I don't think it would have much effect on the distribution

I agree, it would just wrap around, which is not a problem because it is a random number anyway.

LLVM can't combine both subtractions, because it doesn't know there will be no rounding error.

More concretely, it cannot optimize x - (1.0 - EPSILON / 2.0) - 0.5 to x - ((1.0 - EPSILON / 2.0) - 0.5).
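A small standalone demonstration of why that fold is unsound under strict IEEE 754 semantics; the input value here is hand-picked for illustration, not taken from the PR.

```rust
// In floating point, x - a - b is not always equal to x - (a + b),
// so the compiler may not rewrite one form into the other.
fn main() {
    let x = 1.0f64 + f64::EPSILON;
    let a = 1.0 - f64::EPSILON / 2.0;
    let b = 0.5;
    let two_subtractions = x - a - b;
    let folded_constant = x - (a + b);
    // The two results differ in the last bit for this input.
    println!("{:e} vs {:e}", two_subtractions, folded_constant);
    assert_ne!(two_subtractions, folded_constant);
}
```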

@MaximoB (Contributor) commented May 30, 2018

@dhardy
OK, I am not sure how I got such different results in my benchmarking; the two methods were very close on my computer. If this way is significantly faster for you, then it is probably the better option.

Since you are using Open01, the only way you could hit one of the singularities in tan is if Open01 malfunctions or there is a floating-point error when subtracting 0.5, so I don't really think getting infinity is something to be concerned about.

Over the weekend I read some papers and I have implemented a stunningly fast numerical approximation method for generating Cauchy numbers.
test distr_cauchy ... bench: 9,945 ns/iter (+/- 2,510) = 804 MB/s
I don't really know what the error bounds on this are, though, so if you are more comfortable going with the more accurate but slower generation I understand (although the paper claims the error is very low).
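For comparison, one well-known tan-free alternative (not necessarily the method from the paper mentioned above) is to reject-sample a point uniform on the unit disc and return the ratio of its coordinates, which is standard Cauchy. A sketch, assuming rand's Rng trait:

```rust
use rand::Rng;

fn cauchy_via_rejection<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    loop {
        // Uniform point in the square [-1, 1) x [-1, 1), rejected to the unit disc.
        let u = rng.gen::<f64>() * 2.0 - 1.0;
        let v = rng.gen::<f64>() * 2.0 - 1.0;
        if u * u + v * v <= 1.0 && v != 0.0 {
            // The angle of (u, v) is uniform, so u / v is a standard Cauchy sample.
            return u / v;
        }
    }
}
```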

@dhardy (Member, Author) commented May 30, 2018

The performance difference is because of using Open01, and removing the loop.

You're right; it turns out removing the loop makes most of the difference. Moving everything inside the loop also performs well on my CPU, but I think we can just drop the loop, since π/2 rounds to a nearby value whose tangent is finite anyway.
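That claim is easy to check: FRAC_PI_2, the nearest f64 to π/2, sits slightly below the true value, so its tangent is large but finite (roughly 1.6e16 on IEEE 754 doubles). A quick sketch:

```rust
fn main() {
    // π/2 is irrational, so FRAC_PI_2 is only an approximation of it,
    // and tan of that approximation is finite.
    let t = std::f64::consts::FRAC_PI_2.tan();
    println!("tan(FRAC_PI_2) = {:e}, finite: {}", t, t.is_finite());
}
```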

@dhardy (Member, Author) commented May 30, 2018

Amended.

@MaximoB that sounds worth looking into. I don't think this is the only distribution which could be sped up significantly using a different approach; see #257. If you're interested then please open a PR.

This is unnecessary because the FP approximation rounds away from π/2,
and removing it significantly increases performance.
@dhardy merged commit 4fd39ba into rust-random:master on Jun 1, 2018
@dhardy deleted the cauchy branch on February 15, 2019