Investigate replacing Ziggurat sampling with ETF #257

Open · Tracked by #1432
dhardy opened this issue Feb 12, 2018 · 15 comments
Labels: B-value (Breakage: changes output values), P-low (Priority: Low), T-distributions (Topic: distributions), X-enhancement (Type: proposed enhancement)

Comments

@dhardy (Member) commented Feb 12, 2018

See: The Exclusive Top Floor algorithm

@dhardy (Member, Author) commented Feb 12, 2018

I should CC @sbarral and link the C++ implementation

dhardy added the X-enhancement, T-distributions, P-low, and B-value labels on Mar 14, 2018

@dhardy (Member, Author) commented Apr 27, 2019

Also see the following 2018 arXiv paper by Morteza Jalalvand and Mohammad A. Charsooghi:

Generalized Ziggurat Algorithm for Unimodal and Unbounded Probability Density Functions with Zest

Authors' code (GPLv3): https://github.com/DiscreteLogarithm/Zest

@MichaelOwenDyer (Member) commented

I'm looking into this. I don't speak C++, but I think my prompting skills are good enough to get the reference implementation translated into Rust for us. @dhardy, I believe the Ziggurat method is currently only used to sample the normal and exponential distributions, no? I see that the ETF algorithm can sample arbitrarily shaped distributions; does that mean it could be used in any/all of rand's distributions? I'm trying to get a better understanding of the scope of applicability of this solution vs the existing one.

@sbarral commented Apr 9, 2024

Oh that's funny, I had already forgotten about my Rust implementation!

I see that my last work on it dates back four years. It's missing a README and license, but otherwise looks fairly complete (more so than the C++ version, actually), with a decent battery of tests. I think my ambition was to implement more distributions before publishing it, but there was always something more important to do... Anyway, rather than let it rot forever, I have just committed everything that wasn't yet committed and made the repo public:

https://github.com/sbarral/etf-rs

I am very busy this week but will try to put some finishing touches on it in the coming month. In the meantime, the license is MIT/Apache-2.0, so feel free to peruse.

As for its strengths, I think ETF is great if you have atypical/custom distributions for which no literature exists. For well-known distributions, chances are that some specialized sampling method will be a bit faster (though ETF will probably be competitive) [probably not true, see next comment]. I seem to recall, however, that Cauchy and Chi-Squared were at that time faster than in Rand (TBC).

Compared to Zest, I expect that ETF will always be measurably faster. Due to the way it works, Zest is likely to have a very high branch misprediction rate. ETF is in this respect similar to regular Ziggurat, with a very good prediction rate.

@sbarral commented Apr 9, 2024

I remember now that the problem is not so much that ETF might be slower than a specialized sampling method; in fact it is probably often significantly faster. The issue is that for most distributions the tables cannot be pre-computed, so constructing a distribution is a costly operation.
Cauchy is good in this respect because, like the normal or exponential distributions, it can be generated for any parameter using a single table.

Bottom line: ETF is, I think, a very valuable algorithm, but the construction cost may mean it's not a good fit for Rand (and Zest will have exactly the same issue).
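
To make the single-table point concrete, here is a minimal sketch of the location-scale reduction (illustrative only: `standard_cauchy` stands in for a table-driven ETF sampler, using the closed-form quantile just to keep the example self-contained):

```rust
use rand::Rng;

// A location-scale family needs only a table for the *standard* member
// (location 0, scale 1); arbitrary parameters come from an affine transform.
fn standard_cauchy<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    let u: f64 = rng.gen(); // one uniform draw in [0, 1)
    (std::f64::consts::PI * (u - 0.5)).tan() // quantile of Cauchy(0, 1)
}

fn cauchy<R: Rng + ?Sized>(rng: &mut R, median: f64, scale: f64) -> f64 {
    median + scale * standard_cauchy(rng) // no per-parameter table construction
}
```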

@sbarral commented Apr 9, 2024

I just updated the Rand-related dependencies and ran the benchmark:

| Test | Type | Time (rand_distr) | Time (etf-rs) |
|---|---|---|---|
| central_normal | f32 | 5.0159 ns | 3.8457 ns |
| central_normal | f64 | 3.9653 ns | 3.9148 ns |
| normal | f64 | 6.2197 ns | 4.2361 ns |
| cauchy | f32 | 26.118 ns | 4.0772 ns |
| cauchy | f64 | 21.983 ns | 3.8984 ns |
| chi_squared (k=0.5) | f32 | 29.742 ns | 7.8440 ns |
| chi_squared (k=0.5) | f64 | 41.105 ns | 10.746 ns |
| chi_squared (k=2) | f32 | 7.7833 ns | 4.2518 ns |
| chi_squared (k=2) | f64 | 6.3638 ns | 3.7355 ns |
| chi_squared (k=5) | f32 | 16.573 ns | 4.2282 ns |
| chi_squared (k=5) | f64 | 15.904 ns | 3.9329 ns |
| chi_squared (k=1000) | f32 | 16.355 ns | 4.0622 ns |
| chi_squared (k=1000) | f64 | 15.495 ns | 3.9862 ns |

Cauchy seems to be the obvious candidate for an ETF implementation: it is about 5-6 times faster than today's rand_distr::Cauchy, and it can be generated from a single pre-computed table, just like rand_distr::Normal is today (in that case it would be slightly slower than in the above benchmark, but the relative difference should be the same as rand_distr::Normal vs rand_distr::StandardNormal).

Chi-squared is much faster than rand_distr::ChiSquared, but AFAIR it cannot be generated from a single pre-computed table.

@MichaelOwenDyer (Member) commented

Wow, that's fantastic!

> Bottom line: ETF is, I think, a very valuable algorithm, but the construction cost may mean it's not a good fit for Rand (and Zest will have exactly the same issue).

I guess it depends on the use case, right? If you only sample a distribution once before destroying it, then ETF might not be worth it, but the construction cost gets amortized in the long term. I think it would be very cool if we could let users choose the sampling method via a generic parameter.
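
For illustration, the generic-parameter idea might look like the sketch below. The `Ziggurat`/`Etf` marker types and this `Normal` shape are hypothetical, not an existing rand_distr API:

```rust
use std::marker::PhantomData;

// Hypothetical marker types selecting the sampling algorithm at compile time.
struct Ziggurat;
struct Etf;

// Hypothetical distribution type: the algorithm is a type parameter,
// defaulting to the current method so existing code keeps compiling.
struct Normal<M = Ziggurat> {
    mean: f64,
    std_dev: f64,
    _method: PhantomData<M>,
}

impl<M> Normal<M> {
    fn new(mean: f64, std_dev: f64) -> Self {
        Normal { mean, std_dev, _method: PhantomData }
    }
}

fn main() {
    let _default = Normal::<Ziggurat>::new(0.0, 1.0); // today's behaviour
    let _etf = Normal::<Etf>::new(0.0, 1.0);          // opts into ETF tables
}
```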

@sbarral commented Apr 9, 2024

> the construction cost gets amortized in the long term

Yes, this is absolutely true. The current algorithm for table construction is decently optimized, so the break-even point is probably low enough to make it worthwhile in most applications (I'd hazard that break-even is on the order of ~10,000 samples). I was mainly concerned that it may not be a good default, as users may not expect a constructor to be expensive.
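
As a back-of-envelope check (the construction time here is an assumed figure chosen to be consistent with the quoted break-even, not a measurement):

$$N_{\text{break-even}} \approx \frac{t_{\text{construct}}}{t_{\text{saved per sample}}} \approx \frac{40\ \mu\text{s}}{4\ \text{ns}} = 10{,}000\ \text{samples}$$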

> I think it would be very cool if we could let users choose the sampling method via a generic parameter

This could be a great solution. I guess the ETF version would then need a constructor under a different name? Another option would be a submodule containing the alternative implementations.

@dhardy (Member, Author) commented Apr 10, 2024

What's the compiled code size like for these? We have some pre-compiled tables for Ziggurat; adding more isn't a blocker but might be a reason to publish elsewhere.

What's the size of distribution objects? Presumably, anything not using a pre-compiled table is going to be large.

It sounds like sample performance, set-up performance, object size and possibly even code size could all be reasons for picking one implementation over another. Possibly also variance in the number of RNG samples required (if more predictable for one algorithm over another).

> This could be a great solution. I guess the ETF version would then need a constructor under a different name? Another option would be a submodule containing the alternative implementations.

It does sound like there is good reason for publishing multiple variants of these distributions. We previously discussed similar questions (#494, #601), but so far WeightedIndex is the only distribution with multiple implementations and we just went with different names.

So...

  1. In some cases (e.g. the already-mentioned Cauchy), it might make sense simply to replace the existing implementation. This is a value-breaking change, but we can do that occasionally.
  2. We could aim to publish under a rand_distr::etf sub-module. This may be appropriate if several distributions have similar performance/cost tradeoffs (sample time, set-up time, size).
  3. We could publish under rand_etf or etf instead. This may be preferable if code size is significant. It avoids needing to synchronise with rand_distr releases. (It might also prompt the question of whether the rest of rand_distr should be re-organised, but we'll leave this for now.)
  4. We could publish under something like rand_distr::fast, maybe moving WeightedAliasIndex there too, but if I've learnt anything from the latter, it's that the choice of algorithm has consequences beyond just sample/set-up performance trade-offs: it can result in API changes (bounds on generics, additional methods) and does affect reproducibility. Thus, I have a preference for reflecting the algorithm in the name.

@vks (Collaborator) commented Apr 10, 2024 via email

@sbarral commented Apr 10, 2024

> What's the compiled code size like for these? We have some pre-compiled tables for Ziggurat; adding more isn't a blocker but might be a reason to publish elsewhere.
>
> What's the size of distribution objects? Presumably, anything not using a pre-compiled table is going to be large.

For Cauchy and Normal, the tables in the above benchmark stored 256×3 = 768 elements of 32 bits (for f32) or 64 bits (for f64) each.
For chi-squared, the tables had a size of 512×3.
From memory, Rand uses tables of size 256×2 = 512 for the normal distribution.

The size vs speed trade-off can be adjusted, of course. I just ran a benchmark with tables of size 128×3 = 384 for Cauchy and the numbers were ~4.9 ns, so about 20-25% slower than the previous ~4 ns.

The size of distribution objects is basically the tables, so if they are pre-computed they are very small, and otherwise very large...
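
For scale (simple arithmetic from the figures above): a 256×3 table holds 768 entries, i.e. 3 KiB for f32 or 6 KiB for f64 per precomputed table, the same order of magnitude as the existing 256×2 ziggurat tables.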

> Possibly also variance in the number of RNG samples required (if more predictable for one algorithm over another).

ETF is similar to ziggurat: it needs only one RNG draw per sample roughly 95-99% of the time, and very rarely more than two (though in theory, as with ziggurat, the maximum number of draws is unbounded).
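
For readers unfamiliar with why the draw count is so predictable, here is a rough sketch of a ziggurat-style acceptance loop (illustrative only: it ignores the tail layer and sign handling, and the table layout is not the actual etf-rs one):

```rust
use rand::Rng;

// Hypothetical precomputed tables: x[i] are layer edges, y[i] layer heights.
struct Tables {
    x: [f64; 257],
    y: [f64; 257],
}

fn sample<R: Rng + ?Sized>(rng: &mut R, t: &Tables, pdf: impl Fn(f64) -> f64) -> f64 {
    loop {
        let bits: u64 = rng.gen();      // usually the only RNG draw
        let i = (bits & 0xFF) as usize; // low 8 bits select one of 256 layers
        // The remaining high bits give a uniform value in [0, 1).
        let u = (bits >> 11) as f64 * (1.0 / (1u64 << 53) as f64);
        let x = u * t.x[i];
        if x < t.x[i + 1] {
            return x; // fast path, taken the vast majority of the time
        }
        // Rare slow path: a second draw for the vertical coordinate.
        let y = t.y[i] + rng.gen::<f64>() * (t.y[i + 1] - t.y[i]);
        if y < pdf(x) {
            return x;
        }
        // Rejected (rarer still): loop and draw again.
    }
}
```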

> We could publish under something like rand_distr::fast, maybe moving WeightedAliasIndex there too, but if I've learnt anything from the latter, it's that the choice of algorithm has consequences beyond just sample/set-up performance trade-offs: it can result in API changes (bounds on generics, additional methods) and does affect reproducibility. Thus, I have a preference for reflecting the algorithm in the name.

One thing to consider indeed is that when the table is computed at construction, the constructor is fallible because the solver may fail to converge for some extreme choices of parameters.

@sbarral commented Apr 11, 2024

There is not just the construction time vs. sample time tradeoff; a precision vs. time tradeoff came up as well in the past (#531, #1346).

The linked issues are for uniform sampling, which is a bit specific. But for other distributions, I would argue that inverse transform sampling is the gold standard in the sense that it "makes the most" of a random number and, crucially, avoids clustering samples on the fine scale.

From this perspective, ETF is almost as good as it gets. It is, in fact, an inverse transform sampling method, except that it samples from a piecewise-rectangular upper bound of the PDF rather than the PDF itself. Its only compromise is that it uses "only" 95-99% of the entropy of the random number for sampling, the remaining part being used for the rejection step.
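
For reference, pure inverse transform sampling maps a single uniform draw through the quantile function; the exponential distribution makes a compact example since its quantile is closed-form:

```rust
use rand::Rng;

// Inverse transform sampling: x = F^{-1}(u) for a uniform u in [0, 1).
// Every bit of u contributes to x, which is the "makes the most of a
// random number" property mentioned above.
fn exponential<R: Rng + ?Sized>(rng: &mut R, lambda: f64) -> f64 {
    let u: f64 = rng.gen();
    -(1.0 - u).ln() / lambda // quantile of Exp(lambda)
}
```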

Here is a plot comparing the quality of ETF and ziggurat with the same source of entropy.

@dhardy (Member, Author) commented Apr 11, 2024

Then I agree with @vks: where pre-computed tables are possible, we should likely switch to ETF.

@sbarral commented Apr 11, 2024

> Then I agree with @vks: where pre-computed tables are possible, we should likely switch to ETF.

It looks like Gumbel could be another candidate besides Cauchy. I can try to implement it in my repo and will post the results here (it may take a few days as I have other urgent things).

@sbarral commented Apr 13, 2024

I have added Gumbel to the repo. Here are the results of cargo bench on my machine using the same conditions as the benchmark reported earlier for other distributions.

| Test | Type | Time (rand_distr) | Time (etf-rs) |
|---|---|---|---|
| gumbel | f32 | 13.872 ns | 4.6761 ns |
| gumbel | f64 | 19.719 ns | 4.5018 ns |

Bear in mind that here, as well, the tables were generated dynamically for the specific distribution parameters, and their size was 256×3. Using a common table (location = 0, scale = 1) necessitates an extra addition and multiplication per sample, but even with a common table of small size (128×3) the numbers are still very decent: 5.4829 ns for f32 and 5.8653 ns for f64.
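
To illustrate the common-table trade-off in code (the closed-form quantile stands in for the ETF table lookup; function names are illustrative):

```rust
use rand::Rng;

fn standard_gumbel<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    let u: f64 = rng.gen();
    -(-u.ln()).ln() // quantile of Gumbel(0, 1)
}

fn gumbel<R: Rng + ?Sized>(rng: &mut R, location: f64, scale: f64) -> f64 {
    // The "extra addition and multiplication per sample" of the common table:
    location + scale * standard_gumbel(rng)
}
```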
