Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add stats.qmc module with quasi Monte Carlo functionality #10844

Merged
merged 86 commits into from Jan 30, 2021

Conversation

tupui
Copy link
Member

@tupui tupui commented Sep 19, 2019

Add scipy.stats.qmc.

Reference issue

Closes #9695

What does this implement/fix?

Provide a set of functions to create and assess quasi-Monte Carlo Design of Experiments.

It provides a generic class scipy.stats.qmc.QMCEngine which defines a QMC engine/sampler. An engine is state aware: it can be continued, advanced and reseted. 4 base samplers are available:

  • scipy.stats.qmc.Sobol the well known Sobol' low discrepancy sequence. Several warnings have been added to guide the user into properly using this sampler. The sequence is scrambled by default.
  • scipy.stats.qmc.Halton: Halton low discrepancy sequence. The sequence is scrambled by default.
  • scipy.stats.qmc.LatinHypercube: plain LHS design.
  • scipy.stats.qmc.OrthogonalLatinHypercube: Orthogonal version of LHS which is more uniform than plain LHS.

And 2 special samplers are available:

  • scipy.stats.qmc.MultinomialQMC: sampling from a multinomial distribution using any of the base scipy.stats.qmc.QMCEngine.
  • scipy.stats.qmc.MultivariateNormalQMC: sampling from a multivariate Normal using any of the base scipy.stats.qmc.QMCEngine.

The module also provide the following helpers:

  • scipy.stats.qmc.discrepancy: assess the quality of a set of points in terms of space coverage.
  • scipy.stats.qmc.update_discrepancy: can be used in an optimization loop to construct a good set of points.
  • scipy.stats.qmc.scale: easily scale a set of points from (to) the unit interval to (from) a given range.

The module quality has been assessed by specialists from the domain (see the long discussion bellow for more details). Following are some convergence plots of Sobol' and Halton.

halton_art2
sobol_typeB

@rgommers rgommers added enhancement A new feature or improvement scipy.stats labels Sep 20, 2019
@rgommers
Copy link
Member

Note: the scope of this PR is under discussion in gh-9695.

@tupui tupui force-pushed the enh_qmc branch 5 times, most recently from e246967 to 27efb1c Compare September 24, 2019 07:59
@tupui
Copy link
Member Author

tupui commented Sep 24, 2019

@rgommers First proposal is here. I still have a doc issue but I am using the same docstring as in scipy.optimize._differentialevolution.
I did not yet updated scipy.optimize._differentialevolution to use the new scipy.stats.qmc.orthogonal_latin_hypercube but appart for possible test fail there is no blocking point here.

@rgommers rgommers changed the title ENH: Design of Experiments functions ENH: add stats.qmc module with quasi Monte Carlo functionality Oct 4, 2019
Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments. Scope discussion would be good to finish first before reviewing in more detail.

scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
@tupui
Copy link
Member Author

tupui commented Jun 4, 2020

@tylerjereddy Hey I integrated your suggestions, thanks! Could you have another look?

@tupui
Copy link
Member Author

tupui commented Jun 19, 2020

@Balandat I finished to implement the changes we discussed.

  • I moved Sobol to the file qmc (as well as MultivariateNormalQMC, NormalQMC, multinomial_qmc, discrepancy_star_L2). I had to change some cdef functions to cpdef for that. I think it's clearer like this.
  • All the tests are updated.
  • MultivariateNormalQMC, NormalQMC and multinomial_qmc accept an engine parameter. It falls back to Sobol.

I like the interface and I think this can be ready soon 😃. Let me know what you think.

@Balandat
Copy link
Contributor

This looks great. FWIW I have a local commit that moves Sobol to use the new 21201-dimensional direction numbers from https://web.maths.unsw.edu.au/~fkuo/sobol/. The c-code generated from putting the numbers in as literals is obscenely large/inefficient, so what I did was to package and save the arrays as an .npz data file and load these in upon importing the Sobol generator. This works locally with fixing the path, the outstanding thing here is to make sure this is portable and the data file gets packaged with the extension and the path is resolved appropriately. Then the tests need to be modified to check fo the new direction numbers.

scipy/stats/_sobol.pyx Outdated Show resolved Hide resolved
Copy link
Contributor

@Balandat Balandat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, couple of comments inline. There is a doc failure that needs to be fixed.

scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
scipy/stats/qmc.py Outdated Show resolved Hide resolved
@tupui
Copy link
Member Author

tupui commented Jun 21, 2020

Overall this looks good, couple of comments inline. There is a doc failure that needs to be fixed.

I addressed the comments and fixed the doc (to be confirmed with tests). I plan to complete the docstrings with examples as with Halton.

Speaking of the doc. Is the current doc enough or should we add a bigger section somewhere about DoE, an introduction like for optimization? Do we want figures of DoE for instance? A section in https://docs.scipy.org/doc/scipy-1.4.1/reference/tutorial/stats.html ?

I have quite some material in case we need something a bit more theoretical (I have some chapters about DoE in my PhD thesis and articles I wrote, so license-wise fine).

cc @josef-pkt (I think you're the PoC for this part of the doc)

@tupui
Copy link
Member Author

tupui commented Jun 21, 2020

There is still a docstring warning about np.random.RandomState. It's in all seed param definition. But I am using here what's in the code of scipy.optimize (for instance).

I found another block used in rv_continuous. Maybe more up to date:

"""
seed : {None, int, `~np.random.RandomState`, `~np.random.Generator`}, optional
        This parameter defines the object to use for drawing random variates.
        If `seed` is `None` the `~np.random.RandomState` singleton is used.
        If `seed` is an int, a new ``RandomState`` instance is used, seeded
        with seed.
        If `seed` is already a ``RandomState`` or ``Generator`` instance,
        then that object is used.
        Default is None.
"""

[EDIT] fixed with this docstring

@tupui tupui force-pushed the enh_qmc branch 2 times, most recently from bd0a08b to 52cae01 Compare June 22, 2020 01:56
@tupui
Copy link
Member Author

tupui commented Jun 22, 2020

@Balandat I added quite some doc trying to mimic the style of scipy. IMO, we are good for a first version and this is ready for a review for integration 😃 .

@tupui tupui force-pushed the enh_qmc branch 3 times, most recently from 3047b2a to 359369c Compare June 22, 2020 06:37
@rgommers
Copy link
Member

Did one more review pass through, resolved some comments that were addressed, and added stats.qmc to the list of API modules documented as public.

@rgommers
Copy link
Member

Merged now, thank you @tupui and @Balandat, this is a really nice addition to SciPy! And thanks also to @mdhaber and @tylerjereddy for reviewing, and @ArtOwen, @fjhickernell, @rkern, @bashtage and others for contributing their expertise. This will be a highlight of the 1.7.0 release.

@rgommers
Copy link
Member

Thanks for the patience on this one all!

@rgommers
Copy link
Member

@tupui could you help with the following:

  • Adding a description to the release notes (https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.7.0). I added it as a highlight, and added a subsection under scipy.stats improvements with TODO. A one paragraph summary would be great.
  • Update the PR description with a short summary as well. The latest version of the convergence plot could be copied into that PR description too, it's quite nice and that makes it discoverable. Also, maybe that is useful for tutorial content?

@tupui
Copy link
Member Author

tupui commented Jan 30, 2021

Merged now, thank you @tupui and @Balandat, this is a really nice addition to SciPy! And thanks also to @mdhaber and @tylerjereddy for reviewing, and @ArtOwen, @fjhickernell, @rkern, @bashtage and others for contributing their expertise. This will be a highlight of the 1.7.0 release.

Thanks everyone for all the good work and your guidance during this process! I learned a lot during the process and I am grateful for this. I would also like to acknowledge Sergei Kucherenko who's been of great help via email.

For information, @Balandat and I will present this at the 13th International Conference on Monte Carlo Methods and Applications (MCM 2021). This for sure will help to spread the word about this new module.

@tupui
Copy link
Member Author

tupui commented Jan 30, 2021

@tupui could you help with the following:

  • Adding a description to the release notes (https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.7.0). I added it as a highlight, and added a subsection under scipy.stats improvements with TODO. A one paragraph summary would be great.

  • Update the PR description with a short summary as well. The latest version of the convergence plot could be copied into that PR description too, it's quite nice and that makes it discoverable. Also, maybe that is useful for tutorial content?

Sure I will open à PR with all that! I like the idea of the tutorial. I guess this would be another section of the existing stats tutorial.

@Balandat
Copy link
Contributor

Yay! Don't think I've had >600 conversation items on a PR before, thanks eveyone for the thorough discussion!

Balandat added a commit to Balandat/botorch that referenced this pull request Jan 30, 2021
Summary: Account for the new capabilities of the pytorch SobolEngine (up to 21201 dims) from scipy/scipy#10844

Differential Revision: D25661131

fbshipit-source-id: de53b82c1a5a8d4bcba21cd72bbd1ccb0eec0164
@rgommers
Copy link
Member

Sure I will open à PR with all that! I like the idea of the tutorial. I guess this would be another section of the existing stats tutorial.

Awesome. Yes indeed. At some point we may want to break this into separate pages for subsections, given that it is already very long. But for now I would just add to it.

Yay! Don't think I've had >600 conversation items on a PR before, thanks eveyone for the thorough discussion!

I know it was a painful delivery. We only get to add whole new submodules very rarely though:)

Balandat added a commit to Balandat/pytorch that referenced this pull request Jan 30, 2021
Summary:
Performs the update that was suggested in pytorch#41489

Adjust the functionality to largely match that pf the scipy companion PR scipy/scipy#10844, including
- a new `draw_base2` method
- include zero as the first point in the (unscrambled) Sobol sequence

The scipy PR is also quite opinionated if the `draw` method doesn't get called with a base 2 number (for which the resulting sequence has nice properties, see the scipy PR for a comprehensive discussion of this).

Note that this update is a **breaking change** in the sense that sequences generated with the same parameters after as before will not be identical! They will have the same (better, arguably) distributional properties, but calling the engine with the same seed will result in different numbers in the sequence.

Pull Request resolved: pytorch#49710

Test Plan:
```
from torch.quasirandom import SobolEngine

sobol = SobolEngine(3)
sobol.draw(4)

sobol = SobolEngine(4, scramble=True)
sobol.draw(5)

sobol = SobolEngine(4, scramble=True)
sobol.draw_base2(2)
```

Reviewed By: malfet

Differential Revision: D25657233

Pulled By: Balandat

fbshipit-source-id: ce8ccafe90fc968ed811634b5f37c2eb208af985
@Balandat
Copy link
Contributor

I know it was a painful delivery. We only get to add whole new submodules very rarely though:)

Challenge accepted, next up: A full module.

@fjhickernell
Copy link

fjhickernell commented Jan 30, 2021 via email

@tupui
Copy link
Member Author

tupui commented Jan 31, 2021

@rgommers I updated the description of the issue. Added some comments in the release notes (mostly the same text). And I created the 2 follow up PR I had. I will work on the tutorial now.

@tupui tupui mentioned this pull request Jan 31, 2021
8 tasks
facebook-github-bot pushed a commit to pytorch/botorch that referenced this pull request Feb 1, 2021
Summary:
Pull Request resolved: #672

Account for the new capabilities of the pytorch SobolEngine (up to 21201 dims) from scipy/scipy#10844

Reviewed By: stevemandala

Differential Revision: D25661131

fbshipit-source-id: 9cddb3470f953dbd8eab6f12c8e13d516ad8ef37
facebook-github-bot pushed a commit to pytorch/pytorch that referenced this pull request Feb 1, 2021
Summary:
Performs the update that was suggested in #41489

Adjust the functionality to largely match that pf the scipy companion PR scipy/scipy#10844, including
- a new `draw_base2` method
- include zero as the first point in the (unscrambled) Sobol sequence

The scipy PR is also quite opinionated if the `draw` method doesn't get called with a base 2 number (for which the resulting sequence has nice properties, see the scipy PR for a comprehensive discussion of this).

Note that this update is a **breaking change** in the sense that sequences generated with the same parameters after as before will not be identical! They will have the same (better, arguably) distributional properties, but calling the engine with the same seed will result in different numbers in the sequence.

Pull Request resolved: #49710

Test Plan:
```
from torch.quasirandom import SobolEngine

sobol = SobolEngine(3)
sobol.draw(4)

sobol = SobolEngine(4, scramble=True)
sobol.draw(5)

sobol = SobolEngine(4, scramble=True)
sobol.draw_base2(2)
```

Reviewed By: malfet

Differential Revision: D25657233

Pulled By: Balandat

fbshipit-source-id: 9df50a14631092b176cc692b6024aa62a639ef61
ConnectedSystems added a commit to ConnectedSystems/SALib that referenced this pull request Jul 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Question] Desire for qMC library in scipy?