add infrastructure for examples as IPython notebooks #5233

ev-br · 2015-09-08T16:15:44Z

(this is a continuation of #5044 (comment))

At the moment we have two ways of illustrating scipy functionality. Small examples go into the docstrings; there is also the tutorial for more narrative-style docs with usage examples.

In many cases, it would be nice to be able to provide longer worked examples which do not easily fit into either of these two places. It would also be useful to be able to add worked examples as IPython/Jupyter notebooks instead of (or along with) reST files.

As an example of what I'm talking about, two notebooks which prompted this are linked at the top of gh-5044.

Here's an attempt at a concrete suggestion. Each submodule entry in the tutorial grows a link "Extended examples" (a tentative name). These links point to a separate repository under the scipy org, which only hosts notebooks. The access rights to this repository and the quality standards for the content are to the first approximation the same as for the main scipy repo.

This way:

we keep quality examples in a centralized place (good for discoverability and, hopefully, against bitrot)
we can uphold standards of quality
we do not bundle them into the source tree, so we don't need to worry about the download sizes (eg, one can embed graphics/animations/whatnot)
we only store notebooks, and delegate rendering to github and/or nbviewer.
this all seems to be reasonably easy to do :-).

This whole thing seems to have an overlap with scipy central, but I've to admit I've no idea what is the status/scope of the latter. It'd be awesome if somebody in the know could weigh in.

There is also prior art of statsmodels, who seem to ship notebook examples. (@josef-pkt, @jseabold)

josef-pkt · 2015-09-08T16:28:24Z

About statsmodels:

In our case the notebooks are part of the documentation, similar to the scipy tutorials. We don't store the output in the repo, but run the notebooks in a custom build to include it in the rst sphinx docs.
Skipper wrote versions of this several years ago (and an ipy directive for sphinx docs before that). Our nbconvert rendering is currently a bit broken: it doesn't work correctly with the latest notebook/IPython version and is not fully Python 3 compatible.

The advantage is that the notebooks are included in our standard documentation and the doc build is supposed to catch bitrot/refactoring victims.
As an alternative, we thought of starting a new repo with just notebooks that include the output so it can be rendered by github or nbviewer. Bitrot might be a bit worse, but it would be less work.

pv · 2015-09-08T16:36:16Z

The examples from wiki.scipy.org converted to notebooks: https://github.com/pv/SciPy-CookBook
They're bitrotted to some degree though, although some have been rescued.

ev-br · 2015-09-09T11:12:19Z

Re notebook bitrot, MinRK's script seems useful (and can be extended with our doctest checker from refguide_check.py if desired).
http://stackoverflow.com/questions/20483313/testing-ipython-notebooks
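
For illustration, here is a minimal sketch of what such a bitrot check boils down to. It is not MinRK's script and not the checker from refguide_check.py: it assumes only that an .ipynb file is JSON with a list of cells, and it runs code cells with plain `exec` instead of a Jupyter kernel, so IPython magics are skipped. The function name `run_code_cells` and the demo notebook are made up for the example.

```python
import json
import traceback


def run_code_cells(notebook_json):
    """Execute every code cell of a notebook in one shared namespace.

    Returns a list of (cell_index, formatted_traceback) for cells that raised.
    This is a crude stand-in for a real kernel: cells containing IPython
    magics or shell escapes (lines starting with % or !) are skipped.
    """
    nb = json.loads(notebook_json)
    namespace = {"__name__": "__notebook__"}
    failures = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat 4 stores the source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if any(line.lstrip().startswith(("%", "!")) for line in source.splitlines()):
            continue
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception:
            failures.append((index, traceback.format_exc()))
    return failures


# A tiny in-memory notebook: the last code cell is "bitrotted"
# (it calls a function that no longer exists).
demo = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1 + 1\n", "y = x * 3"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "removed_function(y)"},
    ],
})

failures = run_code_cells(demo)
for index, tb in failures:
    print(f"cell {index} failed:\n{tb}")
```

A real checker would execute the cells through a kernel and could also compare stored outputs with fresh ones; the sketch only reports cells that raise.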

rgommers · 2015-10-24T12:36:03Z

Examples tend to bitrot. There isn't much in the CookBook that's still of interest. So I have a preference for including notebooks inside the scipy repo. The statsmodels examples are in total now 0.5 MB in size (for 35 .ipynb files). So it'll take quite a while before it would significantly affect scipy download sizes - our sdist output is now 18 MB.

ev-br · 2015-10-24T13:02:16Z

My slight preference is still for a separate repo:

bitrotting is an issue regardless of where things are
incorporating notebooks into the build sounds fragile. Josef's comment "nbconvert rendering is currently a bit broken" does not sound very encouraging :-).
My own experience (nbconverting one single presentation) is mixed -- the result is great, but the amount of magic is large. And if something goes wrong (eg, a new version of ipython is out), troubleshooting requires JS skill and wading through rather sparse docs.
If we keep output in the notebooks, it does not take much to grow the repo size: just a couple of images, then a couple of animations (using JSAnimation, for instance), and there you are.
If we keep the notebooks under the scipy repo, what is the policy for dependencies? Eg, do we include an example which tries to import PyOperators or algopy or dask or ...?
It might be just me, but discoverability seems better with a separate repo. For one, I've no idea where on my hard drive are the examples which come with the python as packaged by the linux distro. This way, having examples on the web might serve better those people who get packaged scipy.
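
To put a number on the repo-size point above, a rough sketch like the following estimates how much of a notebook file is stored output. It assumes only the nbformat 4 JSON layout (code cells carrying an "outputs" list); the function name `output_share` and the fake image payload are invented for the example.

```python
import json


def output_share(notebook_json):
    """Return (total_bytes, output_bytes) for a notebook given as JSON text."""
    nb = json.loads(notebook_json)
    output_bytes = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # Serialise the stored outputs on their own to estimate their size.
            output_bytes += len(json.dumps(cell.get("outputs", [])).encode("utf-8"))
    return len(notebook_json.encode("utf-8")), output_bytes


# Demo notebook with one code cell whose output is a (fake) embedded PNG.
fake_png = "A" * 200_000  # stands in for a base64-encoded image
demo = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [{
        "cell_type": "code",
        "metadata": {},
        "execution_count": 1,
        "source": "plt.plot(x, y)",
        "outputs": [{
            "output_type": "display_data",
            "metadata": {},
            "data": {"image/png": fake_png},
        }],
    }],
})

total, outputs = output_share(demo)
print(f"{outputs} of {total} bytes are stored output")
```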

josef-pkt · 2015-10-24T14:27:32Z

I have experienced all the points @ev-br raised with our notebooks.

Most likely statsmodels will in future use both: notebooks as a way to write official documentation, as now, and a separate repo with additional notebooks.

One issue I ran into with a private notebook collection is compatibility with specific statsmodels versions. The advantage of the notebooks in the main repo is that they get versioned with and for every release. (At least theoretically, except for uncorrected bitrot.)

One reason for statsmodels to keep notebooks in the main repo docs is better integration and better availability in the online docs. The intention for this part of the docs is much more like the scipy tutorials, that is, basic documentation that just happens to use notebooks because then we can also run the code directly.

rgommers · 2015-10-31T11:02:44Z

Okay, I'm going to change my mind then, let's do a separate repo. The important points made for me are that nbconverting is too fragile and that we can keep converted output in the repo if we don't have to worry about size.

I'm still concerned about bitrotting - inside the main repo we could simply test them on TravisCI against new PRs. Maybe this is possible in a separate repo as well though, with some github hooks that send a trigger.

Dependencies: I'd say that using any well-known package like PyOperators or AlgoPy or dask should be fine.

rgommers · 2015-10-31T11:03:21Z

And maybe let's get it going first, and only worry about github hooks for testing scipy PRs later.

rgommers · 2015-10-31T11:08:19Z

This whole thing seems to have an overlap with scipy central, but I've to admit I've no idea what is the status/scope of the latter. It'd be awesome if somebody in the know could weigh in.

@ksurya can comment on ScipyCentral (it is moving forward). I don't think the overlap is that large though.

ScipyCentral is meant as a place where users can store and show off things that use the whole ecosystem. So it is not curated and has a broad scope / set of packages. If next to that we have one repo that is curated and only focused on longer examples demonstrating functionality in scipy itself, that should be OK.

rgommers · 2015-10-31T11:10:49Z

@ev-br maybe good to announce the plan on the mailing list, and finalize a few things (like rules for dependencies, repo name, etc.) there?

pv · 2015-10-31T16:31:48Z

how about https://scipy-cookbook.readthedocs.org/

rgommers · 2015-10-31T22:52:27Z

Ooo, fancy. That looks really good.

But still a separate repo for curated notebooks that we properly test I assume?

ksurya · 2015-12-05T13:25:43Z

This loosely overlaps with SciPy Central, wherein users typically submit their snippets/examples as @rgommers mentioned.

I have for some time in the past thought about integrating notebooks in SciPy Central. It made sense at a high level because the content shared in SciPy Central and notebooks broadly come under the same category. However,

When it comes to managing docs, I feel GitHub does a better job.
When it comes to sharing docs, SciPy Central can do a fancy job. We can have comments on examples, voting, some competitive puzzles/questions etc.
The examples cited above by Pauli seem to be nicely organized into sections and subsections. But the current SciPy Central functionality only allows tagging examples with a category name.

From what I noticed in the SciPy Central database, more people tend to store their work in places like GitHub and then submit a link. However, I feel this app can be more than what it is used for. It would be great if we actually take a look into what kind of things we want in SciPy and how this app can be shaped to serve them. As far as notebooks are concerned, I am wondering whether we can have them on GitHub, and let SciPy Central render them nicely using the IPython Notebook Viewer Service that's already hosted at Rackspace.

I apologize for the late reply.

rgommers · 2022-08-29T08:40:05Z

xref the discussion in gh-16699. This is the first time in a while that this topic came up, but there are 5 notebooks there now and an intent to write more of them. So we should revisit this discussion. A lot of infrastructure for notebooks has improved since the last comment on this issue.

The options seem to be:

1. Do something like NumPy did in https://github.com/numpy/numpy-tutorials
2. Allow using jupytext .md files directly in our html doc builds.

If (2) is feasible, it's perhaps better than a whole separate repo that needs its own maintenance/infra and gives more issues with cross-linking. A quick check of the numpy-tutorials repo says that the only dependency that we are missing for allowing jupytext files in this repo is myst-nb for doc build, and jupytext and nbval for the refguide-checker. We shouldn't need the theme related dependencies, because we already have a working theme (assuming that works with myst).

Of course myst-nb comes with its own dependencies, so it needs some investigation what those are and if we're happy with those for all doc build needs.

rossbar · 2022-08-29T12:58:41Z

I just wanted to chime in to share some thoughts about approach number 1, which I've worked on both for numpy and networkx.

If (2) is feasible, it's perhaps better than a whole separate repo that needs its own maintenance/infra and gives more issues with cross-linking.

There were many motivations for a separate tutorials repo in the beginning, not least of which was the desire to try infrastructure based on executablebooks. One of the main motivations though for keeping the tutorials repo separate from the main docs was to guarantee that the documentation build times didn't balloon as more tutorials were added. This concern can be mitigated by caching in CI, but at least to start we thought it'd be easier to have a separate tutorials repo. Note also that the executablebooks tooling is based on sphinx, i.e. myst-nb is a sphinx extension, so it'd be straightforward to move tutorials into the main docs (or vice versa) if desired. Furthermore, because everything is sphinx-based, cross-linking isn't an issue thanks to intersphinx.
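
As a concrete illustration of the intersphinx point, a separate tutorials repo would carry something like the following in its Sphinx conf.py. This is a sketch, not the actual numpy-tutorials configuration; the choice of projects and their documentation URLs is an assumption made for the example.

```python
# conf.py of a (hypothetical) separate tutorials repository

extensions = [
    "sphinx.ext.intersphinx",  # ships with Sphinx
]

# Map a short project name to the root of its built docs. Sphinx fetches the
# objects.inv inventory from there (None = look in the default location).
intersphinx_mapping = {
    "scipy": ("https://docs.scipy.org/doc/scipy/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
}
```

With such a mapping, a cross-reference role pointing at, say, scipy.optimize.least_squares in a tutorial resolves to the main SciPy reference docs even though the tutorial lives in another repo.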

A quick check of the numpy-tutorials repo says that the only dependency that we are missing for allowing jupytext files in this repo is myst-nb for doc build, and jupytext and nbval for the refguide-checker.

This depends on what you want to do. If you want to integrate text-based jupyter notebooks directly in an existing sphinx doc build then all you really need is myst-nb. Note also that nbval is not strictly necessary for testing: myst-nb executes notebooks at build-time¹ and raises sphinx warnings on execution failure, so if you already have a doc build scheme that elevates sphinx warnings to errors (like scipy does) then you get notebook execution testing "for free". FWIW this is the way numpy-tutorials was originally set up - nbval was added later on (see also numpy/numpy-tutorials#51 and numpy/numpy-tutorials#132). There are many different ways this can be done, but I just wanted to note quickly that the minimum dependency footprint for adding text-based notebooks to docs is very limited depending on how you want to do it.
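
A minimal sketch of that "only myst-nb" setup, as additions to an existing Sphinx conf.py, could look like this. The option names are the ones used by recent myst-nb releases (older releases spelled them differently), and the values are illustrative assumptions rather than what scipy or numpy-tutorials actually use.

```python
# Additions to an existing Sphinx conf.py (sketch)

extensions = [
    # ... the project's existing extensions stay here ...
    "myst_nb",  # parses MyST markdown and executes notebooks at build time
]

# "force" executes every notebook on every build; "cache" reuses stored
# results for unchanged notebooks; "off" disables execution altogether.
nb_execution_mode = "force"

# Seconds allowed per cell before execution is treated as failed.
nb_execution_timeout = 120

# Raise an exception on a failed notebook instead of only emitting a warning.
nb_execution_raise_on_error = True
```

If the doc build already runs sphinx-build with -W, the last setting is optional: the warning emitted on a failed cell is enough to fail the build.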

We shouldn't need the theme related dependencies, because we already have a working theme (assuming that works with myst).

Again, the executablebooks (myst-nb) project is sphinx based, so sphinx themes work OOB 👍 .

Of course myst-nb comes with its own dependencies, so it needs some investigation what those are and if we're happy with those for all doc build needs.

IME the dependency issue has proven significant. Many of the tools in executablebooks have used aggressive dependency pins in the past, which has resulted in dependency resolution issues in numpy-tutorials that have been really difficult to track down and resolve. See e.g. executablebooks/MyST-NB#289 and executablebooks/MyST-NB#333. However, dependency issues have been cropping up less and less frequently as myst-nb has matured.

¹ By default at least, but this is of course configurable.

rgommers · 2022-08-30T04:33:48Z

@rossbar thanks for the input! I guess my main follow-up question is: what do we actually gain from executablebooks over pydata-sphinx-theme? The one thing I see are the two icons for launching on Binder and for downloading the .ipynb. Other than that, isn't it mainly downsides (more dependency issues, dealing with two themes, extra repo overhead)?

One of the main motivations though for keeping the tutorials repo separate from the main docs was to guarantee that the documentation build times didn't balloon as more tutorials were added.

There were a couple of thoughts there:

We wanted to allow using more dependencies that were not okay for the main repo. This is less of an issue for SciPy than for NumPy, because SciPy has much more built-in functionality to write interesting tutorials with.
We didn't want size constraints for data. I think this is solved now that we have scipy.datasets.
Docs build time. I'm not sure this is much of an issue when we are writing tutorials for SciPy functionality alone, rather than more "use case" type notebooks. We are already executing a ton of doc snippets to generate figures in the docs. It's fast compared to parsing and writing out html.

rossbar · 2022-08-30T13:41:25Z

I guess my main follow-up question is: what do we actually gain from executablebooks over pydata-sphinx-theme

Sorry, I think I've caused confusion with the terminology :). executablebooks is the project that supports all of the myst tooling (myst-nb, myst-markdown parser, etc.) including the sphinx-book-theme to which you're referring, but there's no need to change/update themes - the pydata-sphinx-theme is completely fine and can be used with myst-nb (as can any other sphinx theme). I only mentioned executablebooks as a whole to make the distinction between the sphinx-based approach for executing notebooks (e.g. with myst-nb) vs. other "home-grown" approaches such as executing/converting notebooks with jupytext and incorporating the results in the docs manually.

There were a couple of thoughts there:

Given these bullet points, it sounds like the approach that might make the most sense for scipy is to incorporate notebook-based content directly w/ myst-nb 👍

mdhaber · 2022-08-31T00:08:15Z

I want to make sure I'm understanding correctly before I ask questions that don't make sense.

Were tutorials like "Linear algebra on n-dimensional arrays" written as notebooks, converted to markdown to be committed to the numpy-tutorials repo, and then from the markdown rendered as HTML to be viewed online?

If that's correct, how much tweaking of the markdown tends to be required after converting from the original notebook to the format that is committed? (If none at all, great. That's the dream. I've tried to use nbconvert a few times before, so it's great to hear if this is more seamless.)

rgommers · 2022-08-31T09:43:39Z

Indeed. And no manual tweaking of the .md file is needed.

melissawm · 2022-09-08T20:25:50Z

Hi folks - I'd be happy to create a draft PR with the basic set up for this, unless @rossbar or someone else wants to do it :) One thing to be aware of is that the executable notebooks do increase the docs build time, so it may be something to consider when deciding on adding these executable tutorials on the main repo vs. a separate tutorials repo. Another point is that with the myst-nb and jupytext installed, you don't really need to go through .ipynb files at all, as jupyterlab can open .md files as notebooks with all functionality preserved. This can be an optional workflow, but is available.

andyfaff · 2022-09-08T21:03:06Z

I wonder if it's worth running black on notebooks that we add to make sure the style is nice from the start. We can ensure that it only runs on jupyter notebooks.

rgommers · 2022-09-09T07:05:29Z

Another point is that with the myst-nb and jupytext installed, you don't really need to go through .ipynb files at all, as jupyterlab can open .md files as notebooks with all functionality preserved. This can be an optional workflow, but is available.

That sounds nice to me. Markdown is way nicer to keep in a repo compared to .ipynb's.

If black can run on Python content in .md files, then that sounds fine to me to add.

andyfaff · 2022-09-09T07:51:37Z

Black will work on jupyter, but not MD

rossbar · 2022-09-10T01:40:43Z

I'd be happy to create a draft PR with the basic set up for this, unless @rossbar or someone else wants to do it

It'd probably be easier for someone more familiar with the plan/history of notebooks in the scipy docs to do so, but please feel free to ping me at review time!

Black will work on jupyter, but not MD

Yeah unfortunately nbqa doesn't (yet) support the text-based markdown notebook format out of the box, but fortunately jupytext makes it easy to convert between formats. The general approach would be something like:

1. Run nbqa on the .ipynb files, e.g. nbqa black *.ipynb
2. Convert to markdown, e.g. jupytext --to myst *.ipynb

You can then commit the resulting .md files, which will have black-formatted code cells.
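
The two steps above can be scripted, for instance as the small sketch below. It only wraps the two command lines quoted in the steps; the helper names and the file name tutorial.ipynb are hypothetical, and by default it prints the commands instead of running them, so nbqa, black and jupytext need to be installed only for a real run.

```python
import shlex
import subprocess


def format_and_convert_commands(notebooks):
    """Build the two command lines from the steps above, in order."""
    notebooks = list(notebooks)
    return [
        ["nbqa", "black", *notebooks],             # 1. format code cells in place
        ["jupytext", "--to", "myst", *notebooks],  # 2. write .md next to each .ipynb
    ]


def run_all(commands, dry_run=True):
    """Print each command; also execute it unless dry_run is set."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


commands = format_and_convert_commands(["tutorial.ipynb"])
run_all(commands)  # dry run: only prints the commands
```

Calling run_all(commands, dry_run=False) executes both steps and raises as soon as one of them exits with a non-zero status.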

rgommers · 2022-09-11T13:30:48Z

I was going to suggest opening a feature request for nbqa, but it's already asked at nbQA-dev/nbQA#668. With a resolution according to @rossbar's suggestion.

MarcoGorelli · 2022-09-13T21:05:35Z

Yeah unfortunately nbqa doesn't (yet) support the text-based markdown notebook format out of the box,

This is coming soon! If you'd like to try it out to help unearth bugs before a release comes out later this week:

$ pip install git+https://github.com/MarcoGorelli/nbQA.git@jupytext jupytext black
$ nbqa black . --nbqa-files '\.md$'

UPDATE this is now available, as of version ~~1.5.0~~ 1.5.1

ev-br · 2023-05-05T07:35:11Z

Looks like this issue has run its course thanks to gh-17322. Or is there anything to keep this issue open @melissawm ?

ev-br mentioned this issue Sep 8, 2015

[GSoC] ENH: New least-squares algorithms #5044

Merged

StanczakDominik mentioned this issue Jun 29, 2019

Allow .ipynb submissions of examples somehow? PlasmaPy/PlasmaPy#638

Closed

ev-br mentioned this issue Jul 31, 2022

DOC: stats: resampling and Monte Carlo methods tutorial #16699

Merged

melissawm mentioned this issue Nov 2, 2022

DOC: Add notebook infrastructure for the docs #17322

Merged

melissawm closed this as completed Dec 7, 2023
