add infrastructure for examples as IPython notebooks #5233

Closed
ev-br opened this issue Sep 8, 2015 · 27 comments
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org needs-decision Items that need further discussion before they are merged or closed Website Items related to the website; please also check https://github.com/scipy/scipy.org

Comments

@ev-br
Member

ev-br commented Sep 8, 2015

(this is a continuation of #5044 (comment))

At the moment we have two ways of illustrating scipy functionality. Small examples go to the docstrings, there is also the tutorial for more narrative style docs with usage examples.

In many cases, it would be nice to be able to provide longer worked examples which do not easily fit into either of these two places. It would also be useful to be able to add worked examples as IPython/Jupyter notebooks instead of (or along with) reST files.

As an example of what I'm talking about, two notebooks which prompted this are linked at the top of gh-5044.

Here's an attempt at a concrete suggestion. Each submodule entry in the tutorial grows a link "Extended examples" (a tentative name). These links point to a separate repository under the scipy org, which only hosts notebooks. The access rights to this repository and the quality standards for the content are to the first approximation the same as for the main scipy repo.

This way:

  • we keep quality examples in a centralized place (better discoverability and, hopefully, less bitrot)
  • we can uphold standards of quality
  • we do not bundle them into the source tree, so we don't need to worry about the download sizes (eg, one can embed graphics/animations/whatnot)
  • we only store notebooks, and delegate rendering to github and/or nbviewer.
  • this all seems to be reasonably easy to do :-).

This whole thing seems to have an overlap with scipy central, but I've to admit I've no idea what is the status/scope of the latter. It'd be awesome if somebody in the know could weigh in.

There is also prior art of statsmodels, who seem to ship notebook examples. (@josef-pkt, @jseabold)

@ev-br ev-br added Website Items related to the website; please also check https://github.com/scipy/scipy.org Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org needs-decision Items that need further discussion before they are merged or closed labels Sep 8, 2015
@josef-pkt
Member

About statsmodels:

In our case the notebooks are part of the documentation, similar to the scipy tutorials. We don't store the output in the repo, but run the notebooks in a custom build to include it in the rst sphinx docs.
Skipper wrote versions of this several years ago (and an ipy directive for sphinx docs before that). Our nbconvert rendering is currently a bit broken: it doesn't work correctly with the latest notebook/IPython versions and is not fully Python 3 compatible.

The advantage is that the notebooks are included in our standard documentation and the doc build is supposed to catch bitrot/refactoring victims.
As an alternative, we thought of starting a new repo with just notebooks that include the output, so it can be rendered by github or nbviewer. Bitrot might be a bit larger, but it would be less work.

@pv
Member

pv commented Sep 8, 2015

The examples from wiki.scipy.org converted to notebooks: https://github.com/pv/SciPy-CookBook
They've bitrotted to some degree, although some have been rescued.

@ev-br
Member Author

ev-br commented Sep 9, 2015

Re notebook bitrot, MinRK's script seems useful (and can be extended with our doctest checker from refguide_check.py if desired).
http://stackoverflow.com/questions/20483313/testing-ipython-notebooks
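To make the bitrot check concrete, here is a minimal sketch of the general idea: run every code cell of a notebook in order and report which ones raise. This is not MinRK's script. It assumes the nbformat 4 JSON layout, uses only the standard library, runs cells with `exec` in one shared namespace rather than in a real kernel, and does not handle IPython magics. The function name `run_notebook_cells` is made up for illustration.

```python
import json
import traceback


def run_notebook_cells(notebook_json):
    """Execute the code cells of an nbformat-4 notebook in order.

    Returns a list of (cell_index, traceback_text) for cells that raised.
    Hypothetical helper for illustration; a real checker would use a kernel.
    """
    nb = json.loads(notebook_json)
    namespace = {}  # shared across cells, like a single kernel session
    failures = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception:
            failures.append((index, traceback.format_exc()))
    return failures


# A tiny in-memory notebook: the last cell fails on purpose.
demo = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1 + 1\n", "assert x == 2"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "undefined_name"},
    ],
})

failures = run_notebook_cells(demo)
for index, tb in failures:
    print(f"cell {index} failed:\n{tb}")
```

A CI job could call something like this on every notebook in the repo and fail if any list of failures is non-empty.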

@rgommers
Member

Examples tend to bitrot. There isn't much in the CookBook that's still of interest. So I have a preference for including notebooks inside the scipy repo. The statsmodels examples are in total now 0.5 Mb in size (for 35 .ipynb files). So it'll take quite a while before it would significantly affect scipy download sizes - our sdist output is now 18 Mb.

@ev-br
Member Author

ev-br commented Oct 24, 2015

My slight preference is still for a separate repo:

  • bitrotting is an issue regardless of where things are
  • incorporating notebooks into the build sounds fragile. Josef's comment "nbconvert rendering is currently a bit broken" does not sound very encouraging :-).
    My own experience (nbconverting one single presentation) is mixed -- the result is great, but the amount of magic is large. And if something goes wrong (eg, a new version of ipython is out), troubleshooting requires JS skill and wading through rather sparse docs.
  • If we keep output in the notebooks, it does not take much to grow the repo size: just a couple of images, then a couple of animations (using JSAnimation, for instance), and there you are.
  • If we keep the notebooks under the scipy repo, what is the policy for dependencies? Eg, do we include an example which tries to import PyOperators or algopy or dask or ...?
  • It might be just me, but discoverability seems better with a separate repo. For one, I've no idea where on my hard drive are the examples which come with the python as packaged by the linux distro. This way, having examples on the web might serve better those people who get packaged scipy.

@josef-pkt
Member

I have experienced all the points @ev-br raises with our notebooks.

Most likely statsmodels will in future use both: notebooks as a way to write official documentation, as now, and a separate repo with additional notebooks.

One issue I ran into with a private notebook collection is compatibility with specific statsmodels versions. The advantage of the notebooks in the main repo is that they get versioned with and for every release. (At least theoretically, except for uncorrected bitrot.)

One reason for statsmodels to keep notebooks in the main repo docs is better integration and better availability in the online docs. The intention for this part of the docs is much more like the scipy tutorials, that is, basic documentation that just happens to use notebooks because then we can also run the code directly.

@rgommers
Member

Okay, I'm going to change my mind then, let's do a separate repo. The important points made for me are that nbconverting is too fragile and that we can keep converted output in the repo if we don't have to worry about size.

I'm still concerned about bitrotting - inside the main repo we could simply test them on TravisCI against new PRs. Maybe this is possible in a separate repo as well though, with some github hooks that send a trigger.

Dependencies: I'd say that using any well-known package like PyOperators or AlgoPy or dask should be fine.

@rgommers
Member

And maybe let's get it going first, and only worry about github hooks for testing scipy PRs later.

@rgommers
Member

> This whole thing seems to have an overlap with scipy central, but I've to admit I've no idea what is the status/scope of the latter. It'd be awesome if somebody in the know could weigh in.

@ksurya can comment on ScipyCentral (it is moving forward). I don't think the overlap is that large though.

ScipyCentral is meant as a place where users can store and show off things that use the whole ecosystem. So it is not curated and has a broad scope / set of packages. If next to that we have one repo that is curated and only focused on longer examples demonstrating functionality in scipy itself, that should be OK.

@rgommers
Member

@ev-br maybe good to announce the plan on the mailing list, and finalize a few things (like rules for dependencies, repo name, etc.) there?

@pv
Member

pv commented Oct 31, 2015

how about https://scipy-cookbook.readthedocs.org/

@rgommers
Member

Ooo, fancy. That looks really good.

But still a separate repo for curated notebooks that we properly test I assume?

@ksurya
Member

ksurya commented Dec 5, 2015

This loosely overlaps with SciPy Central, wherein users typically submit their snippets/examples as @rgommers mentioned.

I have for some time in the past thought about integrating notebooks in SciPy Central. It made sense at a high level because the content shared in SciPy Central and notebooks broadly come under the same category. However,

  • When it comes to managing docs, I feel GitHub does a better job.
  • When it comes to sharing docs, SciPy Central can do a fancy job. We can have comments on examples, voting, some competitive puzzles/questions etc.
  • The examples cited above by Pauli seem to be nicely organized into sections, subsections. But the current SciPy Central functionality only allows to tag examples with a category name.

From what I noticed in the SciPy Central database, more people tend to store their work in places like GitHub and then submit a link. However, I feel this app can be more than what it is used for. It would be great if we actually take a look into what kind of things we want in SciPy and how this app can be shaped to serve them. As far as notebooks are concerned, I am wondering whether we can keep them on GitHub and let SciPy Central render them nicely using the IPython Notebook Viewer Service that's already hosted at Rackspace.

I apologize for the late reply.

@rgommers
Member

xref the discussion in gh-16699. This is the first time in a while that this topic came up, but there's 5 notebooks there now and an intent to write more of them. So we should revisit this discussion. A lot of infrastructure for notebooks has improved since the last comment on this issue.

The options seem to be:

  1. Do something like NumPy did in https://github.com/numpy/numpy-tutorials
  2. Allow using jupytext .md files directly in our html doc builds.

If (2) is feasible, it's perhaps better than a whole separate repo that needs its own maintenance/infra and gives more issues with cross-linking. A quick check of the numpy-tutorials repo says that the only dependency that we are missing for allowing jupytext files in this repo is myst-nb for doc build, and jupytext and nbval for the refguide-checker. We shouldn't need the theme related dependencies, because we already have a working theme (assuming that works with myst).

Of course myst-nb comes with its own dependencies, so it needs some investigation what those are and if we're happy with those for all doc build needs.

@rossbar
Contributor

rossbar commented Aug 29, 2022

I just wanted to chime in to share some thoughts about approach number 1, which I've worked on both for numpy and networkx.

> If (2) is feasible, it's perhaps better than a whole separate repo that needs its own maintenance/infra and gives more issues with cross-linking.

There were many motivations for a separate tutorials repo in the beginning, not least of which was the desire to try infrastructure based on executablebooks. One of the main motivations though for keeping the tutorials repo separate from the main docs was to guarantee that the documentation build times didn't balloon as more tutorials were added. This concern can be mitigated by caching in CI, but at least to start we thought it'd be easier to have a separate tutorials repo. Note also that the executablebooks tooling is based on sphinx, i.e. myst-nb is a sphinx extension, so it'd be straightforward to move tutorials into the main docs (or vice versa) if desired. Furthermore, because everything is sphinx-based, cross-linking isn't an issue thanks to intersphinx.
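For readers unfamiliar with intersphinx, the cross-linking mentioned above is configured in a project's Sphinx `conf.py`, which is plain Python. The fragment below is a sketch of what a separate tutorials project might contain, not the configuration of numpy-tutorials or SciPy. The two URLs are assumed to be the current documentation roots of those projects.

```python
# conf.py fragment (illustrative sketch, not taken from any real project)
extensions = [
    "sphinx.ext.intersphinx",
]

# name -> (base URL of the target docs, inventory location)
# None means "use the objects.inv file found at the base URL".
intersphinx_mapping = {
    "numpy": ("https://numpy.org/doc/stable/", None),   # assumed doc root
    "scipy": ("https://docs.scipy.org/doc/scipy/", None),  # assumed doc root
}
```

With a mapping like this, a role such as `` :func:`scipy.optimize.minimize` `` in the tutorials should resolve to the corresponding page in the main docs.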

> A quick check of the numpy-tutorials repo says that the only dependency that we are missing for allowing jupytext files in this repo is myst-nb for doc build, and jupytext and nbval for the refguide-checker.

This depends on what you want to do. If you want to integrate text-based jupyter notebooks directly in an existing sphinx doc build then all you really need is myst-nb. Note also that nbval is not strictly necessary for testing: myst-nb executes notebooks at build-time [1] and raises sphinx warnings on execution failure, so if you already have a doc build scheme that elevates sphinx warnings to errors (like scipy does) then you get notebook execution testing "for free". FWIW this is the way numpy-tutorials was originally set up - nbval was added later on (see also numpy/numpy-tutorials#51 and numpy/numpy-tutorials#132). There are many different ways this can be done, but I just wanted to note quickly that the minimum dependency footprint for adding text-based notebooks to docs is very limited depending on how you want to do it.
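As a rough illustration of that minimal footprint, a `conf.py` fragment enabling myst-nb could look like the sketch below. The option names are the `nb_`-prefixed ones used by recent myst-nb releases (older releases used different names), and the chosen values are assumptions for illustration, not SciPy's actual settings.

```python
# conf.py fragment (sketch; values are illustrative assumptions)
extensions = [
    "myst_nb",  # parses MyST markdown / .ipynb notebooks and executes them
]

# When to execute notebooks during the build: "cache" re-runs only
# notebooks whose code has changed since the last build.
nb_execution_mode = "cache"

# Per-cell timeout in seconds before execution is considered failed.
nb_execution_timeout = 120

# Treat exceptions in cells as execution failures (reported as warnings).
nb_execution_allow_errors = False
```

Running `sphinx-build -W ...` then turns the execution warnings into build errors, which is what gives the "for free" testing described above.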

> We shouldn't need the theme related dependencies, because we already have a working theme (assuming that works with myst).

Again, the executablebooks (myst-nb) project is sphinx based, so sphinx themes work OOB 👍 .

> Of course myst-nb comes with its own dependencies, so it needs some investigation what those are and if we're happy with those for all doc build needs.

IME the dependency issue has proven significant. Many of the tools in executablebooks have used aggressive dependency pins in the past which has resulted in dependency resolution issues in numpy-tutorials that have been really difficult to track down and resolve. See e.g. executablebooks/MyST-NB#289 and executablebooks/MyST-NB#333. However, dependency issues have been cropping up less and less frequently as myst-nb has matured.

Footnotes

  1. By default at least, but this is of course configurable.

@rgommers
Member

@rossbar thanks for the input! I guess my main follow-up question is: what do we actually gain from executablebooks over pydata-sphinx-theme? The one thing I see are the two icons for launching on Binder and for downloading the .ipynb. Other than that, isn't it mainly downsides (more dependency issues, dealing with two themes, extra repo overhead)?

> One of the main motivations though for keeping the tutorials repo separate from the main docs was to guarantee that the documentation build times didn't balloon as more tutorials were added.

There were a couple of thoughts there:

  1. We wanted to allow using more dependencies that were not okay for the main repo. This is less of an issue for SciPy than for NumPy, because SciPy has much more built-in functionality to write interesting tutorials with.
  2. We didn't want size constraints for data. I think this is solved now that we have scipy.datasets.
  3. Docs build time. I'm not sure this is much of an issue when we are writing tutorials for SciPy functionality alone, rather than more "use case" type notebooks. We are already executing a ton of doc snippets to generate figures in the docs. It's fast compared to parsing and writing out html.

@rossbar
Contributor

rossbar commented Aug 30, 2022

> I guess my main follow-up question is: what do we actually gain from executablebooks over pydata-sphinx-theme

Sorry, I think I've caused confusion with the terminology :). executablebooks is the project that supports all of the myst tooling (myst-nb, myst-markdown parser, etc.) including the sphinx-book-theme to which you're referring, but there's no need to change/update themes - the pydata-sphinx-theme is completely fine and can be used with myst-nb (as can any other sphinx theme). I only mentioned executablebooks as a whole to make the distinction between the sphinx-based approach for executing notebooks (e.g. with myst-nb) vs. other "home-grown" approaches such as executing/converting notebooks with jupytext and incorporating the results in the docs manually.

> There were a couple of thoughts there:

Given these bullet points, it sounds like the approach that might make the most sense for scipy is to incorporate notebook-based content directly w/ myst-nb 👍

@mdhaber
Contributor

mdhaber commented Aug 31, 2022

I want to make sure I'm understanding correctly before I ask questions that don't make sense.

Were tutorials like "Linear algebra on n-dimensional arrays" written as notebooks, converted to markdown to be committed to the numpy-tutorials repo, and then from the markdown rendered as HTML to be viewed online?

If that's correct, how much tweaking of the markdown tends to be required after converting from the original notebook to the format that is committed? (If none at all, great. That's the dream. I've tried to use nbconvert a few times before, so it's great to hear if this is more seamless.)

@rgommers
Member

Indeed. And no manual tweaking of the .md file is needed.

@melissawm
Contributor

Hi folks - I'd be happy to create a draft PR with the basic set up for this, unless @rossbar or someone else wants to do it :) One thing to be aware of is that the executable notebooks do increase the docs build time, so it may be something to consider when deciding on adding these executable tutorials on the main repo vs. a separate tutorials repo. Another point is that with the myst-nb and jupytext installed, you don't really need to go through .ipynb files at all, as jupyterlab can open .md files as notebooks with all functionality preserved. This can be an optional workflow, but is available.
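For anyone who has not seen the text-based notebook format, the sketch below generates a small MyST markdown notebook from a list of cells, to show roughly what such an `.md` file contains. It is a hand-rolled illustration using only the standard library, not how jupytext writes files. Real files written by jupytext carry more header metadata (format and tool versions), and the helper name `to_myst` is hypothetical.

```python
# Built programmatically so the example can itself sit inside a fenced block.
FENCE = "`" * 3

# Minimal YAML front matter identifying the file as a MyST notebook.
HEADER = """\
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---
"""


def to_myst(cells):
    """Render (kind, text) pairs, kind in {"markdown", "code"}, as MyST text."""
    parts = [HEADER]
    for kind, text in cells:
        if kind == "code":
            # Code cells become {code-cell} directives; prose stays plain markdown.
            parts.append(f"{FENCE}{{code-cell}} ipython3\n{text}\n{FENCE}\n")
        elif kind == "markdown":
            parts.append(text + "\n")
        else:
            raise ValueError(f"unknown cell kind: {kind!r}")
    return "\n".join(parts)


myst_text = to_myst([
    ("markdown", "# Interpolation example"),
    ("code", "import numpy as np\nx = np.linspace(0, 1, 5)"),
])
print(myst_text)
```

Because the result is ordinary markdown with fenced directives, diffs in review stay readable, which is the main attraction over committing `.ipynb` JSON.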

@andyfaff
Contributor

andyfaff commented Sep 8, 2022

I wonder if it's worth running black on notebooks that we add to make sure the style is nice from the start. We can ensure that it only runs on jupyter notebooks.

@rgommers
Member

rgommers commented Sep 9, 2022

> Another point is that with the myst-nb and jupytext installed, you don't really need to go through .ipynb files at all, as jupyterlab can open .md files as notebooks with all functionality preserved. This can be an optional workflow, but is available.

That sounds nice to me. Markdown is way nicer to keep in a repo compared to .ipynb's.

If black can run on Python content in .md files, then that sounds fine to me to add.

@andyfaff
Contributor

andyfaff commented Sep 9, 2022

Black will work on jupyter, but not MD

@rossbar
Contributor

rossbar commented Sep 10, 2022

> I'd be happy to create a draft PR with the basic set up for this, unless @rossbar or someone else wants to do it

It'd probably be easier for someone more familiar with the plan/history of notebooks in the scipy docs to do so, but please feel free to ping me at review time!

> Black will work on jupyter, but not MD

Yeah unfortunately nbqa doesn't (yet) support the text-based markdown notebook format out of the box, but fortunately jupytext makes it easy to convert between formats. The general approach would be something like:

  1. Run nbqa on the .ipynb files, e.g. nbqa black *.ipynb
  2. convert to markdown, e.g. jupytext --to myst *.ipynb

You can then commit the resulting .md files, which will have black-formatted code cells.
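The two steps above could be scripted along these lines. This is only a sketch: it builds the same `nbqa black` and `jupytext --to myst` command lines quoted in the steps for every `.ipynb` file in a directory, and the demo is a dry run that prints them. The helper names are hypothetical, and actually running the commands assumes `nbqa`, `black` and `jupytext` are installed.

```python
import subprocess
import tempfile
from pathlib import Path


def format_and_convert_commands(directory):
    """Build the two command lines for all notebooks in `directory`."""
    notebooks = sorted(str(p) for p in Path(directory).glob("*.ipynb"))
    if not notebooks:
        return []
    return [
        ["nbqa", "black", *notebooks],             # step 1: format code cells
        ["jupytext", "--to", "myst", *notebooks],  # step 2: write .md files
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Dry run against a scratch directory: only print what would be executed.
with tempfile.TemporaryDirectory() as scratch:
    (Path(scratch) / "a.ipynb").write_text("{}")
    (Path(scratch) / "b.ipynb").write_text("{}")
    (Path(scratch) / "notes.md").write_text("")
    commands = format_and_convert_commands(scratch)

for command in commands:
    print(" ".join(command))
```

Calling `run_all(commands)` on a real directory would perform both steps, after which the generated `.md` files can be committed.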

@rgommers
Member

I was going to suggest opening a feature request for nbqa, but it's already asked at nbQA-dev/nbQA#668. With a resolution according to @rossbar's suggestion.

@MarcoGorelli
Contributor

MarcoGorelli commented Sep 13, 2022

> Yeah unfortunately nbqa doesn't (yet) support the text-based markdown notebook format out of the box,

This is coming soon! If you'd like to try it out to help unearth bugs before a release comes out later this week:

$ pip install git+https://github.com/MarcoGorelli/nbQA.git@jupytext jupytext black
$ nbqa black . --nbqa-files '\.md$'

UPDATE: this is now available, as of version 1.5.1

@ev-br
Member Author

ev-br commented May 5, 2023

Looks like this issue has run its course thanks to gh-17322. Or is there anything to keep this issue open @melissawm ?


10 participants