add infrastructure for examples as IPython notebooks #5233
About statsmodels: In our case the notebooks are part of the documentation, similar to the scipy tutorials. We don't store the output in the repo, but run the notebooks in a custom build to include it in the rst sphinx docs. The advantage is that the notebooks are included in our standard documentation and the doc build is supposed to catch bitrot/refactoring victims. |
The examples from wiki.scipy.org converted to notebooks: https://github.com/pv/SciPy-CookBook |
Re notebook bitrot, MinRK's script seems useful (and can be extended with our doctest checker from refguide_check.py if desired). |
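To make the bitrot check concrete, here is a minimal sketch (not MinRK's script) that uses only the standard library: it reads a notebook's JSON, runs every code cell in one shared namespace, and reports which cells raise. The function name `run_notebook_cells` and the example notebook are hypothetical. IPython magics, kernels, and output comparison are deliberately ignored, so a real check would more likely rely on the Jupyter tooling itself.

```python
import json


def run_notebook_cells(nb_json: str) -> list[tuple[int, str]]:
    """Execute every code cell of an nbformat-4 notebook and collect failures.

    Returns a list of (cell_index, error_message) pairs; an empty list means
    every code cell ran without raising.
    """
    nb = json.loads(nb_json)
    namespace = {}  # shared across cells, like a single kernel session
    failures = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores the source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as exc:
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return failures


# Hypothetical notebook with one markdown cell and two code cells;
# the last cell calls a function that does not exist (simulated bitrot).
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["import math\n", "x = math.sqrt(16)"]},
        {"cell_type": "code", "source": "math.removed_function(x)"},
    ],
})

failures = run_notebook_cells(example)
for index, message in failures:
    print(f"cell {index} failed: {message}")
```

A CI job could run such a function over every notebook in the repo and exit non-zero when the returned list is not empty.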
Examples tend to bitrot. There isn't much in the CookBook that's still of interest. So I have a preference for including notebooks inside the scipy repo. The statsmodels examples are in total now 0.5 MB in size (for 35 .ipynb files). So it'll take quite a while before it would significantly affect scipy download sizes - our |
My slight preference is still for a separate repo:
|
I have experienced all the points @ev-br raised with our notebooks. Most likely statsmodels will in future use both: notebooks as a way to write official documentation, as now, and a separate repo with additional notebooks. One issue I ran into with a private notebook collection is compatibility with specific statsmodels versions. The advantage of the notebooks in the main repo is that they get versioned with and for every release. (At least theoretically, except for uncorrected bitrot.) One reason for statsmodels to keep notebooks in the main repo docs is better integration and better availability in the online docs. The intention for this part of the docs is much more like the scipy tutorials, that is, basic documentation that just happens to use notebooks because then we can also run the code directly. |
Okay, I'm going to change my mind then, let's do a separate repo. The important points made for me are that nbconverting is too fragile and that we can keep converted output in the repo if we don't have to worry about size. I'm still concerned about bitrotting - inside the main repo we could simply test them on TravisCI against new PRs. Maybe this is possible in a separate repo as well though, with some github hooks that send a trigger. Dependencies: I'd say that using any well-known package like PyOperators or AlgoPy or dask should be fine. |
And maybe let's get it going first, and only worry about github hooks for testing scipy PRs later. |
@ksurya can comment on ScipyCentral (it is moving forward). I don't think the overlap is that large though. ScipyCentral is meant as a place where users can store and show off things that use the whole ecosystem. So it is not curated and has a broad scope / set of packages. If next to that we have one repo that is curated and only focused on longer examples demonstrating functionality in scipy itself, that should be OK. |
@ev-br maybe good to announce the plan on the mailing list, and finalize a few things (like rules for dependencies, repo name, etc.) there? |
how about https://scipy-cookbook.readthedocs.org/ |
Ooo, fancy. That looks really good. But still a separate repo for curated notebooks that we properly test I assume? |
This loosely overlaps with SciPy Central, wherein users typically submit their snippets/examples as @rgommers mentioned. I have for some time in the past thought about integrating notebooks in SciPy Central. It made sense at a high level because the content shared in SciPy Central and notebooks broadly come under the same category. However,
From what I noticed in the SciPy Central database, more people tend to store their work in places like GitHub and then submit a link. However, I feel this app can be more than what it is used for. It would be great if we actually take a look into what kind of things we want in SciPy and how this app can be shaped to serve them. As far as notebooks are concerned, I am thinking if we can have them on GitHub, and let the SciPy Central parse them nicely using the IPython Notebook Viewer Service that's already hosted at Rackspace. I apologize for the late reply. |
xref the discussion in gh-16699. This is the first time in a while that this topic came up, but there's 5 notebooks there now and an intent to write more of them. So we should revisit this discussion. A lot of infrastructure for notebooks has improved since the last comment on this issue. The options seem to be:
If (2) is feasible, it's perhaps better than a whole separate repo that needs its own maintenance/infra and gives more issues with cross-linking. A quick check of the Of course |
I just wanted to chime in to share some thoughts about approach number 1, which I've worked on both for numpy and networkx.
There were many motivations for a separate tutorials repo in the beginning, not least of which was the desire to try infrastructure based on executablebooks. One of the main motivations though for keeping the tutorials repo separate from the main docs was to guarantee that the documentation build times didn't balloon as more tutorials were added. This concern can be mitigated by caching in CI, but at least to start we thought it'd be easier to have a separate tutorials repo. Note also that the executablebooks tooling is based on sphinx, i.e.
This depends on what you want to do. If you want to integrate text-based jupyter notebooks directly in an existing sphinx doc build then all you really need is
Again, the executablebooks (myst-nb) project is sphinx based, so sphinx themes work OOB 👍 .
IME the dependency issue has proven significant. Many of the tools in executablebooks have used aggressive dependency pins in the past which has resulted in dependency resolution issues in Footnotes
|
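As a rough illustration of integrating text-based notebooks into an existing Sphinx build, a `conf.py` fragment might look like the sketch below. This is an assumption about a typical myst-nb setup, not the configuration used by numpy or networkx. The option names follow recent myst-nb releases (older releases used different names), and the timeout value is arbitrary.

```python
# conf.py (fragment): hypothetical additions to an existing Sphinx configuration

extensions = [
    "myst_nb",  # parses MyST markdown notebooks and can execute them during the build
]

# Assumption: option names as in recent myst-nb releases; check the documentation
# of the installed version before relying on them.
nb_execution_mode = "cache"  # reuse cached outputs; re-execute only when the code changes
nb_execution_timeout = 120   # seconds a cell may run before execution is treated as failed
```

Caching is the part that addresses the build-time concern raised above: unchanged tutorials are not re-executed on every docs build.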
@rossbar thanks for the input! I guess my main follow-up question is: what do we actually gain from
There were a couple of thoughts there:
|
Sorry, I think I've caused confusion with the terminology :). executablebooks is the project that supports all of the myst tooling (myst-nb, myst-markdown parser, etc.) including the
Given these bullet points, it sounds like the approach that might make the most sense for scipy is to incorporate notebook-based content directly w/ |
I want to make sure I'm understanding correctly before I ask questions that don't make sense. Were tutorials like "Linear algebra on n-dimensional arrays" written as notebooks, converted to markdown to be committed to the If that's correct, how much tweaking of the markdown tends to be required after converting from the original notebook to the format that is committed? (If none at all, great. That's the dream. I've tried to use |
Indeed. And no manual tweaking of the |
Hi folks - I'd be happy to create a draft PR with the basic setup for this, unless @rossbar or someone else wants to do it :) One thing to be aware of is that the executable notebooks do increase the docs build time, so it may be something to consider when deciding between adding these executable tutorials to the main repo and a separate tutorials repo. Another point is that with myst-nb and jupytext installed, you don't really need to go through .ipynb files at all, as jupyterlab can open .md files as notebooks with all functionality preserved. This can be an optional workflow, but is available. |
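To show what such a text-based notebook looks like on disk, here is a small sketch that pulls the code cells out of a MyST markdown notebook using only the standard library. The notebook text, its abbreviated front matter, and the helper `extract_code_cells` are hypothetical. In practice jupytext and myst-nb do the parsing and conversion, so this only illustrates the file layout.

```python
import re

# Built at runtime so this example can itself be embedded in a fenced block.
FENCE = "`" * 3

# Hypothetical text-based notebook: abbreviated YAML front matter followed by
# narrative markdown and two {code-cell} blocks.
MYST_TEXT = "\n".join([
    "---",
    "jupytext:",
    "  text_representation:",
    "    format_name: myst",
    "kernelspec:",
    "  name: python3",
    "---",
    "",
    "# Linear algebra demo",
    "",
    FENCE + "{code-cell} ipython3",
    "import math",
    "print(math.hypot(3, 4))",
    FENCE,
    "",
    "Some narrative text between cells.",
    "",
    FENCE + "{code-cell} ipython3",
    "total = sum(range(5))",
    FENCE,
])

# A code cell starts with a fence followed by {code-cell} and ends at the next
# line that contains only a fence.
CELL_PATTERN = re.compile(
    r"^" + re.escape(FENCE) + r"\{code-cell\}[^\n]*\n(.*?)^" + re.escape(FENCE) + r"\s*$",
    re.DOTALL | re.MULTILINE,
)


def extract_code_cells(text: str) -> list[str]:
    """Return the source of each {code-cell} block, in document order."""
    return [match.group(1).rstrip("\n") for match in CELL_PATTERN.finditer(text)]


cells = extract_code_cells(MYST_TEXT)
for number, source in enumerate(cells, start=1):
    print(f"--- code cell {number} ---")
    print(source)
```

Because the whole notebook is plain markdown, diffs in review show only the changed prose or code, with no stored outputs or JSON metadata.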
I wonder if it's worth running |
That sounds nice to me. Markdown is way nicer to keep in a repo compared to If |
Black will work on jupyter, but not MD |
It'd probably be easier for someone more familiar with the plan/history of notebooks in the scipy docs to do so, but please feel free to ping me at review time!
Yeah unfortunately
You can then commit the resulting .md files, which will have black-formatted code cells. |
I was going to suggest opening a feature request for |
This is coming soon! If you'd like to try it out to help unearth bugs before a release comes out later this week:
UPDATE this is now available, as of version |
Looks like this issue has run its course thanks to gh-17322. Or is there anything to keep this issue open @melissawm ? |
(this is a continuation of #5044 (comment))
At the moment we have two ways of illustrating scipy functionality. Small examples go to the docstrings, there is also the tutorial for more narrative style docs with usage examples.
In many cases, it would be nice to be able to provide longer worked examples which do not easily fit into either of these two places. It would also be useful to be able to add worked examples as IPython/Jupyter notebooks instead of (or along with) reST files.
As an example of what I'm talking about, two notebooks which prompted this are linked at the top of gh-5044.
Here's an attempt at a concrete suggestion. Each submodule entry in the tutorial grows a link "Extended examples" (a tentative name). These links point to a separate repository under the scipy org, which only hosts notebooks. The access rights to this repository and the quality standards for the content are to the first approximation the same as for the main scipy repo.
This way:
This whole thing seems to have an overlap with scipy central, but I've to admit I've no idea what is the status/scope of the latter. It'd be awesome if somebody in the know could weigh in.
There is also prior art of statsmodels, who seem to ship notebook examples. (@josef-pkt, @jseabold)