
Separate code examples from docs #1475

Closed
wants to merge 24 commits into main from steve/separate-code-examples-from-docs

Conversation

StevenMaude
Contributor

@StevenMaude StevenMaude commented Aug 3, 2023

Fixes #1342.

Reasons for this PR

The goal of this PR is to make the code examples easier to test (#1325).

The proposal here is to move the existing code examples in the MkDocs documentation from being inline out to separate files.

While there are tools for checking inlined code source in documentation, these are probably not flexible enough for what we are working with.1

First, a lot of our examples are dataset definitions, rather than pure source code. Beyond just checking "does it work?", we may want to validate the output of dataset definitions to ensure they are unchanged, or at least be warned that the outputs have changed, and rebuild the outputs accordingly.
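To give a concrete sense of the kind of check this would enable, here is a minimal sketch (purely illustrative; the helper names and the idea of comparing file hashes are assumptions, not something implemented in this PR):

```python
# Illustrative sketch only: check that a stored example output has not drifted,
# by comparing a freshly regenerated output against the stored copy.
# Whether a byte-for-byte comparison is appropriate for all output formats
# (for example, .arrow files) is an open question.
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def assert_output_unchanged(stored_output, regenerated_output):
    """Fail if the regenerated output differs from the stored example output."""
    assert file_digest(stored_output) == file_digest(regenerated_output), (
        f"{regenerated_output} differs from the stored example output {stored_output}"
    )
```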

Second, we actually have more than just dataset definitions in our ehrQL documentation: a mixture of reproducible code-like includes. These are:

  • dataset definitions, some that are intentionally broken to demonstrate failures
  • full OpenSAFELY projects
  • Python console sessions that use ehrQL
  • ehrQL error messages as Python tracebacks
  • commands run at the command-line

Each of these categories will need handling in different ways to verify they behave as expected.

Third, separating out the code means it is easier, should we wish, to run other source code tooling on examples such as Black or Ruff. Otherwise, workarounds or further dependencies are needed to handle the inlined code. For now, no additional tooling has been run, and the examples are essentially a direct copy-paste.2

What has been separated out

This separates out the following code examples that were inlined in documentation source:

  • project.yaml
  • almost all dataset definition code; there may be one or two one-line snippets left in
  • Python console sessions
  • tracebacks

This does not separate out:

  • command-line commands (we could consider this in future)

Overall, almost every code-like piece of content has been moved. There are a few other odds and ends that have not yet been moved out.

Example code in ehrQL source docstrings has also been left untouched.

Summary of the changes

Many files have been added and changed, which makes the overall diff very large. However, the overall changes are relatively straightforward to summarise. The changes are of three kinds:

  • removal of inlined code from Markdown source files, and replacement with snippet notation to include the newly created example file
  • creation of new files that store the previously inlined code
  • updates to configuration files to ignore the newly created files
    • .gitignore modifies a now-unused path to point to paths of the new code examples. This is to ignore metadata files generated when running dataset definitions.
    • pyproject.toml and .pre-commit-config.yaml are modified to exclude these separated examples from the existing Python checks. We are not running the usual Black and Ruff tooling on these files, so they fail our automatic checks. We may in future wish to remove these exclusions.

How the code examples are structured

The changes to DEVELOPERS.md document how the examples are currently structured.
This is subject to change, if we agree in this PR's review that it should change.

Questions for review

  • Does separating out each individual example this way seem reasonable…?
  • …and is the directory structure too nested?
  • Is using the directory name reasonable for now, for specifying properties of the example, as opposed to adding some kind of metadata file to each? We can always change this in future.
    • At the moment, all I can foresee we need for testing an example is knowing whether it should succeed or fail, and what kind of example it is (which could also be deduced from the structure of the example).
  • Does success or failure make sense for interactive Python sessions? The appearance of a traceback in these does not prevent continuation. success possibly does make sense if you consider it as "we expect that no exception should appear in this session".
  • Should tracebacks for failing examples go in output/ or log/?

Footnotes

  1. Examples: markdown-exec and mktestdocs both work based on matching to the syntax label, for example, python.

  2. One example at least was fixed because pre-commit spotted that the Python syntax was invalid due to a missing parenthesis.


@StevenMaude
Contributor Author

StevenMaude commented Sep 6, 2023

🗒️ TODOs for future work. These were intended to be done now, but the tasks are more relevant for when we start testing the examples. As the changes here are substantial, I would prefer to merge them sooner rather than later, to avoid repeated rebases where editing intervention is needed.

Feel free to ignore this in review. These tasks would be handled in future.

Duplication/reuse

We can always switch to reusing specific code examples in future, if we feel it's an improvement.

  • duplicating dataset definitions for standalone and project use? (this happens in one or two cases)
  • reusing dataset definitions, for example, on the landing page and in the tutorial?
  • sessions that are near duplicates of dataset definitions (example: in writing-a-dataset-definition)
  • the running ehrQL example overlaps considerably with the frames and series example, though they are not identical

Generating outputs

  • how best to handle example data? Should we always use the ehrQL bundled data unless a specific example is provided on the relevant page? Or is it better for examples to be coupled to specific example data, unless there are good reasons not to be (such as demonstrating a particular issue that the bundled data cannot)? (Gut feeling: use the ehrQL example data, in which case we need to remove the example data that has been added in some cases.)
  • are .arrow outputs reproducible for a given dataset? That is, can we verify that outputs haven't changed by checking a file hash, or do we need to load the file?
  • still need to build all outputs and tracebacks (if not now, then later for testing)
  • we may want to include the full output log, not just the traceback, so we can include more of the OpenSAFELY CLI logs (see the running-ehrql page example, which combines a separated traceback file with the start of the OpenSAFELY CLI logs)

Further quality control

  • for now, tracebacks are copied from the original docs page, but we need to generate these in future, and they may need labelling with snippet sections so that we don't include lots of extra information
  • checking that command invocations work; we are currently not doing this
  • do we need to validate outputs where we are not using those outputs in the documentation? (gut feeling: yes, for consistency in the checks, and in case we ever do want to include them)
  • ensure standalone examples are complete and working
  • do we want examples to be subject to Ruff and Black, or do we enforce exclusions? (Maybe Ruff, but not Black? Black is better for consistency, but introduces things like extra parentheses, unless we set a longer line length for the examples, if that's possible.)

Miscellaneous

  • should the examples be shown in full as complete working examples, or is it sufficient, as now, to show only the section of relevance? (gut feeling: only the section of relevance; for now, this is unimportant as the examples have not been made complete)

@inglesp
Contributor

inglesp commented Sep 19, 2023

An idea:

Rather than keeping the code snippets separately to the docs, causing the friction that Iain describes, can we find a way to keep the snippets inline in the docs, and then extract them and test them as part of the build process?

This might be a bit fiddly but I think it would address Iain's concerns. It's also something that we could add incrementally.

In terms of implementation, I wonder whether we could support adding some optional metadata to each code snippet, to indicate how the snippet should be tested.

We could then write a mkdocs plugin that implements the on_page_content hook. This would pull out the metadata from a snippet, and (a) test the snippet and (b) remove the metadata from the content.
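A very rough sketch of the shape that plugin might take (purely illustrative: the `<!-- test: ... -->` metadata syntax and the test hook are invented here, not an agreed design):

```python
# Rough sketch only: the metadata comment syntax and run_snippet_test() are
# invented for illustration; a real implementation would need to locate the
# adjacent code block and decide how to run it.
import re

from mkdocs.plugins import BasePlugin

SNIPPET_META = re.compile(r"<!--\s*test:\s*(?P<meta>[^>]*?)\s*-->")


def run_snippet_test(metadata, page):
    # Placeholder: this is where the snippet would actually be tested.
    print(f"Would test a snippet in {page.file.src_path} with metadata {metadata!r}")


class TestSnippetsPlugin(BasePlugin):
    def on_page_content(self, html, page, config, files):
        def handle(match):
            run_snippet_test(match.group("meta"), page)
            return ""  # strip the metadata from the rendered page

        return SNIPPET_META.sub(handle, html)
```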

What do you think?

@StevenMaude
Contributor Author

StevenMaude commented Sep 19, 2023

Before going further, I'd like to know more about markdown-exec and mktestdocs.

A summary of the available packages for running or testing inline code

markdown-exec

  • moderately actively maintained (last commit 2023-06-14)
  • primarily for executing code to generate output in the document, rather than testing the code blocks; the code and the output are displayed adjacent to each other
  • there's no obvious way, outside of modifying the plugin source, to extend the plugin to customise running of code blocks; there's no easy way to run opensafely exec ehrql:v0 generate-dataset $PATH_TO_DATASET_DEFINITION
  • you can't, at least in an obvious way, get the output from some code block and use that elsewhere, further on in the text; it's included adjacent to the code block

codechecker-mkdocs

  • not actively maintained (last commit 2021-12-16)
  • like markdown-exec, no obvious way outside of modifying the plugin source to extend the plugin to customise running of code blocks

entangled/mkdocs-plugin

  • moderately actively maintained (last commit 2023-05-29)
  • primarily for running code to generate output in the document
  • can specify a series of build steps via a Makefile
  • no obvious framework to test examples separately from building the examples

mkdocs-code-validator

  • actively maintained (last commit 2023-09-18)
  • adds "validators" to code blocks of given syntax; the code block is piped as standard input to a provided command
  • it looks like the validators apply globally, so all code blocks have to adhere to that
  • I think a validator can be customised for any code block syntax labelling, so we could, for our use, have a dataset_definition syntax in our code blocks
  • that wouldn't check that the output is as you expect, only that an entirely self-contained dataset definition — for example, one with no provided codelists — runs without error
  • this doesn't give an easy way of including any generated outputs

mktestdocs

  • moderately actively maintained (last commit 2023-04-27)
  • for running pytest against Markdown code blocks
  • you can define custom language support, so we could have a dataset_definition "language" in our code blocks
  • again, this doesn't give a way of including outputs of dataset definitions

pytest-examples

(This issue is now closed, but I found this package after closing the issue, so adding it for completeness.)

  • moderately actively maintained (last commit 2023-09-27)
  • for running pytest against Python in Markdown code blocks, and running Ruff/Black against those code blocks
  • this is probably the closest thing to what's needed…
  • …but it's not particularly extensible/configurable: the run function is hardcoded to run the Python code; you can't customise it to run the code block via some other mechanism

pytest-codeblocks

(This issue is now closed, but I found this package after closing the issue, so adding it for completeness.)

  • moderately actively maintained (last commit 2023-09-17)
  • for running pytest against Python and shell code in Markdown code blocks
  • this is also close to what's needed…
  • …but, like pytest-examples, is also not very configurable

@StevenMaude
Contributor Author

StevenMaude commented Sep 19, 2023

What isn't covered by these plugins?

Let's, for now, assume that we only want to test working dataset definitions and ignore everything else that we could check — failing dataset definitions, full OpenSAFELY projects, sandbox sessions, errors.1

After reviewing again what these plugins do, I struggle to see how they handle the following (suggestions are welcome):

  • Mechanisms for testing dataset definitions that are not entirely "self-contained". By "self-contained", I mean, the only thing needed to run the dataset definition successfully is a single Python dataset definition source file.
    • Some dataset definitions may use multiple files and additional resources. Those might be codelists, as several dataset definitions in the ehrQL examples page do.
    • We'd likely be restricted to the same example data each time; either the ehrQL example data, or randomly generated data. That might be OK. But perhaps we want to use specific example data to illustrate a point about ehrQL, and still validate the dataset definition and output?
    • Some dataset definitions, as included in the current source on main, are incomplete, because they are small quotations of a larger dataset definition. (In this PR, I've extended those examples to be minimally complete.)
  • Validation and flexible inclusion of the output from a dataset definition. If we want to include the current outputs from a dataset definition elsewhere in a page at an arbitrary position, then how do we generate and include these outputs? Maybe this is not a requirement, but if the outputs are manually copy-pasted in, then it's possible they (confusingly) diverge from what the dataset definition code produces.
  • Reuse of code blocks — there's no obvious mechanism for specifying "please insert some lines from the above code block here" that I'm aware of. Where there are multiple code blocks using part of the same dataset definition, a duplication of the code would be needed. Maybe there's another MkDocs plugin to handle this? This problem may also apply to reuse of examples across different pages.

Some of these considerations may apply to other approaches that we devise ourselves to include code inline.

Footnotes

  1. I'd argue that checking some or all of these would also be useful, but that's an aside.

@StevenMaude
Contributor Author

I'll respond to some of the other comments below.

@StevenMaude
Contributor Author

Checking dataset definition fragments

>>> from ehrql.tables.beta.core import patients

All we need to check, here, is that this line of code (a fragment of a dataset definition) is valid.

What's meant by "valid" here?

This import statement is valid according to Python's syntax rules and, if you have the ehrql package available, it is valid Python code too, running without error. But the line alone is not a valid dataset definition:

Did not find a variable called 'dataset' in dataset definition file

We can only check that line in the full ehrQL context by making it part of a minimally complete dataset definition.
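For illustration, a minimally complete dataset definition wrapping that import might look something like this (the population condition is arbitrary, chosen only to make the file runnable):

```python
from ehrql import create_dataset
from ehrql.tables.beta.core import patients

dataset = create_dataset()
# An arbitrary population definition, included only so that this is a complete,
# runnable dataset definition rather than a bare import.
dataset.define_population(patients.date_of_birth.is_not_null())
```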

There are also other quoted parts of dataset definitions (examples on this page) that, as presented in isolation, are valid Python syntax but not runnable Python code — there is prior context missing. Despite that, the lines could still form part of a perfectly valid dataset definition.

For these fragments, the options are then:

  • ignore checking of these dataset definition fragments, which might be acceptable, but may not be ideal
    • some of the current examples are also incomplete, and would also require exempting
  • make each fragment complete so that they can be validated

If we make each fragment complete, then doing so inline is going to entail repetition. The repetition will also make the documentation less readable as we would include the entire dataset definition every time we refer to it.

To reuse the examples, we need something like the snippet approach in this PR that allows repeated inclusion and selective quoting. Or some other thing we devise where we can include the reused code inline in the source file.

@StevenMaude
Contributor Author

Formatting

I'd also suggest not trying to handle Ruff et al. I sometimes -- intentionally! -- want to write code that would be reformatted. For example, we talk about importing the patients table and the medications table in the ehrQL tutorial. It's reasonable to go from patients to medications.

I'm really not tied to using Ruff and Black. And there are other issues with Ruff and Black too:

  • Black introduces extra parentheses for long lines, which make the ehrQL syntax look more complicated (we could work around this by increasing the line length)
  • autoformatting adds extra, unnecessary blank lines because of the snippet markers

On the other hand, one reason we might want to use these tools is that different authors may write dataset definitions in different ways. Because of that, we can end up with inconsistencies in how they are written across the documentation.

@StevenMaude
Contributor Author

Managing snippets and reuse

do we really need to separate out the code examples?

At the moment, possibly if we want to:

  • test example fragments
  • use any static analysis tools easily on the examples
  • eventually separate out anything that looks code-like and is checkable from the entire OpenSAFELY documentation (which would be my ideal)

If we can devise a better way, then no.

Separating out the code examples makes the source docs harder to reason about; locating them in a parallel directory tree, even more so.
do they really need to be located in a parallel directory tree?

No, they don't need to be separated.

They could be included in the same directory as the relevant page directly. Perhaps even without any directory structure at all: though throwing everything into the same directory feels like it might create problems in organising the code for testing.

I think separating out the code examples and locating them in a parallel directory tree has introduced a new class of error; that is, referencing error.

MkDocs will fail to build the site, if a snippet link is invalid.

There is the possibility of accidentally referring to an incorrect file, granted. (That could also happen even if we devise some inline solution that doesn't involve repetition of code, and instead uses references to the code to include.)

I can't easily tell from the PR whether there are one-to-many relationships between code examples and source docs; that is, whether one code example is used in many source docs.

My intent was to have one-to-one, although I think the actual PR as it stands does have a little reuse because one of the examples was identical.

There are tradeoffs with both:

  • one-to-one: repetition if you have the same example, and possible unintended divergence if you modify one copy without the other
  • one-to-many: you cannot modify one example without (possibly unintentionally) modifying the others; you could work around this by using symbolic links instead, to make the duplication clear

Naming things is hard in general; naming code examples seems especially hard.

Anecdotally, having named tens of these, I didn't find it that difficult, even artificially constraining myself to two word names, which isn't really required and was more for establishing a convention. The only naming headscratchers were for the examples page, where there are a few similar examples.

@StevenMaude
Contributor Author

StevenMaude commented Sep 19, 2023

Inlining code somehow

Rather than keeping the code snippets separately to the docs, causing the friction that Iain describes, can we find a way to keep the snippets inline in the docs, and then extract them and test them as part of the build process?

In terms of implementation, I wonder whether we could support adding some optional metadata to each code snippet, to indicate how the snippet should be tested.

We could then write a mkdocs plugin that implements the on_page_content hook. This would pull out the metadata from a snippet, and (a) test the snippet and (b) remove the metadata from the content.

What do you think?

In some ways, I think this is possibly preferable from the documentation author/editor perspective, that everything is in one file.

That said, if you implement some means of quoting extracts from a snippet, then you're possibly still jumping around in a text editor, just within the same file rather than a separate one. So you'd still likely end up editing the documentation with two tabs/windows open, one to refer to the code and one where you're writing; the difference is just that both would be open on the same file.

And from some of the other discussion above, I'd add the features that warrant consideration:

  • inclusion of partial quotations from an inlined file, so that we can reuse parts of a working dataset definition throughout a page
  • bundling of additional files like codelists or specific example data
  • generating outputs (datasets, logs) from dataset definitions, and inclusion of those outputs in the documentation Markdown
  • possibly reusing dataset definitions across different pages?
  • considering whether we ever want to test other code-related content in the OpenSAFELY documentation, and how extensible the approach is. A minimal OpenSAFELY project would require multiple files to run, not just a single dataset definition.

I have some ideas of how the above would work by using separate files, wherever they ended up being stored. If we went with an inline approach that we develop, we'd have to devise solutions for the features we actually want.

Some of this feels a little like we would be writing our own version of the existing snippet plugin.

On that note, I'll add that I just tried it, and you can write a file where the snippet code to be used is in the same file. The problem is that the snippet itself, including any markers, also gets included in the output. (These could be stripped out.)

@StevenMaude
Contributor Author

(thanks for the comments; I've had to think a lot more about this)

@StevenMaude
Contributor Author

We could then write a mkdocs plugin that implements the on_page_content hook. This would pull out the metadata from a snippet, and (a) test the snippet and (b) remove the metadata from the content.

A couple more thoughts here:

  1. If I understand the intent correctly, I think this would end up running the code every time you edit/save the page.

    On my computer, running only the example dataset definition takes about 0.5 seconds in Python with ehrql installed directly, and about 1 second in Docker. Even loading ehrQL and doing nothing has a fixed time cost:

    $ time python3 -c "import ehrql"
    
    real	0m0.356s
    …
    $ time opensafely exec ehrql:v0
    …
    real	0m0.968s
    

    If it does retest on every page edit, then this could add a noticeable delay to editing and previewing a page, particularly for pages with lots of dataset definitions (like the examples page). Unless there's some kind of caching mechanism implemented that only tests changed dataset definitions (a rough sketch of such a cache follows this list)? Even then there'd still be a slow first page load.

  2. What happens if the test fails? Should we just stop building the page, or warn that the example is broken? The former is disruptive to page preview, the latter may not be noticed in the MkDocs logs.
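For what I mean by a caching mechanism in point 1, a rough sketch (purely illustrative; how the test itself runs is left abstract):

```python
# Illustrative sketch of content-hash caching: only re-run a snippet's test
# when the snippet text has changed since it was last tested.
import hashlib

_results_cache = {}  # snippet content hash -> previous test outcome


def test_snippet_cached(snippet_text, run_test):
    key = hashlib.sha256(snippet_text.encode("utf-8")).hexdigest()
    if key not in _results_cache:
        _results_cache[key] = run_test(snippet_text)
    return _results_cache[key]
```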

@StevenMaude
Contributor Author

StevenMaude commented Sep 20, 2023

How do other projects manage testable code?

MkDocs users

I'm looking at the list of known Material for MkDocs users.

Surprisingly, a lot of even these technical organisations using MkDocs don't, at least not obviously, test the code in their documentation. (This is from quickly looking around at their GitHub workflows and build configurations.)

Ones I can find testing are listed below.

FastAPI

https://github.com/tiangolo/fastapi

  • /docs directory contains Markdown source of internationalised documentation versions
  • /docs_src directory contains Python code
  • /tests/test_tutorial contains tests of the docs_src examples

…and that's all I can find

I checked through all of the listed Material for MkDocs users, and only FastAPI tests the examples in its documentation.

(That is, where the examples are structured much as we are doing in the ehrQL documentation. There were a few cases where documentation authors had used mkdocstrings to pull in code-related content, and those docstrings might well be tested.)

Sphinx

There's a doctest plugin as used in Fabric.

This allows writing of inline code.

This is still a little restrictive though:

  • it's geared towards running Python code directly
  • it generates outputs, but the output is raw from Python; you can't present CSV data in a nice table directly

Talks on documentation example testing

Again there are surprisingly few resources on this. I skimmed through these to see what they advocated.

"Unit Test the Docs: Why You Should Test Your Code Examples", 2022

https://www.youtube.com/watch?v=yCqteMY3L-g

This talk advocates storing the code examples elsewhere and pulling them in; the examples are stored as part of the unit tests and included from there.

This uses a snippet approach with a Node.js tool called Bluehawk, which looks like the snippet support in pymdown-extensions but has a few more features.
For example, you can tag a single line, instead of having to wrap it in start and end markers.
(I'm not advocating that we use Bluehawk.)

"Unit Testing Your Docs", 2020

https://www.youtube.com/watch?v=E9zod8-I-fs

This talk also advocates separating the code examples from the documentation,
though keeping them in the same directory as the documentation source.

@StevenMaude
Contributor Author

StevenMaude commented Sep 20, 2023

I tried using the pymdown-extensions snippet extension as a basis for an additional Markdown extension that removes inline snippets.

Rather than figure out how to set up an entirely new project that lets me install an extension, I forked the pymdown-extensions repository and modified that for now.

This could be separated out into its own package if it proved useful.

Why an extension?

The on_page_content hook runs at the point where the HTML has been generated, which would make the code slightly more complicated because you would then be working with HTML instead of Markdown.

When trying to modify the Markdown, I found that someone in a MkDocs issue has requested a hook that runs after the pymdown-extensions snippets are included. Such a hook currently doesn't exist.

The suggested approach in that issue is to create an extension that runs after the snippet extension. This is controlled by a priority level, which is 32 in the standard snippets extension.

There are probably cases that this doesn't catch, but it does work in this simple case. It doesn't do any testing or anything like that; it just allows an "inline snippet" to be included without the snippet source and its markers also appearing in the rendered page.

How it works

It checks for the snippet start and end section lines.

This means that every inline snippet needs a start and end marker.

What might be better?

I'm still not convinced that separate files for code inclusion are a worse solution than this inline handling of code.

Otherwise, it might be preferable to make a feature request in the pymdown-extensions repository. Snippets are already included when the snippet source and destination are the same file; the only problem is the repeated inclusion of the snippet content and the snippet syntax at its original location.

Code

pymdownx/removeinlinesnippets.py

"""
Snippet ---8<---.

pymdownx.remove_snippet
Remove inline snippets

MIT license.

Copyright (c) 2023 Steven Maude
Copyright (c) 2017 Isaac Muse <isaacmuse@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
documentation files (the "Software"), to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions
of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
"""
from markdown import Extension
from markdown.preprocessors import Preprocessor
import re


class SnippetMissingError(Exception):
    """Snippet missing exception."""


class RemoveInlineSnippetPreprocessor(Preprocessor):
    """Remove inline snippets from Markdown content."""

    RE_SNIPPET_SECTION = re.compile(
        r"""(?xi)
        ^(?P<pre>.*?)
        (?P<escape>;*)
        (?P<inline_marker>-{1,}8<-{1,}[ \t]+)
        (?P<section>\[[ \t]*(?P<type>start|end)[ \t]*:[ \t]*(?P<name>[a-z][-_0-9a-z]*)[ \t]*\])
        (?P<post>.*?)$
        """
    )

    def __init__(self, config, md):
        """Initialize."""

        self.encoding = config.get("encoding")
        self.tab_length = md.tab_length
        super(RemoveInlineSnippetPreprocessor, self).__init__()

    def remove_inline_snippets(self, lines):
        """Remove inline snippets from the lines."""

        open_sections = set()

        new_lines = []
        for l in lines:
            # Found a snippet section marker
            m = self.RE_SNIPPET_SECTION.match(l)

            if m is not None:
                section_name = m.group("name")
                # We found the start
                if m.group("type") == "start":
                    open_sections.add(section_name)
                    continue

                # We found the end
                elif m.group("type") == "end":
                    try:
                        open_sections.remove(section_name)
                    except KeyError:
                        pass
                    continue

            # We are currently not in a snippet, so append the line
            if len(open_sections) == 0:
                new_lines.append(l)

        # It's possible that someone starts a section without closing it.
        # Fail loudly for now.
        # (It's also possible that someone ends a section without starting it.
        # This isn't explicitly handled.)
        assert len(open_sections) == 0
        return new_lines

    def run(self, lines):
        """Remove snippets."""

        return self.remove_inline_snippets(lines)


class RemoveInlineSnippetExtension(Extension):
    """Remove inline snippet extension."""

    def __init__(self, *args, **kwargs):
        """Initialize."""

        self.config = {
            "encoding": ["utf-8", 'Encoding of snippets - Default: "utf-8"'],
        }

        super(RemoveInlineSnippetExtension, self).__init__(*args, **kwargs)

    def extendMarkdown(self, md):
        """Register the extension."""

        self.md = md
        md.registerExtension(self)
        config = self.getConfigs()
        remove_snippet = RemoveInlineSnippetPreprocessor(config, md)
        md.preprocessors.register(remove_snippet, "remove_snippet", 33)


def makeExtension(*args, **kwargs):
    """Return extension."""

    return RemoveInlineSnippetExtension(*args, **kwargs)

mkdocs.yml in the ehrql repository:

markdown_extensions:
  - pymdownx.removeinlinesnippets

Example:

docs/tutorial/writing-a-dataset-definition/index.md

…
--8<-- "tutorial/writing-a-dataset-definition/index.md:func"
…
--8<-- [start:func]
def my_function(var):
    pass
--8<-- [end:func]

@evansd
Contributor

evansd commented Sep 21, 2023

I wonder if this needs to be a mkdocs plugin at all. We could have a test which extracts all the snippets (together with their metadata) from all the .md files in the docs directory and does whatever testing we want to do on them.

That would just get run as part of our normal CI workflow.

I think this may have been what @iaindillingham was suggesting yesterday, and I just hadn't fully understood at the time.

@StevenMaude
Contributor Author

StevenMaude commented Sep 21, 2023

I wonder if this needs to be a mkdocs plugin at all.

Unless I'm missing something, you'd presumably still need some mechanism — MkDocs hook/plugin, Markdown extension or whatever else — somewhere to remove any embedded content from the Markdown that you don't want appearing in the rendered HTML.

@evansd
Contributor

evansd commented Sep 21, 2023

HTML comments?

<!-- snippet: foo=bar -->
```python
from ehrql import create_dataset
```
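A sketch of how such comment-plus-code-block pairs might be pulled out of the Markdown source (illustrative only; the `snippet:` keyword follows the example above, everything else is an assumption):

```python
# Illustrative sketch: find an HTML comment of the form <!-- snippet: key=value -->
# followed by a fenced Python code block, and yield the metadata and the code.
import re

SNIPPET_WITH_META = re.compile(
    r"<!--\s*snippet:\s*(?P<meta>.*?)\s*-->\s*\n"
    r"```python\n(?P<code>.*?)```",
    re.DOTALL,
)


def extract_snippets(markdown_text):
    for match in SNIPPET_WITH_META.finditer(markdown_text):
        metadata = dict(
            item.split("=", 1) for item in match.group("meta").split()
        )
        yield metadata, match.group("code")
```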

@evansd
Contributor

evansd commented Sep 21, 2023

Something like that mechanism would also allow us to check code examples drawn from docstrings that get included in the reference documentation.

@iaindillingham
Member

I haven't had an opportunity to do more than skim the comments, so sorry if this has been discussed. However, what about:

  • Iterate over each Markdown file
  • Create a corresponding Python file, in a corresponding directory tree, rooted in the system temporary directory
  • Extract everything in python fences from the Markdown file into the Python file
  • Pass each Python file to the interpreter

If the Python code in the generated Python file were on the same lines as the corresponding Python code in the Markdown file, then an error would be easy to identify from the stack trace. (We may even want to catch and re-raise an error, modifying the filename from .py to .md.)

The above would give Python code in each Markdown file the same scope (a Markdown file is like a notebook). We could modify that slightly by adding the ehrql fence (aliased to the python fence), but extracting each ehrql fence to its own Python file (also in the corresponding directory tree, also named after the corresponding Markdown file, with a suffix, such as an integer index).

Indeed, rather than passing each Python file to the interpreter, why not pass it to ehrql.main.generate_dataset?

At this stage, I don't think we need a MkDocs plugin.
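A rough sketch of those steps (illustrative only; fence handling is simplified, and whether to run each file with the interpreter or pass it to ehrql.main.generate_dataset is left as a comment):

```python
# Illustrative sketch of the approach above: mirror docs/ under a temporary
# directory, copy python-fenced lines into a .py file at the same line numbers
# (blanking everything else), then run each generated file.
import subprocess
import sys
import tempfile
from pathlib import Path


def extract_to_python(md_path, out_root):
    out_path = Path(out_root) / md_path.with_suffix(".py")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    lines, in_fence = [], False
    for line in md_path.read_text().splitlines():
        if line.strip().startswith("```"):
            in_fence = line.strip() == "```python" and not in_fence
            lines.append("")  # keep line numbers aligned with the Markdown file
        else:
            lines.append(line if in_fence else "")
    out_path.write_text("\n".join(lines) + "\n")
    return out_path


def check_docs(docs_dir="docs"):
    with tempfile.TemporaryDirectory() as tmp:
        for md_path in Path(docs_dir).rglob("*.md"):
            py_path = extract_to_python(md_path, tmp)
            # Alternatively, pass the file to ehrQL (for example, via
            # ehrql.main.generate_dataset) rather than the bare interpreter.
            subprocess.run([sys.executable, str(py_path)], check=True)
```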

@iaindillingham
Member

I wonder if this needs to be a mkdocs plugin at all. We could have a test which extracts all the snippets (together with their metadata) from all the .md files in the docs directory and does whatever testing we want to do on them.

Sorry, just seen this @evansd. I agree. Hopefully my previous comment adds some necessary detail.

@evansd
Contributor

evansd commented Sep 21, 2023

Thanks Iain. Broadly that sounds like the right approach to me, but I don't think there's any need to extract the code into files. Building the code up as a string and passing to eval would do the same thing I think. Or piping the code to a python subprocess if it turns out we need isolation.

Using a custom ehrql fence is a nice idea which hadn't occurred to me.

@StevenMaude
Contributor Author

StevenMaude commented Sep 21, 2023

Let's say that we do restrict ourselves to only wanting to test dataset definitions1.

At risk of repeating myself2, I'll summarise the main features of our current documentation code collection that I think makes some of them trickier to handle. It might narrow down what specifically the requirements of a good solution might be.

Quoting of dataset definitions

In some examples, we use short quotations from a dataset definition when discussing a specific part. In a few of these cases — the import statements — the quotation would at least be valid standalone Python, but not a valid dataset definition. In other cases, the lines as quoted are not even valid standalone Python.

It would be best for these lines to be taken from one complete dataset definition, which is tested. The way this kind of line selection is usually done is via some kind of extended markup feature like the pymdown-extensions snippet syntax. As it stands, the snippet feature does not work correctly with inlined snippets3.

Less good alternatives:

  • copy-paste the required lines without testing them (bad, because they are duplicated, may get out of sync with the dataset definition, and not parts of tested code)
  • insert the full dataset definition each time and highlight the lines in question (bad, it leads to in-text redundancy and means the reader sees the full dataset definition each time; highlighting is brittle because you select line numbers, rather than a marked code section)
  • insert the full dataset definition once and link back up to it each time you refer to it (bad, because of reader confusion)

Incomplete dataset definitions

Relatedly, some dataset definitions in the examples are entirely incomplete. If we wanted to test these, we would need to make them complete and selectively quote the parts we want to include.

The less good alternative: just ignore these.

Failing dataset definitions

Some dataset definitions deliberately fail, and we include their tracebacks. It would be good to validate that these fail in the way that we've indicated in the text, with the correct error text. It's more than possible that we might change these errors in future.

The less good alternative: just ignore these, and update the failure errors manually.

Provision of supplementary files

Some of our included dataset definitions are not self-contained. They require supplementary files, such as codelists.

If we don't provide this additional data somewhere, then the options look like:

  • provide mock data or monkeypatch out calls to external resources, like codelist_from_csv, within a testing framework for these examples (a rough sketch follows this list)
  • devise a way to provide this data in another file, or include it in the Markdown source, bulking out the source document
  • don't test these kinds of dataset definitions at all
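The sketch mentioned in the first option might look something like this (illustrative only; the patch target and the fake return value are assumptions about how the examples use codelist_from_csv):

```python
# Illustrative sketch: replace codelist_from_csv() with a fake so that examples
# which reference codelist CSV files can run without those files being present.
import pytest


@pytest.fixture
def fake_codelists(monkeypatch):
    def fake_codelist_from_csv(*args, **kwargs):
        # Return placeholder codes instead of reading a CSV from disk.
        return ["placeholder_code_1", "placeholder_code_2"]

    monkeypatch.setattr("ehrql.codelist_from_csv", fake_codelist_from_csv)
    # A test that runs a documentation example could then request this fixture.
```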

Inclusion and validation of outputs

Do we ever want to include the generated outputs from dataset definitions, such as output datasets or logs, in the documentation? Or validate those outputs don't change inadvertently by someone editing an example?

Less good alternative: we have to generate, copy-paste, and keep them in sync manually.

Making a decision

@evansd @iaindillingham: Let's say that we go with the "mark up inline dataset definition code blocks as ehrql, and pull them out for testing" idea you've proposed. I'll add: I'm happy to try this out and compare; that is, I'll have a go with it.

I can imagine how pulling out the examples into separate files does address all of the above:

  • Quoting of dataset definitions — works as intended, because the snippets are in separate files
  • Failing dataset definitions — we save the traceback to a file and write code to generate the traceback for the current version of a dataset definition, then compare (a rough sketch of this check appears below)
  • Provision of supplementary files — we can provide the files directly
  • Inclusion and validation of outputs — we save the outputs, can include these via the snippet syntax in the Markdown source, and can check the current outputs against what we have stored

How does the inline ehrql code block approach address them? Or are we otherwise OK with ignoring these issues?
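For the failing dataset definitions point above, the comparison I have in mind could look something like this (illustrative only; the exact command invocation and the idea of matching only the stored traceback text are assumptions):

```python
# Illustrative sketch: check that a deliberately failing dataset definition
# still fails, and that the traceback we show in the docs still appears in the
# error output.
import subprocess
from pathlib import Path


def assert_traceback_unchanged(dataset_definition, stored_traceback):
    result = subprocess.run(
        ["opensafely", "exec", "ehrql:v0", "generate-dataset", dataset_definition],
        capture_output=True,
        text=True,
    )
    assert result.returncode != 0, f"{dataset_definition} unexpectedly succeeded"
    expected = Path(stored_traceback).read_text().strip()
    assert expected in result.stderr, (
        f"traceback for {dataset_definition} no longer matches {stored_traceback}"
    )
```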

Footnotes

  1. An assumption that I'd question: again, the ideal, for me, would ultimately be testing everything that's code or command-like in the entire OpenSAFELY documentation. This is why I think making something specifically work with only dataset definitions in mind might be trickier to generalise. It also feels important that the sandbox sessions in the tutorial are always kept up-to-date too.

  2. This PR is big and the thread is also long, and GitHub does not help by hiding comments.

  3. The pymdown-extensions snippet handling assumes you're pulling in a snippet from another file. It works within the same file, but you get the snippet source also included as part of the Markdown too. It can be hacked to work within the same file to remove the markers and content. We could possibly request the feature or propose a PR if we can devise one. Or we need to devise our own separate solution, if we want to reuse parts of dataset definitions.

@iaindillingham
Member

Quoting of dataset definitions
These aren't dataset definitions, are they? They're snippets of Python.

Incomplete dataset definitions
Ditto.

Failing dataset definitions
Let's not test these; let's instead focus our effort on improving the tracebacks. For example, we could include a link in the traceback that points the user to a page that describes, without code, the problem and the solution.

Provision of supplementary files
Let's deal with this later.

Inclusion and validation of outputs
Let's deal with this later.

If I could emphasize one thing, it would be that the ehrQL docs will change, and will change dramatically. Let's aim for minimally better: Python code that doesn't error when passed to the interpreter.

@StevenMaude
Contributor Author

I'm working on the alternative approach proposed here in #1648.

@StevenMaude
Contributor Author

Closing in favour of #1648 which tests dataset definitions only, but without restructuring the examples.

StevenMaude added a commit that referenced this pull request Nov 2, 2023
This code has been subject to considerable work to get it into this
form. However, it did not seem useful to retain the various approaches
and versions of the code before this state.

A quick guide to this code:

* It finds any Markdown files in `docs/`.
* It uses the SuperFences extension, as we do in the MkDocs
  configuration, to extract Markdown code blocks labelled with `ehrql`
  syntax. These are assumed to be self-contained dataset definitions.
* The code blocks that will be tested should appear as code blocks in
  the documentation, by default (provided the CSS isn't changed to
  modify the appearance of code blocks somehow, which shouldn't be the
  case, because why would you?). They are identified in the parametrized
  tests by their ordinal fence number in the source file.
* It finds any Python modules indicated by a `.py` extension. Python
  modules are assumed to be self-contained dataset definitions.
* The found dataset definitions are run to generate a dataset, and the
  output checked to see if it's a CSV.

There is some monkeypatching necessary to make this work:

* `codelist_from_csv()` relies on having CSV data available, and the
  checks on valid codelist codes are patched out.  Without further work,
  we don't have any direct way of including data for inline dataset
  definitions in Markdown source, or specifying which mock CSV data to
  use without any established convention for examples to use.  #1697
  proposes ideas to remove this monkeypatching further.
* The sandboxing code is monkeypatched out to use "unsafe" loading of
  dataset definitions. Without doing so, it is not possible to
  monkeypatch any other ehrQL code: the ehrQL is run in a subprocess
  otherwise.

For more details and discussion, see the related PR for this code
(#1648) and the previous PR (#1475) which this approach replaces.
@StevenMaude StevenMaude deleted the steve/separate-code-examples-from-docs branch November 23, 2023 17:23