Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modernize nltk.org/howto pages #2856

Merged
merged 3 commits into from Oct 18, 2021

Conversation

tomaarsen
Copy link
Member

Hello!

Pull request overview

  • Modernize nltk.org/howto and nltk.org/howto/*.html.
  • Automate the updating of nltk.org/howto pages. They're now built whenever the rest of the website is.

Live demo

See https://tomaarsen.com/nltk/howto.html for the result of this PR in action. Feel free to compare to https://www.nltk.org/howto. Keep in mind that the HOWTO's from nltk.org are ~6 years old, so they might have slightly less or different information.

Background

These HOWTO pages (see https://www.nltk.org/howto) are generated using the nltk/test/*.doctest files. They provide a quick and comfortable way of showing how certain core functionality can be used.

This PR tackles one of the last thorns in my eye regarding the website. The page formerly referred to as the HOWTO page (later renamed to Example Usage) looks very bland, and doesn't fit with the remainder of the website. It feels very separate to the rest of the site.
This PR tackles this in a simple and concise way, which allows for automated updating of the example usage files. This is a step up from the current method, which requires manual updates - something that hasn't been done for some 6 years (because it's so annoying to update!)

Details

I realised that our website builder Sphinx is able to render these pages properly if we simply use

.. include:: /nltk/test/bleu.doctest

The next step was figuring out how to easily and properly use this. A key goal was to ensure that no page URLs need to be updated. If https://www.nltk.org/howto/corpus.html links to the Example Usage of the corpus module right now, then it will still do so after this PR is merged. This means: No unnecessary updating of URLs throughout the projects, and no links to documentation used in e.g. stackoverflow that suddenly don't go anywhere anymore.

To help with this, I've made a function in web/conf.py, which generates these howto files. They're based on jinja templates like we were already using for the automatic generation of the API Reference. The new template looks like this:

{% for item in range(17 + module_name|length) -%}#{%- endfor %}
Sample usage for {{ module_name }}
{% for item in range(17 + module_name|length) -%}#{%- endfor %}

.. include:: ../../nltk/test/{{ module_name }}.doctest

Which generates for example:

####################
Sample usage for ccg
####################

.. include:: ../../nltk/test/ccg.doctest

From this point onwards there only needed to be an index page for nltk.org/howto to replace the current one, and the Table of Contents needed to point to the new howto page. Simple.

Screenshots

Click here to see some examples of Before and After

Before

image

After

image

Before

image

After

image

Before

image

After

image

Future changes

Perhaps there are some cases where a code block spans too wide, and a single line is placed on separate lines. In most situations this is the preferred solution, but sometimes we want to be able to see e.g. a pretty-printed drawing fully. I'm not quite sure how to go about fixing this.

Note

I haven't had time to compare and check every single HOWTO page, and it's important to note that there are now a whopping 543 warnings (as opposed to e.g. 512 before). In some of these cases, the website will render the page oddly, or remove parts. We'll probably want to tackle these at some point.


It seems that the common usage of the website will be fully consistent and modernised after this PR!

  • Tom Aarsen

@stevenbird
Copy link
Member

@tomaarsen this is a major improvement, after all these years! Thanks for setting up a demo site to make it easy to review this.

I'm glad you mentioned wrapping as it's an issue for readability in a few places that make heavy use of ASCII art:

I'm surprised that it's not an issue in more places.

I wonder if there's a way to turn off this line wrapping, or set the length to a higher value?

@tomaarsen
Copy link
Member Author

tomaarsen commented Oct 15, 2021

I wonder if there's a way to turn off this line wrapping, or set the length to a higher value?

Yes, there is. I have full control over the site's CSS through nltk_theme, so I can patch this in.
Currently, the code blocks do not allow for horizontal scrolling, but I can enable this. With horizontal scrolling, this issue ought to disappear.

I'll see if this introduces some other undesired behaviour (i.e. long outputs that should really just be on multiple lines, like long lists).

(Edit: Oops, I wrote vertical instead of horizontal scrolling)

@iliakur
Copy link
Contributor

iliakur commented Oct 15, 2021

Hmm, "sample usage" may be misleading for modules where the doctest files only contain regression tests. I'm sorry I can't think of a way to address this, so it may seem like I'm raining on the parade.

@tomaarsen
Copy link
Member Author

We can always revert to what we called it: "HOWTO", but I don't think that's fitting either.

@iliakur
Copy link
Contributor

iliakur commented Oct 15, 2021

I completely agree that "howto" is also misleading. I think the underlying problem is that we don't have a standard for what we put into these doctest files. To me it makes sense to tackle that question first, then settle on some automation approach here.

If you'd like to proceed with the current naming convention (which on its own is totally fine BTW!) without solving the underlying problem, I don't see an automatic way to decide which doctest files to include.

@tomaarsen
Copy link
Member Author

tomaarsen commented Oct 15, 2021

@stevenbird I've enabled horizontal scrolling on https://www.tomaarsen.com/nltk_2. The previous website is still accessible under https://www.tomaarsen.com/nltk. https://www.tomaarsen.com/nltk has horizontal scrolling.

The updated page for your examples: CCG and Featgram

Beyond that, some other places are modified, where it wasn't strictly necessary:
https://tomaarsen.com/nltk/api/nltk.tokenize.stanford.html
In these examples, users might have to scroll where this wasn't necessary previously.


@iliakur I feel like that's a problem for another time. We had this issue 8 years ago too, e.g. with https://www.nltk.org/howto/classify.html, and I don't think that it should stop us from proceeding with this PR at this point.
That said, I don't want to just leave the issue be, either. Perhaps we can discuss a solution somewhere. For example, moving all regression tests into unit tests. Or (even better), moving to the following file structure:

├── unit
│       ├── test_....py
│       └── ...
├── regression
│       ├── ....doctest
│       └── ...
└── documentation
        ├── ....doctest
        └── ...

The unit folder stays the same. After that, we take each doctest, and move the code that shows how the module can be used into the documentation folder, while we move all of the regression tests into the regression folder. Any module can then have a unit/test_module.py file for mass pytest tests, regression/module.doctest for regression tests that benefit from being able to write documentation around the code, and documentation/module.doctest for showing how functionality can be used.

That way, we still have the tests, and we have a place where we can explicitly write documentation for the website, while making this change should not require too much effort. Perhaps we can just make an issue for the discussion.

@stevenbird
Copy link
Member

stevenbird commented Oct 17, 2021

I've enabled horizontal scrolling

That's a big improvement, thanks.

users might have to scroll where this wasn't necessary previously

Is this because lines that had been manually wrapped are no longer?

@tomaarsen
Copy link
Member Author

tomaarsen commented Oct 17, 2021

Is this because lines that had been manually wrapped are no longer?

No, manually wrapped lines are still wrapped. The regression is very small and subjective. An example would be:

In this example I would prefer the old one.

That said, I think the system with the horizontal scrolling is much preferable in most situations. I'll stick with that one. I'll publish that to https://github.com/tomaarsen/nltk_theme soon, so it's automatically included in the next website update. (As long as the website builder runs pip install -U nltk_theme first.)

@tomaarsen
Copy link
Member Author

The new version 2.0.5 has been published for https://github.com/tomaarsen/nltk_theme. This includes the horizontal scrolling. With other words, the next build should have this horizontal scrolling enabled.

I'm not planning any further changes for this PR.

@stevenbird
Copy link
Member

@tomaarsen I agree that there are cases where the automatic wrapping produces a more readable result. I guess this is all cases which did not involve ASCII art. A remedy is for someone to wrap them manually I suppose... I'll open an issue for this so we don't lose track of it.

@stevenbird stevenbird merged commit bec8910 into nltk:develop Oct 18, 2021
stevenbird added a commit to nltk/nltk.github.com that referenced this pull request Oct 18, 2021
@stevenbird
Copy link
Member

@tomaarsen solid work!!

@tomaarsen tomaarsen deleted the feature/website-howto branch October 18, 2021 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants