Modernize nltk.org/howto pages #2856
@tomaarsen this is a major improvement, after all these years! Thanks for setting up a demo site to make it easy to review this. I'm glad you mentioned wrapping as it's an issue for readability in a few places that make heavy use of ASCII art: I'm surprised that it's not an issue in more places. I wonder if there's a way to turn off this line wrapping, or set the length to a higher value?
Yes, there is. I have full control over the site's CSS through I'll see if this introduces some other undesired behaviour (i.e. long outputs that should really just be on multiple lines, like long lists). (Edit: Oops, I wrote vertical instead of horizontal scrolling)
Hmm, "sample usage" may be misleading for modules where the doctest files only contain regression tests. I'm sorry I can't think of a way to address this, so it may seem like I'm raining on the parade.
We can always revert to what we called it: "HOWTO", but I don't think that's fitting either.
I completely agree that "howto" is also misleading. I think the underlying problem is that we don't have a standard for what we put into these If you'd like to proceed with the current naming convention (which on its own is totally fine BTW!) without solving the underlying problem, I don't see an automatic way to decide which doctest files to include.
@stevenbird The updated page for your examples: CCG and Featgram Beyond that, some other places are modified, where it wasn't strictly necessary: @iliakur I feel like that's a problem for another time. We had this issue 8 years ago too, e.g. with https://www.nltk.org/howto/classify.html, and I don't think that it should stop us from proceeding with this PR at this point.
The That way, we still have the tests, and we have a place where we can explicitly write documentation for the website, while making this change should not require too much effort. Perhaps we can just make an issue for the discussion.
That's a big improvement, thanks.
Is this because lines that had been manually wrapped are no longer?
No, manually wrapped lines are still wrapped. The regression is very small and subjective. An example would be: In this example I would prefer the old one. That said, I think the system with the horizontal scrolling is much preferable in most situations. I'll stick with that one. I'll publish that to https://github.com/tomaarsen/nltk_theme soon, so it's automatically included in the next website update. (As long as the website builder runs
The new version 2.0.5 has been published for https://github.com/tomaarsen/nltk_theme. This includes the horizontal scrolling. In other words, the next build should have this horizontal scrolling enabled. I'm not planning any further changes for this PR.
@tomaarsen I agree that there are cases where the automatic wrapping produces a more readable result. I guess this covers all cases that did not involve ASCII art. A remedy is for someone to wrap them manually, I suppose... I'll open an issue for this so we don't lose track of it.
@tomaarsen solid work!!
Hello!
Pull request overview
nltk.org/howto and nltk.org/howto/*.html.
nltk.org/howto pages. They're now built whenever the rest of the website is.
Live demo
See https://tomaarsen.com/nltk/howto.html for the result of this PR in action. Feel free to compare to https://www.nltk.org/howto. Keep in mind that the HOWTOs from nltk.org are ~6 years old, so they might have slightly less or different information.
Background
These HOWTO pages (see https://www.nltk.org/howto) are generated using the nltk/test/*.doctest files. They provide a quick and comfortable way of showing how certain core functionality can be used.
This PR tackles one of the last thorns in my side regarding the website. The page formerly referred to as the HOWTO page (later renamed to Example Usage) looks very bland and doesn't fit with the remainder of the website. It feels very separate from the rest of the site.
This PR tackles this in a simple and concise way, which allows for automated updating of the example usage files. This is a step up from the current method, which requires manual updates, something that hasn't been done for some 6 years (because it's so annoying to update!).
Details
I realised that our website builder Sphinx is able to render these pages properly if we simply use
The next step was figuring out how to easily and properly use this. A key goal was to ensure that no page URLs need to be updated. If https://www.nltk.org/howto/corpus.html links to the Example Usage of the corpus module right now, then it will still do so after this PR is merged. This means: No unnecessary updating of URLs throughout the projects, and no links to documentation used in e.g. stackoverflow that suddenly don't go anywhere anymore.
To help with this, I've made a function in web/conf.py, which generates these howto files. They're based on jinja templates like we were already using for the automatic generation of the API Reference. The new template looks like this:
Which generates for example:
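As a rough, hypothetical sketch of the approach described above (not the PR's actual template or code): one stub page is written per nltk/test/*.doctest file, and each stub pulls in the doctest text. The function names, the "Sample usage for ..." title, the directory layout, and the use of the reStructuredText `include` directive are all assumptions made for illustration. The standard library's `string.Template` stands in for Jinja so the snippet has no third-party dependencies.

```python
import os
import tempfile
from pathlib import Path
from string import Template

# Hypothetical stub layout: a title plus an include of the doctest file.
# string.Template is a stand-in for the Jinja template mentioned in the PR.
STUB_TEMPLATE = Template("$title\n$underline\n\n.. include:: $include_path\n")


def generate_howto_stubs(doctest_dir: Path, output_dir: Path) -> list:
    """Write one .rst stub per *.doctest file and return the module names."""
    output_dir.mkdir(parents=True, exist_ok=True)
    names = []
    for doctest_file in sorted(doctest_dir.glob("*.doctest")):
        name = doctest_file.stem  # "corpus.doctest" -> "corpus"
        title = f"Sample usage for {name}"
        # Path from the stub's directory to the doctest file, POSIX style.
        include_path = Path(os.path.relpath(doctest_file, output_dir)).as_posix()
        stub = STUB_TEMPLATE.substitute(
            title=title,
            underline="=" * len(title),
            include_path=include_path,
        )
        # Naming the stub <name>.rst means the HTML builder emits <name>.html.
        (output_dir / f"{name}.rst").write_text(stub, encoding="utf-8")
        names.append(name)
    return names


def build_index(names: list) -> str:
    """Return an index page whose toctree lists every generated stub."""
    title = "Sample usage"
    entries = "\n".join(f"   howto/{name}" for name in names)
    return f"{title}\n{'=' * len(title)}\n\n.. toctree::\n   :maxdepth: 1\n\n{entries}\n"


if __name__ == "__main__":
    # Demo on a throwaway directory tree that mimics the repository layout.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        tests = root / "nltk" / "test"
        tests.mkdir(parents=True)
        (tests / "corpus.doctest").write_text(">>> 1 + 1\n2\n", encoding="utf-8")
        generated = generate_howto_stubs(tests, root / "web" / "howto")
        print(build_index(generated))
```

Because each stub is named after its doctest file, a page such as howto/corpus.html would keep its existing URL under this scheme, which matches the goal stated earlier of not breaking links.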
From this point onwards there only needed to be an index page for nltk.org/howto to replace the current one, and the Table of Contents needed to point to the new howto page. Simple.
Screenshots
[Three pairs of Before and After screenshots]
Future changes
Perhaps there are some cases where a code block spans too wide, and a single line is placed on separate lines. In most situations this is the preferred solution, but sometimes we want to be able to see e.g. a pretty-printed drawing fully. I'm not quite sure how to go about fixing this.
Note
I haven't had time to compare and check every single HOWTO page, and it's important to note that there are now a whopping 543 warnings (as opposed to e.g. 512 before). In some of these cases, the website will render the page oddly, or remove parts. We'll probably want to tackle these at some point.
It seems that the common usage of the website will be fully consistent and modernised after this PR!