Developers' Guide

Barry Pollard edited this page Oct 10, 2021 · 31 revisions

Developers are responsible for the technical infrastructure of the website and solve complex problems about accessibility, performance, internationalization, SEO, and more.

We aim for this site to be as available and inclusive as possible and follow all best practices for creating a website – especially as we are calling out the usage of such best practices on the web!

As such the site needs to be built with high quality code (including code reuse rather than duplication, readability and maintainability) and be performant (many of us involved in the Almanac are web performance evangelists!).

Inclusion and accessibility are very important to us and we publish a comprehensive Accessibility Statement to abide by and keep up to date. On a similar note internationalisation is very important. At the time of writing the Web Almanac is available in English and 12 additional languages and counting! The tech stack is built with multi-lingual and multi-year support but additional translations may require additional development to support language-specific features or other localisation needs.

SEO is an important consideration for any site and we have spent a lot of time optimising the site for it, so any changes must take SEO into account too.

Commitment summary

  • Platform development: Developers will work together to build the front and back ends of the Almanac website, reusing some components from previous editions and building new ones. This is an ongoing commitment with varying levels of work at any given time from June to November.

The amount of time you put in is up to you, but developers typically give about 10 hours over 6 months.

How to join and contribute

Leave a comment in the Call for Developers issue.

Browse issues with the Development label.

Casual contributors should fork the repo into their own GitHub and submit pull requests. Frequent developers can work from the main repo in separate branches, which may be easier for collaboration. Only core maintainers (currently Rick and Barry) are able to merge PRs into the main and production branches and everyone must submit PRs to main for them to be reviewed and merged.

For 2020 we renamed our default branch to main, so those involved in 2019 with their own fork may need to follow the steps in issue #880 to migrate their fork.

Developers should assign Issues to themselves so other developers are aware they are being worked on, and release them if they are unable to continue on that Issue for whatever reason. Comment frequently, and reach out for help! We also label Issues to make them easier to find. The good first issue label is a great place to start for new developers!

Developers should also help review other developers' code to help the core maintainers, improve the code quality of merged code, and familiarise themselves with changes to the code.

Tech Stack

The Web Almanac website is built using vanilla CSS and JS, hosted on Google Cloud Platform, through a Python-based Flask application server, serving Jinja2 templates. Wow that's a lot of technical terms!

The current tech stack is essentially split into the following pieces:

  • NodeJS Scripts
  • Jinja2/EJS Templates
  • Python code
  • Static files
  • JSON Config

These are discussed in more detail below and are all available in the src directory of this repo.

This tech stack has served us pretty well, but we are always up for changing it if there are good reasons to and general consensus too. There's probably a bit too much technology in there, some overlap between EJS and Jinja2, possibly too many layers of hierarchy for the Jinja2 templates, and we've even discussed in the past whether it should just be a static site! But that's what we have for now. Raise an issue if you want to discuss the tech stack further, but we suggest you familiarise yourself with it first to see its advantages and disadvantages.

NodeJS Scripts

The scripts in the src/tools/generate folder are Node/JavaScript files which are used to convert the chapters (written in Markdown in the content folder) into Jinja2/HTML templates. They use some EJS templates (very like Jinja2 in functionality but with slight syntax differences). This is run automatically on commit to main using a GitHub Action to test that any changes are good to merge, but can also be run manually to build and run the site locally. Run npm install from the src directory to be able to run the npm run generate command.

There are also further commands (npm run ebooks) to generate the ebooks using princexml but that only needs to be done on release (usually from a GitHub Action) so core maintainers can take care of that mostly.

Jinja2/EJS Templates

The Jinja2 and EJS templates in the templates folder allow us to avoid duplicating code for all the pages, and support multiple languages and years. The templates in the base folders are the majority of the HTML, and the language-specific templates mostly just contain translations of various phrases and paragraphs. The individual chapters' HTML should never need to be edited directly as it is overwritten by the npm run generate process described above (and so these generated files are not committed to git).

The templates follow a bit of layering, which can be quite confusing at first. Take for example the CSS chapter in English for 2019. It is made of the following files in the src directory:

As you can see it's a little convoluted, but does allow support of multiple years and languages without repeating a lot of code. Usually it's easy to figure out what page to edit from a quick search of the code.

The non-chapter pages (e.g. the Home Page, Methodology, Table of Contents, etc.) follow a similar route, but their content is not written in Markdown. They are written directly in HTML/Jinja2, as they are normally written by the development team who have the skills and ability to write directly in those, whereas we want chapter authors to concentrate on the content.

Python code

Python scripts in the main src/server folder are the webserver code, including the mapping of URL routes to templates and various functions made available to the Jinja2 templates. You need to install the dependencies as detailed in the src/README.md file and then run the webserver with python main.py so you can browse the site locally at http://127.0.0.1:8080/ using the built-in Werkzeug development server.

The Python files are launched by the src/main.py file.
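To give a feel for what "mapping URL routes to templates" means, here is a minimal, framework-free sketch. It is not the site's actual Flask code: the URL shape, the template path layout, and the names CHAPTER_ROUTE and template_for are all assumptions made up for illustration.

```python
import re
from typing import Optional

# Hypothetical URL shape: /<lang>/<year>/<chapter>, e.g. /en/2019/css
CHAPTER_ROUTE = re.compile(
    r"^/(?P<lang>[a-z]{2}(?:-[A-Z]{2})?)/(?P<year>\d{4})/(?P<chapter>[a-z0-9-]+)$"
)


def template_for(path: str) -> Optional[str]:
    """Return an (assumed) template path for a chapter URL, or None if no route matches."""
    match = CHAPTER_ROUTE.match(path)
    if match is None:
        return None
    # The template folder layout below is an assumption, not the repo's real layout.
    return "{lang}/{year}/chapters/{chapter}.html".format(**match.groupdict())


print(template_for("/en/2019/css"))
print(template_for("/not-a-chapter"))
```

In the real server, Flask's route decorators do this matching for us; the sketch only shows the idea of turning the language, year and chapter in a URL into a template lookup.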

We have 100% code coverage of the Python code in pytest unit tests and are looking to keep it that way! Each pull request runs these tests and also checks we’re still at 100% code coverage.
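As a rough illustration of what keeping 100% coverage involves, the sketch below pairs a small helper with tests that exercise both of its branches. The helper year_from_path and the test names are hypothetical and not taken from the repo. pytest would collect any function whose name starts with test_.

```python
def year_from_path(path: str) -> str:
    """Hypothetical helper: pull the year segment out of a URL such as /en/2019/css."""
    parts = path.strip("/").split("/")
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"no year in path: {path!r}")
    return parts[1]


def test_year_from_path_returns_year():
    # Covers the normal return branch.
    assert year_from_path("/en/2019/css") == "2019"


def test_year_from_path_rejects_paths_without_year():
    # Covers the error branch, so every line of the helper is executed.
    try:
        year_from_path("/en/about")
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

If a new branch were added to the helper without a matching test, the coverage check on the pull request would drop below 100% and flag it.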

Static files

The static folder contains CSS, JS, images, fonts and other "static" files that can be served directly by Google Cloud Platform without going through the Python application server. Developers will mostly be editing the CSS and JS files. Some CSS and JS are inlined into the templates, but we try to keep that to a minimum, and code that is shared across many pages (e.g. the core CSS, or the chapter JS) is separated out into static files to allow caching and reuse.

JSON Config

The config folder contains a JSON config per year. This avoids hardcoding common config, and using JSON allows the config to be shared between the Node/JS generate scripts and the Python/Jinja2 site.
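A minimal sketch of reading a per-year config from Python might look like the following. The file naming (<year>.json), the keys in the demo file, and the function name load_year_config are assumptions for illustration only. Check the config folder for the real structure.

```python
import json
import tempfile
from pathlib import Path


def load_year_config(config_dir: Path, year: int) -> dict:
    """Load the JSON config for one edition, assuming files are named <year>.json."""
    with open(config_dir / f"{year}.json", encoding="utf-8") as handle:
        return json.load(handle)


# Demo against a throwaway directory with made-up keys (not the real config schema).
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    (config_dir / "2019.json").write_text(
        json.dumps({"year": 2019, "languages": ["en", "ja"]}), encoding="utf-8"
    )
    config = load_year_config(config_dir, 2019)

print(config["languages"])
```

Because the file is plain JSON, a Node script could read the same file with JSON.parse, which is the sharing benefit described above.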

SQL folder

Note the top-level sql folder is a collection of the HTTP Archive SQL queries used to get the stats. It is not used on the site (though there is a link to it from chapters to explore). So developers can mostly ignore this folder and leave it for the Analysts (see https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide) to manage. They will also create the figure images for each chapter.

Automated Testing

As the 2020 Web Almanac was built heavily on the hard work that went into creating the site for the inaugural 2019 edition, we spent a lot of that year adding automated testing for every pull request, including:

  • Linting all files, with specific rules for each code language.
  • Building the entire site.
  • Running pytest unit tests for Python code with 100% code coverage.
  • Testing that every single page generates correctly (200 response).
  • Testing error pages and redirects.
  • Linting generated HTML.
  • Running Lighthouse on every page changed (and on every page weekly on Sunday on the production site) to ensure 100 scores in Accessibility, SEO, and Best Practices, as well as best practices for Performance (pretty close to 100!).
  • Running security tests.
  • Testing translation lengths match English equivalent to ensure no lines are missed.
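As an example of the last check in the list, a translation-length test could be sketched as below. This assumes "length" means the number of lines in a file. The real check may compare something else, and the function name line_count_mismatch is hypothetical.

```python
def line_count_mismatch(english: str, translation: str) -> int:
    """Return translation line count minus English line count (0 means they match)."""
    return len(translation.splitlines()) - len(english.splitlines())


english = "Title\n\nFirst paragraph.\nSecond paragraph.\n"
complete = "Titre\n\nPremier paragraphe.\nDeuxième paragraphe.\n"
incomplete = "Titre\n\nPremier paragraphe.\n"

print(line_count_mismatch(english, complete))
print(line_count_mismatch(english, incomplete))
```

A non-zero result suggests a line was dropped or added during translation, which is the kind of thing the automated check is there to catch before merge.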

I think we have great automated testing coverage but I'm always keen to hear more suggestions here!

Releasing

The site is released periodically by the core maintainers (currently Rick and Barry) by merging main to production, carrying out some extra checks, and then uploading to Google Cloud Platform. We won't release on every merge (especially if there are a few PRs in the pipeline that look nearly done), but at the same time we're not afraid to release if a good bit of functionality or an important bug fix is merged.

Goals for 2021 edition

Some of the things I'd love to tackle this year, as well as launching the new edition of course, include:

Would love to hear your thoughts on these, or any other goals you think we should have!

Let me know if you have any questions, Barry