
Chapter Lifecycle


This doc describes the end-to-end process of a Web Almanac chapter, from planning to publishing. Contributors should refer to this doc to understand how the process works and what the deliverables are for each step.

  0. Create content team (project owners)
  1. Plan content (content team)
    • decide on the scope of the chapter
    • decide on what metrics would be needed/feasible
    • brainstorm a chapter outline
    • divide author responsibilities
  2. Gather data (analysts)
    • prepare for testing
      • instrument custom metrics to assist analysis
      • write queries
    • run tests on HTTP Archive
  3. Validate results (content team)
    • analyze HTTP Archive dataset
      • run queries
      • save data to a spreadsheet
    • review data for comprehension
    • resolve bad data
  4. Draft content (content team)
    • write first draft
    • embed data visualization prototypes
    • iterate with technical review feedback
    • submit for publication
  5. Publication (content team, editors, developers)
    • enforce writing style guide
    • enforce data visualization style guide
    • cross-link chapters
    • generate markup
    • update project config
  6. Translation (translators)
    • draft, review, and publish translation

0. Create content team

Project owners start by creating tracking issues for every potential chapter. The content team for each chapter may not necessarily be ready yet, but this enables contributors to see the chapters that are available to work on.

Contributors may either self-nominate or be nominated to author a chapter in its corresponding tracking issue or in the "call for nominations" issue, if there is one.

The project owners will vet author nominations, gauge nominee interest, and select the authors. The tracking issue is then updated to reflect the selected authors.

At any point, contributors may join the chapter's content team as peer reviewers or data analysts by announcing their interest on the tracking issue. Authors or project owners will update the tracking issue to reflect the status of the team, including: who the members are, their roles, and what roles are still in need of volunteers.

Authors can use their own discretion to add new authors to the chapter. Extra large chapters may necessitate 3 or 4 authors while small chapters may only need 1. Having more than 5 authors will likely lead to an imbalance of responsibilities, "too many cooks in the kitchen", or both.

Authors are encouraged to author only a single chapter, to make room for other voices, but exceptions can be made for unstaffed chapters. Reviewers and analysts can contribute to as many chapters as their time allows.

The minimally viable content team contains at least one author, reviewer, and analyst.

Note: The content team may grow or shrink naturally throughout the course of the project. We want to credit everyone who has meaningfully contributed to the project on the Contributors page, so please make sure to keep the team metadata up to date. For example, if someone new gives a lot of helpful feedback on the final draft, they should be credited as a reviewer. If someone volunteers but never actually contributes meaningfully, they should not be credited. Authors, reviewers, and translators are also named in the chapter byline of the final publication.

1. Plan content

Once the minimally viable content team exists, they can start to plan the chapter's content.

The content team needs to define the scope of the chapter and what subtopics will or will not be discussed. This should be done early to help guide the rest of the planning process. It may be helpful to sketch the outline of the chapter (like its table of contents) to imagine how the content would be organized and what areas will be discussed.

After the chapter has been scoped and outlined, the team's analysts can help brainstorm the relevant metrics in the HTTP Archive. It's important to verify the feasibility of metrics now rather than planning the chapter around data that can't be produced.

It's ok for some chapters to be more data-heavy than others. For chapters that don't have many quantitative metrics, the content team should explore ways to supplement the chapter with experiential (rather than empirical) data.

2. Gather data

Data analysts should come out of the content planning process with a list of metrics needed for the chapter. Some metrics are readily available in the HTTP Archive dataset on BigQuery, while others may need a bit of instrumentation, and others may be entirely infeasible. Before the monthly HTTP Archive test begins, analysts should triage all metrics needed by the chapter.

If a metric needs to be instrumented as a custom metric, that must be done prior to the monthly test. Custom metrics are snippets of JavaScript that execute in the global context of each of the millions of test pages. These snippets can collect data from the page at runtime using web APIs, which makes it possible to measure things about the page's content. For example, it's much easier and more accurate to detect whether a page uses native image lazy loading with a custom metric that runs document.querySelectorAll('img[loading="lazy"i]') than it is to parse the HTML in BigQuery using regular expressions.
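To make this concrete, here is a minimal sketch of what such a custom metric might look like. It assumes the usual convention of returning a value from the snippet; the exact return format, file name, and location follow the custom metrics repository's conventions, so check those before instrumenting anything real.

```js
// Hypothetical custom metric sketch: counts images that use native lazy loading.
// It runs in the page's global context during the crawl, so only standard web APIs
// are available and it must not depend on any external libraries.
const lazyImages = document.querySelectorAll('img[loading="lazy"i]').length;
const totalImages = document.images.length;

// Custom metrics report a value back to the test agent; returning a JSON string
// keeps the result straightforward to parse later in BigQuery.
return JSON.stringify({
  'lazy-loaded-images': lazyImages,
  'total-images': totalImages
});
```

The property names here are made up for illustration; in practice the metric would get a descriptive name, as discussed below.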

To keep track of metrics, in previous years we gave each one an ID of the form C.M, where C is the chapter number and M is the sequential metric number. For example, 01.12 is the 12th metric of the 1st chapter. This ID doesn't tell us anything about the kind of data the metric collects, which isn't very helpful without a lookup table. Going forward, metrics should be given descriptive names, for example javascript-bytes-distribution, to make analysis and maintenance easier. This name can be used as the name of the .sql file, the name of a custom metric, and so on.

Before or during the month-long HTTP Archive test, it's a good idea for analysts to start writing the queries for each metric. There are sample tables available in BigQuery that contain a subset of data so you can more quickly and cheaply test the queries. Waiting until the test is done to start writing the queries blocks the validation and writing phases and compresses the amount of time available to the content team.

3. Validate results

After the HTTP Archive test is complete, which is usually the last week of the month, analysts can start running their queries. The CSV results for each query should be exported and collected as tabs in a single Google Sheet for that chapter. If a metric's query is modified at any time, it should be rerun and its corresponding results sheet updated.

After the analysts have gathered the data needed for the chapter, the rest of the content team can review the results. The results are not always self-explanatory, so analysts should do what they can to clarify how to interpret the data. For example, leave a comment on the first row of results using the data in a sentence, like "The median desktop web page includes 456 kilobytes of JavaScript". This can be a lot easier to understand than 456 in a p50 column.

Authors and reviewers should review the results for comprehension and correctness. If the data for a metric is not clear, they should ask for clarification. If the data looks wrong or unintuitive, they should point that out so the analyst can modify and rerun the query if necessary, or explain why the results are correct.

4. Draft content

Once authors have a clear picture of the data for their chapter, they can start writing the content. Authors, reviewers, and editors may find it convenient to collaborate on the draft in a Google Doc.

Chapters may have figures to help visualize the results. Analysts can create charts in Google Sheets, which they or the authors can paste into the Google Doc. All visualizations must have accessible descriptions for visually impaired readers. There is a style guide for data visualizations (color, font, size, etc.), but it's ok for figures to be rough prototypes during drafting. See the Figures Guide for detailed documentation on adding figures to a chapter.

Peer reviewers should iteratively give technical feedback on the draft to ensure that the facts are correct and clearly communicated.

When the draft is ready to be published, it will need to be converted to markdown. Editors and authors can work together on the markdown formatting. Each markdown file also has a YAML section at the top for metadata such as the chapter number, title, description, authors, reviewers, and translators. The content team lead (or their delegate) will create a pull request including the markdown file and any assets.

5. Publication

Editors will review the chapter carefully for everything from minor typos to the wording of entire paragraphs. The editors' goal is to ensure that the writing quality is high and consistent across chapters, so that the Almanac reads as one cohesive unit despite having dozens of authors.

Where a chapter references a topic covered in another chapter, editors will also add an internal link so readers can move between chapters as needed. The URLs for these chapters may not be known during drafting, so authors may omit links or leave them as placeholders.

The best time for editors to make their changes and suggestions is when authors have a rough draft in the Google Doc. At this early stage, editors don't necessarily need to fix every typo or awkward sentence, because things may still change between then and the final draft, but it's a good time to set expectations for writing conventions.

6. Translation

After chapters have been published, contributors may volunteer at any time to translate content into any of our supported languages. Translators should be fluent speakers who are familiar with the technical scope of the chapter.

When a chapter's translation is ready, the translator creates a pull request and assigns it to another fluent speaker for review, if available. Translators should also add their names to the chapter metadata for credit on the published page.

Appendix

Notifications

Because so much of this collaboration happens on GitHub, all contributors must be sure to set their notifications to "Not watching", "Releases only", or "Watching". Those with notifications set to "Ignoring" will not be notified when @mentioned in issues.
