Trends 2024 #3618

Open
10 tasks
nrllh opened this issue Mar 2, 2024 · 3 comments
Labels
  • 2024 chapter: Tracking issue for a 2024 chapter
  • help wanted: analysts: This chapter is looking for data analysts
  • help wanted: coauthors: This chapter is looking for coauthors
  • help wanted: reviewers: This chapter is looking for reviewers

Comments

@nrllh
Collaborator

nrllh commented Mar 2, 2024

Trends 2024

If you're interested in contributing to the Trends chapter of the 2024 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor. You might be interested in exploring the changes to this year's version here.

This is our first chapter on non-technical trends in HTTP Archive data, and we believe there is huge potential for content analysis using this data. To explore this area, we are introducing a non-technical section. Our goal is to uncover insights into the social, economic, political, and technical trends reflected in the webpage content we crawl and evaluate. Language models (e.g., BERT) offer promising opportunities for such analysis.

Content team

Lead | Authors | Reviewers | Analysts | Editors | Coordinator
- | - | - | - | - | -
Expand for more information about each role 👀
  • The content team lead is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
  • Authors are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
  • Reviewers are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
  • Analysts are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
  • Editors are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
  • The section coordinator is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors.

For an overview of how the roles work together at each phase of the project, see the Chapter Lifecycle doc.

Milestone checklist

0. Form the content team

  • 📆 April 15 Complete program and content committee - 🔑 Organizing committee
    • The content team has at least one author, reviewer, and analyst.

1. Plan content

  • 📆 May 1 First meeting to outline the chapter contents - 🔑 Content team
    • The content team has completed the chapter outline.

2. Gather data

  • 📆 June 1 Custom metrics completed - 🔑 Analysts
  • 📆 June 1 HTTP Archive Crawl - 🔑 HA Team
    • HTTP Archive runs the June crawl.

3. Validate results

  • 📆 August 15 Query Metrics & Save Results - 🔑 Analysts
    • Analysts have queried all metrics and saved the output.

4. Draft content

  • 📆 September 15 First Draft of Chapter - 🔑 Authors
    • Authors have written the chapter.
  • 📆 October 10 Review & Edit Chapter - 🔑 Reviewers & Editors
    • Reviewers and Editors have processed the chapter.

5. Publication

  • 📆 October 15 Chapter Publication (Markdown & PR) - 🔑 Authors
    • Authors have converted the chapter to markdown and drafted a PR.
  • 📆 November 1 Launch of 2024 Web Almanac 🚀 - 🔑 Organizing committee

6. Virtual conference

  • 📆 November 20 Virtual Conference - 🔑 Content Team

Chapter resources

Refer to these 2024 Trends resources throughout the content creation process:
📄 Google Docs for outlining and drafting content
🔍 SQL files for committing the queries used during analysis
📊 Google Sheets for saving the results of queries
📝 Markdown file for publishing content and managing public metadata
💻 Colab notebook for collaborative coding in Python - if needed
💬 #web-almanac-trends on Slack for team coordination

@nrllh nrllh added help wanted: reviewers This chapter is looking for reviewers help wanted: analysts This chapter is looking for data analysts help wanted: coauthors This chapter is looking for coauthors 2024 chapter Tracking issue for a 2024 chapter labels Mar 2, 2024
@neriiavr

neriiavr commented Mar 3, 2024

I'm interested in being an editor

@thibaudcolas
Member

@nrllh this feels like an interesting but tough chapter, could you provide more details? ("further updates will follow soon!"). It’d help as I consider which chapter(s) to get involved with this year.

One thing I’d be interested in personally is the proportion of AI-generated content on the web over time. In a general sense, or for something like alternative text for images, or e.g. video captions. No idea how to go about this in terms of data analysis though.

@ianand

ianand commented Apr 30, 2024

> One thing I’d be interested in personally is the proportion of AI-generated content on the web over time. In a general sense, or for something like alternative text for images, or e.g. video captions. No idea how to go about this in terms of data analysis though.

This is hard to detect. One possibility, though, is to analyze Terms of Use / Privacy Policy pages to figure out what percentage of them disclose the use of AI by processors or subprocessors, which may be required in certain jurisdictions. That won't address the generation of content, though, and likely only covers how submitted information is used. (As an aside, I expect that in the future AI will be used in rendering content as well as in generating the semantic content, so the definition of "AI generated" could get murky.)
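
A minimal sketch of the policy-page idea above, assuming the text of each Terms of Use / Privacy Policy page has already been extracted into plain strings. The phrase list, the function names `discloses_ai` and `disclosure_rate`, and the sample texts are all hypothetical and only for illustration; a real analysis would need a vetted phrase list and would still miss disclosures worded differently.

```python
import re

# Hypothetical phrase list (an assumption, not a vetted taxonomy).
# A bare "AI" token is deliberately left out to limit false positives.
AI_PHRASES = re.compile(
    r"\b(artificial intelligence|machine learning|"
    r"large language model|generative AI)\b",
    re.IGNORECASE,
)


def discloses_ai(policy_text: str) -> bool:
    """Return True if the text mentions any of the AI-related phrases."""
    return AI_PHRASES.search(policy_text) is not None


def disclosure_rate(policies: list[str]) -> float:
    """Percentage of policy texts that mention an AI-related phrase."""
    if not policies:
        return 0.0
    hits = sum(discloses_ai(text) for text in policies)
    return 100.0 * hits / len(policies)


# Made-up sample texts, not real crawl data.
sample_policies = [
    "Our subprocessors may use machine learning to detect fraud.",
    "We never sell your personal information.",
]
print(f"{disclosure_rate(sample_policies):.1f}% of sampled policies mention AI")
```

Keyword matching like this only measures mentions, not actual disclosure obligations, so the result would be at best a rough lower-bound signal.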

4 participants