Capture (and possibly automate) traffic CSVs from readthedocs #14

Open
ericsnekbytes opened this issue Oct 18, 2023 · 17 comments

@ericsnekbytes
Collaborator

ericsnekbytes commented Oct 18, 2023

ReadTheDocs offers traffic and search stats that Jupyter subprojects can use to direct their docs improvement efforts. Right now, these metrics are not widely used (as indicated by discussions in group meetings) and are not easily accessible (they're locked behind an admin panel). They can be made easily available and usable from a central location so that subprojects can better benefit from the insights they contain.

  • Proposal
    • Store CSVs for multiple subprojects in one place
    • Automate some basics
      • Display basic stats summary info (including some simple plots/queries)
      • Merge multiple CSVs covering different time spans
      • Make the data easily accessible for download and use by others
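
A minimal sketch of the "merge multiple CSVs" and "basic stats summary" items above, using only the Python standard library. The column names (`Date`, `Version`, `Path`, `Views`) and the sample rows are assumptions for illustration, not taken from a real export, so check them against an actual Read the Docs CSV first.

```python
import csv
import io
from collections import Counter

# Assumed column names; verify against a real export before relying on this.
KEY_COLUMNS = ("Date", "Version", "Path")
COUNT_COLUMN = "Views"


def merge_traffic_csvs(csv_texts):
    """Merge several CSV exports, keeping one row per (date, version, path)."""
    merged = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple(row[col] for col in KEY_COLUMNS)
            # A later export overwrites an earlier one for the same key, so
            # overlapping time spans are not double counted.
            merged[key] = int(row[COUNT_COLUMN])
    return merged


def top_paths(merged, n=5):
    """Return the n most viewed paths, summed over dates and versions."""
    totals = Counter()
    for (_date, _version, path), views in merged.items():
        totals[path] += views
    return totals.most_common(n)


# Tiny made-up exports that overlap on one row (2024-01-02, /index.html).
older = (
    "Date,Version,Path,Views\n"
    "2024-01-01,latest,/index.html,10\n"
    "2024-01-02,latest,/index.html,7\n"
)
newer = (
    "Date,Version,Path,Views\n"
    "2024-01-02,latest,/index.html,7\n"
    "2024-01-03,latest,/install.html,4\n"
)

merged = merge_traffic_csvs([older, newer])
summary = top_paths(merged)
print(summary)
```

Keying on (date, version, path) is one possible design choice: it lets exports with overlapping date ranges be combined without inflating the view counts.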
@ericsnekbytes
Collaborator Author

@jtpio mentioned Chris Holdgraf's repo metrics notebooks; we can look at those for inspiration.

@krassowski
Member

This is the kind of data that is available from Read the Docs, using JupyterLab as an example:

[Images: "Summary" | "The CSV head"]

@krassowski
Member

Previously I brought up adding a footer like on GitHub:

image

@ericsnekbytes
Collaborator Author

This is in progress here:

image

@blink1073
Member

readthedocs_traffic_analytics_jupyter-server_2023-12-27_2024-03-26.csv

@blink1073
Member

readthedocs_traffic_analytics_jupyter-enterprise-gateway_2023-12-27_2024-03-26.csv
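
The two attachment names above appear to follow the pattern `readthedocs_traffic_analytics_<project>_<start>_<end>.csv`. Assuming that pattern holds for other exports (it is inferred from these two names only), a small sketch for recovering the project and date span when storing many CSVs in one place could look like this; `parse_export_name` is a hypothetical helper name.

```python
import re
from datetime import date

# Pattern inferred from the two attachment names in this thread; other
# exports may differ.
EXPORT_NAME = re.compile(
    r"^readthedocs_traffic_analytics_(?P<project>.+)"
    r"_(?P<start>\d{4}-\d{2}-\d{2})_(?P<end>\d{4}-\d{2}-\d{2})\.csv$"
)


def parse_export_name(filename):
    """Split an export file name into (project, start date, end date)."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized export name: {filename}")
    return (
        match["project"],
        date.fromisoformat(match["start"]),
        date.fromisoformat(match["end"]),
    )


info = parse_export_name(
    "readthedocs_traffic_analytics_jupyter-enterprise-gateway_2023-12-27_2024-03-26.csv"
)
print(info)
```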

@ericsnekbytes
Collaborator Author

@blink1073 Thank you, it's much appreciated!

@choldgraf

choldgraf commented Mar 26, 2024

Two quick thoughts:

Inspiration via jupyter book

If you want some inspiration, I often use Jupyter Book for this kind of thing. For example, here's a dashboard I've used in the past for tracking activity within the Jupyter ecosystem (it's now out of date so there's an error message but you get the idea):

https://chrisholdgraf.com/jupyter-activity-snapshot/jupyter.html#merged-pull-requests
source: https://github.com/choldgraf/jupyter-activity-snapshot

That uses papermill to run a GitHub organization stats template notebook once for each organization. The executed notebooks become the source files of the pages that then go into a Jupyter Book build process.

Plausible?

Historically, we've used Google Analytics to track user behavior across our websites, including docs. This was very useful for things like generating impact reports for grants. We moved away from Google Analytics for privacy reasons, but some folks mentioned that https://plausible.io/ was an attractive alternative that wouldn't have the same concerns. [1]

Would it be less work if Jupyter self-hosted a plausible instance that generated dashboards for all of the sub-project docs sites? Apologies if this has already been discussed and decided on, just wanted to throw it out there in case it creates an "ah-ha that would be way easier" response.

Footnotes

  1. Another option is Matomo, no strong opinions from me.

@ericsnekbytes
Collaborator Author

@blink1073 AWESOME, thank you!

@ericsnekbytes
Collaborator Author

@choldgraf These look like just the thing (I was pondering something similar here), thanks for linking. I may contact you further down the line.

@blink1073 Also, I just noticed (and hate to bother you further) that there should be a second SEARCH CSV for each of those sites, if you are able to provide those 😅

@ericsnekbytes
Collaborator Author

@blink1073 Geez, this is fantastic, thanks for single-handedly knocking this problem out of the park :D

@choldgraf

choldgraf commented Mar 26, 2024

In case it's helpful @ericsnekbytes:

Here are the templates that are used to generate org-specific pages: https://github.com/choldgraf/jupyter-activity-snapshot/tree/main/monthly_update/templates

Specifically here's the one that generates the org reports I mentioned before: https://github.com/choldgraf/jupyter-activity-snapshot/blob/main/monthly_update/templates/org_report.ipynb

You can see where the templates have variables to be inserted later within {{ }}, for example:

[Screenshot of the template]

You can then generate pages using that template with code like this:
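
from pathlib import Path

# The `pm` and `nbf` aliases are assumed to refer to papermill and nbformat.
import nbformat as nbf
import papermill as pm

# `github_orgs` and `n_days` are assumed to be defined earlier in the notebook.
path_book = Path("generated/book")
for org in github_orgs:
    # Execute the template once per org, injecting that org's parameters
    parameters = dict(github_org=org, n_days=n_days)
    path_out = path_book.joinpath(f"{org}.ipynb")
    ntbk = pm.execute_notebook(
        "./templates/org_report.ipynb",
        str(path_out),
        parameters=parameters,
        nest_asyncio=True,
        cwd="./templates/",
    )

    # Remove the param cell so it doesn't show up
    # (.get avoids an error on cells that have no tags at all)
    (param_cell,) = [
        cell
        for cell in ntbk.cells
        if "injected-parameters" in cell.metadata.get("tags", [])
    ]
    param_cell.metadata.tags.append("remove-cell")

    # Fill in the remaining {{ github_org }} placeholders and save the page
    nbs = nbf.writes(ntbk)
    nbs = nbs.replace("{{ github_org }}", org)
    path_out.write_text(nbs)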

And then these two GitHub Actions steps are used in the CI/CD to build the pages from a template, and then build the book:

    - name: Generate book pages with latest data
      run: |
        papermill --cwd monthly_update monthly_update/run_template.ipynb -
      env:
        GITHUB_ACCESS_TOKEN: ${{ secrets.ACCESS_TOKEN }}

    # Build the book
    - name: Build the book
      run: |
        jb toc from-project monthly_update/generated/book -e .ipynb -e .md -e .rst --guess-titles > monthly_update/generated/book/_toc.yml
        jb build monthly_update/generated/book

I think that's the core of the logic there. A lot of the code there is very stale which is why I'm trying to point out the details here. If you really wanna get fancy you could also try the new MyST build engine at https://mystmd.org :-)

@minrk
Member

minrk commented Mar 27, 2024

I grabbed the stats for the docs I have access to here: https://gist.github.com/minrk/c1df933c520f9a51ee2bf474817a20bb

including the notebook I used to get them. It seems the traffic data isn't in the API, so I needed to script it with playwright.

@ericsnekbytes
Collaborator Author

@choldgraf Thanks for the additional details.

@minrk I'll be digging through these and may ping you again for some additional info, thanks for providing these 👍
