Capture (and possibly automate) traffic CSVs from readthedocs #14

ericsnekbytes · 2023-10-18T15:41:35Z

ReadTheDocs offers traffic and search stats that Jupyter subprojects can use to direct their docs improvement efforts. Right now, these metrics are not widely used (as indicated by discussions in group meetings) and are not easily accessible (they're locked behind an admin panel). They can be made easily available and usable from a central location so that subprojects can better benefit from the insights they contain.

Proposal
- Store CSVs for multiple subprojects in one place
- Automate some basics
  - Display basic stats summary info (including some simple plots/queries)
  - Merge multiple CSVs covering different time spans
  - Make the data easily accessible for download and use by others
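Below is a minimal sketch of the "merge multiple CSVs" step, using only the Python standard library. It assumes that each export has a header row with `Date`, `Version`, `Path` and `Views` columns. That layout is not confirmed here, so check it against a real Read the Docs export. `merge_traffic_csvs`, `KEY_COLUMNS` and `COUNT_COLUMN` are hypothetical names, and the sample rows are made up. When two exports overlap, the row from the later export wins.

```python
import csv
import io

# ASSUMPTION: one row per (date, version, path) with a numeric "Views" column.
# The real Read the Docs column names may differ; adjust these constants.
KEY_COLUMNS = ("Date", "Version", "Path")
COUNT_COLUMN = "Views"


def merge_traffic_csvs(csv_texts):
    """Merge CSV exports covering different (possibly overlapping) time spans.

    Rows are keyed on KEY_COLUMNS. When several exports contain the same key,
    the row from the export that appears later in ``csv_texts`` wins.
    """
    merged = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple(row[col] for col in KEY_COLUMNS)
            merged[key] = {**row, COUNT_COLUMN: int(row[COUNT_COLUMN])}
    # ISO dates sort correctly as strings, so this orders rows chronologically.
    return sorted(merged.values(), key=lambda r: (r["Date"], r["Path"]))


if __name__ == "__main__":
    # Made-up sample data: the two exports overlap on 2024-01-02.
    older = "Date,Version,Path,Views\n2024-01-01,latest,/index.html,10\n2024-01-02,latest,/index.html,7\n"
    newer = "Date,Version,Path,Views\n2024-01-02,latest,/index.html,8\n2024-01-03,latest,/install.html,3\n"
    for row in merge_traffic_csvs([older, newer]):
        print(row)
```

Keying on everything except the count makes overlapping date ranges safe to combine. The "later export wins" rule only helps if exports are passed oldest first.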

ericsnekbytes · 2023-10-18T15:44:52Z

@jtpio mentioned Chris Holdgraf's repo metrics notebooks; we can look at those for inspiration.

krassowski · 2023-10-18T15:51:59Z

This is the kind of data that is available from Read the Docs, using JupyterLab as an example:

[Screenshots: the Summary view and the head of the CSV]
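The "basic stats summary" idea from the proposal could start as small as the sketch below, which totals views and lists the most-viewed pages. It assumes the CSV has `Path` and a numeric `Views` column. That layout is an assumption for illustration, not the confirmed Read the Docs header. `summarize_traffic` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


# ASSUMPTION: the export has a "Path" column and a numeric "Views" column;
# adjust the column names to match the real CSV header.
def summarize_traffic(csv_text, top_n=5):
    """Return (total views, the top_n most-viewed paths with their counts)."""
    views_by_path = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Sum across dates and versions so each path gets one total.
        views_by_path[row["Path"]] += int(row["Views"])
    return sum(views_by_path.values()), views_by_path.most_common(top_n)


if __name__ == "__main__":
    sample = (
        "Date,Version,Path,Views\n"
        "2024-01-01,latest,/a,10\n"
        "2024-01-02,latest,/b,4\n"
        "2024-01-02,stable,/a,5\n"
    )
    total, top = summarize_traffic(sample, top_n=2)
    print(total, top)  # 19 [('/a', 15), ('/b', 4)]
```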

krassowski · 2023-10-18T15:52:56Z

Previously, I brought up adding a footer like the one on GitHub:

ericsnekbytes · 2024-03-26T19:27:42Z

This is in progress here:

blink1073 · 2024-03-26T20:06:39Z

readthedocs_traffic_analytics_jupyter-server_2023-12-27_2024-03-26.csv

blink1073 · 2024-03-26T20:07:22Z

readthedocs_traffic_analytics_jupyter-enterprise-gateway_2023-12-27_2024-03-26.csv

ericsnekbytes · 2024-03-26T20:07:55Z

@blink1073 Thank you, it's much appreciated!

blink1073 · 2024-03-26T20:11:29Z

readthedocs_traffic_analytics_jupyterlab-server_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_jupyterlab_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_jupyter-client_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_jupyter_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_ipywidgets_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_ipykernel_2023-12-27_2024-03-26.csv

blink1073 · 2024-03-26T20:13:20Z

readthedocs_traffic_analytics_traitlets_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_terminado_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_nbformat_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_nbconvert_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_lumino_2023-12-27_2024-03-26.csv
readthedocs_traffic_analytics_jupyter-notebook_2023-12-27_2024-03-26.csv

Okay that's all I have access to. 😄
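The attached exports share a naming pattern: `readthedocs_traffic_analytics_<project>_<start>_<end>.csv`. That makes it straightforward to store them in one place and group them per project before merging. Below is a minimal sketch of parsing those names. The `search` variant of the pattern is an assumption about how search exports are named, not something confirmed in this thread. `parse_export_name` and `group_by_project` are hypothetical helpers.

```python
import re
from collections import defaultdict
from datetime import date

# Pattern inferred from the attached file names, e.g.
# readthedocs_traffic_analytics_jupyter-server_2023-12-27_2024-03-26.csv
# ASSUMPTION: search exports use "search" in place of "traffic".
EXPORT_NAME = re.compile(
    r"^readthedocs_(?P<kind>traffic|search)_analytics_"
    r"(?P<project>.+)_(?P<start>\d{4}-\d{2}-\d{2})_(?P<end>\d{4}-\d{2}-\d{2})\.csv$"
)


def parse_export_name(filename):
    """Split an export file name into (kind, project, start date, end date)."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized export name: {filename}")
    return (
        match["kind"],
        match["project"],
        date.fromisoformat(match["start"]),
        date.fromisoformat(match["end"]),
    )


def group_by_project(filenames):
    """Group export file names by project, each group sorted by date range."""
    groups = defaultdict(list)
    for name in filenames:
        kind, project, start, end = parse_export_name(name)
        groups[project].append((start, end, kind, name))
    return {project: sorted(files) for project, files in groups.items()}
```

Grouping by project first means each project's exports can then be merged independently, regardless of how many time spans were downloaded.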

choldgraf · 2024-03-26T20:14:08Z

Two quick thoughts:

Inspiration via jupyter book

If you want some inspiration, I often use Jupyter Book for this kind of thing. For example, here's a dashboard I've used in the past for tracking activity within the Jupyter ecosystem (it's now out of date, so it shows an error message, but you get the idea):

https://chrisholdgraf.com/jupyter-activity-snapshot/jupyter.html#merged-pull-requests
source: https://github.com/choldgraf/jupyter-activity-snapshot

It uses papermill to execute a GitHub organization stats template once per organization, which generates the source file for each organization's page. Those files are then fed into a Jupyter Book build.

Plausible?

Historically, we've used Google Analytics to track user behavior across our websites, including docs. This was very useful for things like generating impact reports for grants. We moved away from Google Analytics for privacy reasons, but some folks mentioned that https://plausible.io/ was an attractive alternative that wouldn't have the same concerns.¹

Would it be less work if Jupyter self-hosted a plausible instance that generated dashboards for all of the sub-project docs sites? Apologies if this has already been discussed and decided on, just wanted to throw it out there in case it creates an "ah-ha that would be way easier" response.

¹ Another option is Matomo; no strong opinions from me.

ericsnekbytes · 2024-03-26T20:14:56Z

@blink1073 AWESOME, thank you!

ericsnekbytes · 2024-03-26T20:20:29Z

@choldgraf These look like just the thing (I was pondering something similar here), thanks for linking. I may contact you further down the line.

@blink1073 I also just noticed, and I hate to bother you further, but there should be a second SEARCH CSV for each of those sites too, if you're able to provide those 😅

blink1073 · 2024-03-26T20:24:52Z

ericsnekbytes · 2024-03-26T20:26:20Z

@blink1073 Geez, this is fantastic. Thanks for single-handedly knocking this problem out of the park :D

choldgraf · 2024-03-26T20:32:31Z

In case it's helpful @ericsnekbytes:

Here are the templates that are used to generate org-specific pages: https://github.com/choldgraf/jupyter-activity-snapshot/tree/main/monthly_update/templates

Specifically here's the one that generates the org reports I mentioned before: https://github.com/choldgraf/jupyter-activity-snapshot/blob/main/monthly_update/templates/org_report.ipynb

You can see where the templates have variables to be inserted later within {{ }}, for example:

You can then generate pages using that template with code like this:
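```python
from pathlib import Path

import nbformat as nbf
import papermill as pm

# github_orgs and n_days are defined elsewhere in the original notebook;
# these are hypothetical example values for illustration only.
github_orgs = ["jupyter", "jupyterlab"]
n_days = 30

path_book = Path("generated/book")
path_book.mkdir(parents=True, exist_ok=True)
for org in github_orgs:
    # Execute the template once per organization with injected parameters
    parameters = dict(github_org=org, n_days=n_days)
    path_out = path_book.joinpath(f"{org}.ipynb")
    ntbk = pm.execute_notebook(
        "./templates/org_report.ipynb",
        str(path_out),
        parameters=parameters,
        nest_asyncio=True,
        cwd="./templates/",
    )

    # Remove the param cell so it doesn't show up. Not every cell has tags,
    # so fall back to an empty list instead of raising AttributeError.
    (param_cell,) = [
        cell
        for cell in ntbk.cells
        if "injected-parameters" in cell.metadata.get("tags", [])
    ]
    param_cell.metadata.tags.append("remove-cell")
    nbs = nbf.writes(ntbk)
    # Fill the {{ github_org }} placeholder left in the template's text
    nbs = nbs.replace("{{ github_org }}", org)
    path_out.write_text(nbs)
```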
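The loop above fills a single `{{ github_org }}` placeholder with `str.replace`. Below is a minimal sketch that generalizes this to any `{{ name }}` placeholder using a regular expression, assuming names are plain identifiers. `fill_placeholders` and `PLACEHOLDER` are hypothetical names, not part of the linked repository. As with the original `replace` call, it operates on the serialized notebook text. Substituted values containing quotes or backslashes would therefore need JSON escaping.

```python
import re

# Matches "{{ name }}" with optional inner spaces; names must be identifiers.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(text, values):
    """Replace every {{ name }} in text with str(values[name]).

    Placeholders whose name is not in ``values`` are left untouched, so the
    template can still contain unrelated {{ ... }} snippets.
    """

    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    template = "# Activity for {{ github_org }} over the last {{n_days}} days"
    print(fill_placeholders(template, {"github_org": "jupyterlab", "n_days": 30}))
```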

These two GitHub Actions steps are then used in CI/CD to generate the pages from the template and build the book:

```yaml
- name: Generate book pages with latest data
  run: |
    papermill --cwd monthly_update monthly_update/run_template.ipynb -
  env:
    GITHUB_ACCESS_TOKEN: ${{ secrets.ACCESS_TOKEN }}

# Build the book
- name: Build the book
  run: |
    jb toc from-project monthly_update/generated/book -e .ipynb -e .md -e .rst --guess-titles > monthly_update/generated/book/_toc.yml
    jb build monthly_update/generated/book
```

I think that's the core of the logic. A lot of the code there is very stale, which is why I'm trying to point out the details here. If you really wanna get fancy, you could also try the new MyST build engine at https://mystmd.org :-)

minrk · 2024-03-27T08:54:35Z

I grabbed the stats for the docs I have access to, along with the notebook I used to get them, here: https://gist.github.com/minrk/c1df933c520f9a51ee2bf474817a20bb

It seems the traffic data isn't available through the API, so I had to script the download with Playwright.

ericsnekbytes · 2024-03-28T12:52:05Z

@choldgraf Thanks for the additional details.

@minrk I'll be digging through these and may ping you again for some additional info, thanks for providing these 👍

jtpio mentioned this issue on Nov 22, 2023 in Notebook Weekly Meeting 2023 (jupyter/notebook-team-compass#21, closed)

ericsnekbytes self-assigned this Jan 17, 2024
