API for retrieving metrics on software catalog #174

Open
biancini opened this issue Aug 22, 2019 · 5 comments

Comments

@biancini
Member

To support the ongoing work on metrics for Developers Italia, it would be great to have a JSON API that exposes the following data:

  • Number of software solutions published in the catalog in the section for PAs
    Numeric indicator with the number of software projects released in the catalog in section A.
  • Number of software solutions published in the catalog in the section for third-parties
    Numeric indicator with the number of software projects released in the catalog in section B.
  • Number of unique administrations
    Numeric indicator counting the unique administrations that have at least one software project published in the catalog.
  • Mean vitality index
    Numeric indicator of the average vitality index across all projects in the catalog (both section A and section B).

If the crawler has the data, it would be great if this JSON API also showed how these numbers evolve over time (since the beginning of Developers Italia).
The output could be of this form:

{
  "2017-07-21T00:00:00Z": {
    "num_software_pa": 30,
    "num_software_thirdparty": 4,
    "num_administrations": 5,
    "mean_vitality": 0.67
  },
  ...
  "2018-08-22T00:00:00Z": {
    "num_software_pa": 30,
    "num_software_thirdparty": 4,
    "num_administrations": 5,
    "mean_vitality": 0.67
  }
}
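
To make the requested indicators concrete, here is a minimal sketch of how the four numbers could be computed per snapshot date. The record layout (`published`, `section`, `administration`, `vitality`) and the sample data are assumptions made up for illustration, not the crawler's actual data model.

```python
from datetime import date
from statistics import mean

# Hypothetical catalog records: field names and values are illustrative only.
CATALOG = [
    {"published": date(2017, 6, 1), "section": "A", "administration": "c_h501", "vitality": 0.8},
    {"published": date(2017, 7, 1), "section": "A", "administration": "c_h501", "vitality": 0.6},
    {"published": date(2017, 7, 10), "section": "B", "administration": None, "vitality": 0.4},
    {"published": date(2018, 1, 15), "section": "A", "administration": "c_f205", "vitality": 0.2},
]


def snapshot(records, day):
    """Compute the four indicators over records published on or before `day`."""
    visible = [r for r in records if r["published"] <= day]
    administrations = {r["administration"] for r in visible if r["administration"]}
    return {
        "num_software_pa": sum(1 for r in visible if r["section"] == "A"),
        "num_software_thirdparty": sum(1 for r in visible if r["section"] == "B"),
        "num_administrations": len(administrations),
        # Average over both sections; None when nothing is published yet.
        "mean_vitality": round(mean(r["vitality"] for r in visible), 2) if visible else None,
    }


def metrics_over_time(records, days):
    """Return one snapshot per day, keyed by a midnight UTC timestamp."""
    return {f"{d.isoformat()}T00:00:00Z": snapshot(records, d) for d in days}


history = metrics_over_time(CATALOG, [date(2017, 7, 21), date(2018, 8, 22)])
```

Each snapshot is cumulative: it counts everything published up to that date, which matches the "evolution over time" reading of the request.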
@biancini biancini added the enhancement New feature or request label Aug 22, 2019
@sebbalex sebbalex self-assigned this Oct 2, 2019
@libremente
Member

Let's have the crawler query Elasticsearch (ES) and write the results to a JSON file. The file would be made public in a directory served by nginx. See italia/developers.italia.it#406 for an example of such a public directory.
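
As a rough sketch of the "write a JSON file into a public directory" step only (the Elasticsearch query is left out): the helper name `write_metrics`, the default file name `metrics.json`, and the directory layout are all assumptions. The file is written to a temporary name and then renamed, so a web server never serves a half-written file.

```python
import json
import os
import tempfile


def write_metrics(metrics, public_dir, filename="metrics.json"):
    """Atomically write `metrics` as JSON into `public_dir` and return the path."""
    os.makedirs(public_dir, exist_ok=True)
    target = os.path.join(public_dir, filename)

    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=public_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(metrics, fh, indent=2, sort_keys=True)
        # mkstemp creates the file as 0600; make it readable by the web server.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)
    except Exception:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

A crawler run could call `write_metrics(results, "/path/to/public/dir")` at the end of each crawl; the path here is a placeholder.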

@sebbalex
Member

@biancini we could close this issue since the point was achieved by italia/developers.italia.it#406
What do you think?

@libremente
Member

@sebbalex I believe the solution in use right now is not an API, so I would leave this open for future improvements. I still think it would be nice to have a proper API for this.

@bfabio bfabio transferred this issue from italia/publiccode-crawler Oct 19, 2022
@bfabio
Member

bfabio commented Oct 19, 2022

Moving the issue to developers-italia-api

@bfabio
Member

bfabio commented Mar 24, 2024

This should be doable now with something like:

import json
from collections import defaultdict

import requests
import yaml

API_BASE_URL = "https://api.developers.italia.it/v1"


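def get_paginated(resource: str):
    """Fetch every item of a paginated resource by following `links.next`."""
    items = []

    page = True
    page_after = ""

    while page:
        res = requests.get(
            f"{API_BASE_URL}/{resource}?all=true&{page_after}", timeout=30
        )
        res.raise_for_status()

        body = res.json()
        items += body["data"]

        # `links.next` holds the query string for the next page, if any
        page_after = body["links"]["next"]
        if page_after:
            # Drop the leading '?' so it can be appended after '&'
            page_after = page_after[1:]

        page = bool(page_after)

    return items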


software = get_paginated("software")
publishers = get_paginated("publishers")

by_date = defaultdict(
    lambda: {
        "num_software_pa": 0,
        "num_software_thirdparty": 0,
        "num_administrations": 0,
    }
)

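for s in software:
    # Bucket by creation day (the YYYY-MM-DD prefix of the timestamp)
    date = s["createdAt"][:10]
    try:
        publiccode = yaml.safe_load(s["publiccodeYml"])
        # Software declaring it.riuso.codiceIPA is counted as published by a PA
        if publiccode.get("it", {}).get("riuso", {}).get("codiceIPA"):
            by_date[date]["num_software_pa"] += 1
        else:
            by_date[date]["num_software_thirdparty"] += 1
    except (yaml.YAMLError, AttributeError, KeyError, TypeError):
        # Skip entries whose publiccode.yml is missing, invalid, or not a mapping
        pass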

administrations = set()
for publisher in publishers:
    if publisher.get("alternativeId"):
        administrations.add(publisher["id"])
        date = publisher["createdAt"][:10]
        by_date[date]["num_administrations"] = len(administrations)

print(json.dumps([{date: counts} for date, counts in by_date.items()], indent=4))

but I wouldn't turn this into an API endpoint, as the data is easily available without hardcoding the metrics.
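
One caveat for anyone reusing the script: the software counts it produces are per-day increments, while the original request asked for totals over time. A minimal sketch of turning per-day counts into running totals follows; the helper name `running_totals` is made up, and `num_administrations` is left out by default because the script above appears to store it as a running count already.

```python
def running_totals(by_date, keys=("num_software_pa", "num_software_thirdparty")):
    """Turn per-day increments into cumulative totals, ordered by date.

    `by_date` maps 'YYYY-MM-DD' strings to dicts of per-day counts.
    ISO dates sort chronologically when sorted as plain strings.
    """
    totals = dict.fromkeys(keys, 0)
    series = {}
    for day in sorted(by_date):
        for key in keys:
            totals[key] += by_date[day].get(key, 0)
        # Copy the totals so later days do not mutate earlier snapshots.
        series[day] = dict(totals)
    return series


# Made-up sample input, deliberately out of order.
sample = {
    "2024-01-02": {"num_software_pa": 1, "num_software_thirdparty": 2},
    "2024-01-01": {"num_software_pa": 3, "num_software_thirdparty": 0},
}
totals_by_day = running_totals(sample)
```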

Projects
Status: 📋 Backlog

5 participants