
Fetching file from GitHub

Anthony Fok edited this page May 14, 2021 · 2 revisions

(TODO)

Related to:

fetch_csv_xz:

fetch_csv_lfs:

Could the directory listing also be used for fetch_csv_lfs? Some repos may keep some *.csv files in Git LFS and others directly in the repository.
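As a starting point for that question, a saved directory listing (the JSON array that the GitHub Contents API returns for a directory) can be filtered for CSV files with jq. The function name `list_csv_names` is hypothetical, and this sketch does not try to tell Git LFS pointer files apart from regular files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of *.csv and *.csv.xz files found in a
# saved GitHub Contents API directory listing (a JSON array of entries).
list_csv_names() {
  local listing="$1"   # path to the saved directory listing JSON
  jq -r '.[]
         | select(.type == "file")
         | select(.name | test("\\.csv(\\.xz)?$"))
         | .name' "$listing"
}
```

For the excerpt of github-api/social-vulnerability.dir.json shown further down, `list_csv_names github-api/social-vulnerability.dir.json` would print `sovi_thresholds_2021.csv.xz`.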

References:

{
    "message": "This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size.",
    "errors": [
        {
            "resource": "Blob",
            "field": "data",
            "code": "too_large"
        }
    ],
    "documentation_url": "https://developer.github.com/v3/repos/contents/#get-contents"
}
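The error above is what the Contents API returns when the requested file is larger than 1 MB. A script can check for it before using the response. A minimal sketch, with a hypothetical function name, that succeeds only when the saved response contains a `too_large` error:

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the saved API response (a JSON file) is the
# "too_large" error shown above, non-zero otherwise. Directory listings are
# JSON arrays, so only objects are inspected for an "errors" field.
is_too_large() {
  local response="$1"   # path to the saved API response
  jq -e 'if type == "object"
         then any(.errors[]?; .code == "too_large")
         else false
         end' "$response" > /dev/null
}
```

When it succeeds, the blob can instead be requested through its `git_url`, which the Git Data API serves for blobs up to 100 MB.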

Git Data API, base64-encoded blob

Deprecated. There is a better, more direct way to download the file without dealing with base64 decoding.
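The directory listing used in this approach can be obtained from the Contents API endpoint for the directory itself. A minimal sketch, assuming the endpoint pattern visible in the `url` field of the excerpt below (`/repos/{owner}/{repo}/contents/{path}?ref={branch}`) and a `GITHUB_TOKEN` environment variable; both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Contents API URL for a path on a given ref.
contents_url() {
  local owner="$1" repo="$2" path="$3" ref="$4"
  printf 'https://api.github.com/repos/%s/%s/contents/%s?ref=%s\n' \
    "$owner" "$repo" "$path" "$ref"
}

# Hypothetical helper: save the directory listing (a JSON array) to a file.
# Assumes GITHUB_TOKEN is set; -f makes curl fail on HTTP errors.
fetch_dir_listing() {
  local owner="$1" repo="$2" path="$3" ref="$4" out="$5"
  mkdir -p "$(dirname "$out")"
  curl -sSfL -H "Authorization: token ${GITHUB_TOKEN}" \
    -o "$out" "$(contents_url "$owner" "$repo" "$path" "$ref")"
}
```

For example: `fetch_dir_listing OpenDRR model-inputs-xz social-vulnerability develop github-api/social-vulnerability.dir.json`.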

Suppose the directory listing has been downloaded as github-api/social-vulnerability.dir.json, for example (excerpt):

[
  {
    "name": "sovi_thresholds_2021.csv.xz",
    "path": "social-vulnerability/sovi_thresholds_2021.csv.xz",
    "sha": "1e57fa65a807041a8fdd81793dc82965c7f873a3",
    "size": 1020,
    "url": "https://api.github.com/repos/OpenDRR/model-inputs-xz/contents/social-vulnerability/sovi_thresholds_2021.csv.xz?ref=develop",
    "html_url": "https://github.com/OpenDRR/model-inputs-xz/blob/develop/social-vulnerability/sovi_thresholds_2021.csv.xz",
    "git_url": "https://api.github.com/repos/OpenDRR/model-inputs-xz/git/blobs/1e57fa65a807041a8fdd81793dc82965c7f873a3",
    "download_url": "https://raw.githubusercontent.com/OpenDRR/model-inputs-xz/develop/social-vulnerability/sovi_thresholds_2021.csv.xz?token=AAJXHDGVU75KUM5OMM6EIVTATF4D6",
    "type": "file",
    "_links": {
      "self": "https://api.github.com/repos/OpenDRR/model-inputs-xz/contents/social-vulnerability/sovi_thresholds_2021.csv.xz?ref=develop",
      "git": "https://api.github.com/repos/OpenDRR/model-inputs-xz/git/blobs/1e57fa65a807041a8fdd81793dc82965c7f873a3",
      "html": "https://github.com/OpenDRR/model-inputs-xz/blob/develop/social-vulnerability/sovi_thresholds_2021.csv.xz"
    }
  }
]
$ jq -r '.[] | select(.name == "sovi_thresholds_2021.csv.xz") | ._links.git' github-api/social-vulnerability.dir.json
https://api.github.com/repos/OpenDRR/model-inputs-xz/git/blobs/1e57fa65a807041a8fdd81793dc82965c7f873a3

or

$ jq -r '.[] | select(.name == "sovi_thresholds_2021.csv.xz") | .git_url' github-api/social-vulnerability.dir.json
https://api.github.com/repos/OpenDRR/model-inputs-xz/git/blobs/1e57fa65a807041a8fdd81793dc82965c7f873a3
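Both queries hard-code the file name. Passing the name in with jq's `--arg` option avoids editing the filter for each file; a sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Git blob URL (git_url) of the entry named $1
# in the saved directory listing $2. Prints nothing if no entry matches.
blob_url_for() {
  local name="$1" listing="$2"
  jq -r --arg name "$name" '.[] | select(.name == $name) | .git_url' "$listing"
}
```

`blob_url_for sovi_thresholds_2021.csv.xz github-api/social-vulnerability.dir.json` prints the same URL as the two commands above.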
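
The blob JSON can then be fetched and its base64-encoded content decoded:

  # Excerpt from a shell function; $response is the path to the saved
  # directory listing (e.g. github-api/social-vulnerability.dir.json).
  local blob
  blob=$(jq -r '.[] | select(.name == "sovi_thresholds_2021.csv.xz") | .git_url' "$response")

  # Fetch the blob JSON, extract its base64 "content" field and decode it.
  curl -sSfL -H "Authorization: token ${GITHUB_TOKEN}" "$blob" | \
    jq -r '.content' | base64 -d > sovi_thresholds_2021.csv.xz

  ls -l sovi_thresholds_2021.csv.xz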
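The `sha` field in the directory listing is the Git blob ID of the file: the SHA-1 of the header `blob <size>`, a NUL byte, and the file contents. That makes it possible to check the decoded download against the listing. A minimal sketch with a hypothetical function name, assuming `sha1sum` (GNU coreutils) is available; where Git is installed, `git hash-object <file>` computes the same value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Git blob SHA-1 of a file, computed as
# sha1("blob <size>\0" + contents), the same ID `git hash-object` reports.
git_blob_sha() {
  local file="$1" size
  size=$(wc -c < "$file" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
}
```

The result of `git_blob_sha sovi_thresholds_2021.csv.xz` can be compared with `jq -r '.[] | select(.name == "sovi_thresholds_2021.csv.xz") | .sha' github-api/social-vulnerability.dir.json`.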