Store authoritative copy of packages on Azure Blob Storage instead of self-hosted web server for faster uploads to CDN #5

Open
4 of 14 tasks
Aldaviva opened this issue May 1, 2024 · 3 comments


Aldaviva commented May 1, 2024

Currently, the network architecture of this repository is Raspberry Pi → Azure CDN → self-hosted origin web server thanks to #2.

  • Pros: Free to operate, simple to program
  • Cons: Initial upload from origin to CDN is sort of fast but could be faster, and traffic is low enough that most requests are cache misses

A potential improvement is to move the authoritative source of the repo files from the self-hosted web server to Azure Blob Storage (like S3), which has a very fast connection to Azure CDN. Estimated additional hosting costs are about $0.16/month USD. Raspberry Pi → Azure CDN → Azure Blob Storage ← repo generator.

  • Pros: Faster responses during CDN cache misses, theoretically higher reliability and less storage space used, although these haven't been issues yet
  • Cons: Tiny cost, new and more complicated repo generation logic, blob storage does not generate HTML default index pages if you want to manually browse the directories

I've already created a blob storage account and container, and manually uploaded the most recent packages and metadata files for .NET 8.0.4.

Changes required to make the repo generator program use blob storage directly:

  1. Assume no packages or metadata files are stored locally (easy)
  2. Get rid of the most recently seen JSON file; we don't need that any more (easy)
  3. Download current .NET release index JSON files from Microsoft (existing functionality)
  4. Download and parse a new repo index JSON file from blob storage, handling the case where it doesn't exist yet (easy)
  5. If the repo index JSON file was generated against up-to-date .NET and Debian versions, then stop (mostly done)
  6. Generate each package locally that does not exist in blob storage already (mostly done)
  7. Upload each new package to the blob storage container, capped at a fixed number of concurrent uploads using Dataflow, with the correct content-type value (easy); see the sketch after this list
  8. Generate updated package index files, including new and unchanged packages, and excluding outdated packages (mostly done)
  9. Generate updated release index files based on updated package index files, and sign them (mostly done)
  10. Upload updated package and release index files to blob storage, also with correct content-type headers (easy)
  11. Generate and upload new repo index JSON file with all the packages that exist in the repo now, as well as the upstream .NET and Debian versions the repo was generated against (easy)
  12. Delete outdated package files from blob storage (easy)
  13. Purge CDN (existing functionality)
  14. Delete local temporary working files like upstream SDK downloads, packages, and any index files, unless configured otherwise (easy)
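
Roughly what steps 7 and 10 could look like with the Azure.Storage.Blobs SDK and TPL Dataflow. The connection string source, container name, file paths, concurrency limit, and MIME types below are placeholders, not decisions:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks.Dataflow;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Placeholder connection string and container name; real values would come from configuration.
BlobContainerClient container = new(
    Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING"), "repo");

// Packages generated locally in step 6 that aren't in blob storage yet (placeholder data).
var newFiles = new List<(string localPath, string blobName)> {
    ("/tmp/packages/dotnet-runtime-8.0_8.0.4_armhf.deb", "packages/dotnet-runtime-8.0_8.0.4_armhf.deb")
};

// Step 7: cap concurrent uploads with a Dataflow ActionBlock and set each blob's content-type.
var uploader = new ActionBlock<(string localPath, string blobName)>(async file => {
    BlobClient blob = container.GetBlobClient(file.blobName);
    await blob.UploadAsync(file.localPath, new BlobUploadOptions {
        HttpHeaders = new BlobHttpHeaders { ContentType = guessContentType(file.blobName) }
    });
    Console.WriteLine($"uploaded {file.blobName}");
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 6 });

foreach (var file in newFiles) {
    uploader.Post(file);
}
uploader.Complete();
await uploader.Completion;

// Step 10 could push the updated Packages, Packages.gz, Release, and InRelease files through the same block.
// These MIME types are my guesses, not requirements of apt or Azure.
static string guessContentType(string blobName) => Path.GetExtension(blobName) switch {
    ".deb" => "application/vnd.debian.binary-package",
    ".gz"  => "application/gzip",
    _      => "text/plain"
};

Keeping the concurrency cap in a single ActionBlock means step 10 can reuse the same upload stage for the index and release files instead of duplicating the logic.
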
Aldaviva self-assigned this May 1, 2024
Aldaviva added the enhancement label May 1, 2024

Aldaviva commented May 1, 2024

Regular expression patterns to split and extract key-value pairs from package metadata control or index files:

\n{2,}
/(?<key>[\w-]+): (?<value>.+?)(?:\n(?! )|$)/gs
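
To sanity-check those patterns, roughly what the parser could look like in C# (the g and s flags become Regex.Matches and RegexOptions.Singleline; the sample stanzas and helper name are made up):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string sample =
    "Package: dotnet-runtime-8.0\n" +
    "Version: 8.0.4\n" +
    "Description: .NET runtime\n" +
    " Shared runtime for .NET applications.\n" +
    "\n" +
    "Package: dotnet-sdk-8.0\n" +
    "Version: 8.0.204\n";

foreach (Dictionary<string, string> package in parseControlStanzas(sample)) {
    Console.WriteLine($"{package["Package"]} {package["Version"]}");
}

// \n{2,} splits the file into one stanza per package, then the key-value pattern runs on each stanza.
// Continuation lines starting with a space stay attached to the previous value thanks to (?:\n(?! )|$).
static IEnumerable<Dictionary<string, string>> parseControlStanzas(string text) =>
    Regex.Split(text.Trim(), @"\n{2,}")
        .Select(stanza => Regex
            .Matches(stanza, @"(?<key>[\w-]+): (?<value>.+?)(?:\n(?! )|$)", RegexOptions.Singleline)
            .Cast<Match>()
            .ToDictionary(m => m.Groups["key"].Value, m => m.Groups["value"].Value));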

Although I could also just serialize everything into, for example, one big JSON or XML file, so I don't have to write a control file parser.


Aldaviva commented May 1, 2024

Azure CDN (Classic) by Microsoft does not support CDN preloading, unlike the Edgio CDN, which I don't want to use because it's related to Verizon. Manual preloading by requesting each file would increase the billable traffic, take a long time, and would probably only cache each file on the CDN edge server closest to the preloading client instead of on all CDN edge servers, so Raspberry Pis in other regions probably wouldn't benefit from the preloading.


Aldaviva commented May 2, 2024

Aldaviva added this to the 1.0.0 milestone May 18, 2024