CI: 2023 04 07
Zack Galbreath edited this page Apr 7, 2023
Attendees:
- Alec Scott
- Dan LaManna
- Jacob Nesbitt
- John Parent
- Luke Peyralans
- Ryan Krattiger
- Scott Wittenburg
- Tammy Grimmett
- Todd Gamblin
- Zack Galbreath
Updates:
- We've been working on a pie chart to show which packages we spend the most time building.
- We've put together a preliminary chart distinguishing the number of PR vs. develop jobs running at any given time.
- We're considering migrating some of our underlying metrics data from OpenSearch to a cloned & extended copy of GitLab's postgres database, kept in sync using the AWS Database Migration Service.
- OpenSearch exhausted its shard limits for a few days this week. We are working to reingest the data we missed during this time.
- We've developed a preliminary proof-of-concept for getting the EC2 instance type for a running builder pod. This is the first step towards a new "cost per job" metric.
- We've begun updating spackbot to post data to OpenSearch. This will allow us to track how many jobs are triggered by comments like "@spackbot run pipeline" or "@spackbot rebuild everything".
- We verified that our updated job pruning strategy is working as intended.
- Luke demonstrated a new dashboard he developed that allows us to see how much time is spent on retried GitLab CI jobs. We will keep an eye on this to get a sense of how much cost savings we can expect to achieve by eliminating unnecessary retries.
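The retry accounting behind such a dashboard can be sketched roughly as follows: given the jobs of one pipeline (e.g. as returned by GitLab's pipeline jobs API with `include_retried=true`), every attempt except the last one per job name counts as retried time. The field names mirror GitLab's job JSON, but the function itself is a hypothetical sketch, not the dashboard's actual query.

```python
from collections import defaultdict

def retried_seconds(jobs):
    """Total duration (seconds) spent on superseded job attempts.

    `jobs` is a list of dicts shaped like GitLab's job JSON, each with at
    least "name", "id", and "duration" keys.
    """
    by_name = defaultdict(list)
    for job in jobs:
        by_name[job["name"]].append(job)
    wasted = 0.0
    for attempts in by_name.values():
        attempts.sort(key=lambda j: j["id"])  # oldest attempt first
        for superseded in attempts[:-1]:      # all but the final attempt
            wasted += superseded.get("duration") or 0.0
    return wasted
```

Summing this per day (or per ref) would give the ceiling on savings from eliminating unnecessary retries.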
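For the "cost per job" metric mentioned above, the instance type can be read from inside a builder pod via the EC2 instance metadata service (IMDSv2) and then combined with the job's duration and a price table. The IMDS endpoint and token flow are real AWS interfaces; the price-table input and function names are assumptions for illustration, not our actual implementation.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"

def current_instance_type(timeout=2):
    """Ask the EC2 instance metadata service (IMDSv2) for the instance
    type of the node this pod is running on."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/instance-type",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()

def job_cost_usd(instance_type, duration_seconds, hourly_prices):
    """Estimate one job's EC2 cost. `hourly_prices` is an assumed mapping
    of instance type -> on-demand USD per hour."""
    return hourly_prices[instance_type] * duration_seconds / 3600.0
```

Note that pods can only reach IMDS if the node's metadata hop limit allows it; that is part of what the proof-of-concept had to verify.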
We are looking to update cache.spack.io to make it more useful & less confusing. Specific improvements should include:
- Usage instructions
- Show file size & hash for each package
- Allow users to browse by stack (e.g., browse E4S binaries from the 0.19 release)
- Add links from packages.spack.io to cache.spack.io when applicable
- Rename the "View Packages" link to something like "View Info" or "View Details"
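For the file size & hash item, the listing could show something like the following per package. This is a minimal sketch that assumes the mirror contents are readable as local files; for objects stored in S3, the same numbers could come from object metadata instead.

```python
import hashlib
import os

def package_file_info(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hexdigest) for one cached binary,
    hashing in chunks so large tarballs don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```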
- It looks like we might have unnecessary cross-AZ traffic that is contributing to our EC2-Other and S3 costs. Mike & Zack to investigate further and attempt a fix.
- Our goal is to set up a small pool of runners (and a corresponding stack) using pcluster AMIs
- We are waiting for feedback from AWS on what specific AMIs to use
- This effort will probably also require us to upgrade gitlab.spack.io
- We should also rethink how runner configuration is stored in the spack-infra repo. It currently requires a lot of copying & pasting of YAML to create new types of runners.
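One way to cut down on that YAML duplication would be anchors and merge keys, so shared runner settings are written once per file and each runner type only states its differences. The keys and runner names below are hypothetical, not our actual configuration:

```yaml
# Shared settings defined once via an anchor (hypothetical keys & names)
.runner-defaults: &runner-defaults
  executor: kubernetes
  concurrent: 10
  check_interval: 3

# Each runner type merges the defaults and overrides only what differs
runner-small:
  <<: *runner-defaults
  tags: [small]

runner-large:
  <<: *runner-defaults
  tags: [large]
  concurrent: 4
```

Anchors only work within a single YAML document, so this helps most where many runner blocks live in one file; splitting shared config across files would need templating instead.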
Next steps:
- Update the GitLab CI Failures by Error Taxonomy dashboard to be a stacked area chart per ref (develop vs. each PR branch). This change will make it easier for us to triage.
- Continue updating cache.spack.io as described above.
- Keep working on "costs per job" metric.
- Keep working to reduce our AWS bill.