feat: Add useful ASG group metrics (TOTAL_INSTANCES, etc) by default #2255

Merged: @rtyley merged 1 commit into main from add-asg-group-metrics-by-default on May 14, 2024

@rtyley rtyley commented Mar 15, 2024

When diagnosing performance issues with EC2 apps, it's really useful to know the historic context of how many EC2 instances have been present in the Auto Scaling group (ASG), for example [_"how long have we been running MAPI mobile-fronts with 29 instances?"_](guardian/mobile-apps-api#2865 (comment)). Unfortunately, ASGs don't record these group metrics by default: they need to be explicitly enabled.

There's a cost associated with every CloudWatch metric recorded, and I understand there's always pressure to keep costs down. But it's not really possible to know in advance when we're going to need these metrics, and when we need them, we really do need them - otherwise performance diagnosis is missing crucial parts of the puzzle.

To minimise the cost, I'm only proposing that we add `TOTAL_INSTANCES` and `IN_SERVICE_INSTANCES`, rather than [all 8 ASG group metrics](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_autoscaling.GroupMetric.html#properties).
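As a rough illustration of what enabling only these two metrics amounts to, the sketch below builds the CloudFormation `MetricsCollection` fragment as a plain TypeScript object. The helper `metricsCollectionFor` is made up for this example and is not part of `@guardian/cdk`; the names `GroupTotalInstances` and `GroupInServiceInstances` are assumed to be the CloudFormation spellings of `TOTAL_INSTANCES` and `IN_SERVICE_INSTANCES`.

```typescript
// CloudFormation-level names for the two group metrics proposed in this PR.
// Assumption: these are the spellings behind CDK's GroupMetric.TOTAL_INSTANCES
// and GroupMetric.IN_SERVICE_INSTANCES.
type AsgGroupMetricName = "GroupTotalInstances" | "GroupInServiceInstances";

interface MetricsCollection {
  Granularity: "1Minute"; // group metrics are collected once a minute
  Metrics: AsgGroupMetricName[];
}

// Hypothetical helper: builds the MetricsCollection list for an ASG resource.
function metricsCollectionFor(metrics: AsgGroupMetricName[]): MetricsCollection[] {
  return [{ Granularity: "1Minute", Metrics: [...metrics] }];
}

// Only the two metrics from this PR, not the full set.
const defaultGroupMetrics = metricsCollectionFor([
  "GroupTotalInstances",
  "GroupInServiceInstances",
]);
```

In CDK code the same intent would normally be expressed through the `groupMetrics` prop on the `AutoScalingGroup` construct rather than by writing the fragment by hand.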

## Cost per ASG with 2 metrics enabled

https://calculator.aws/#/estimate?id=c593ec622774e2167c447f8d797ab4f04f9226be

* per month: $1.48
* per year: $18

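There are two costs associated with each ASG metric being stored, adding up to $0.74 per metric per month (so $1.48 for 2 metrics):

* The cost of the metric itself: $0.30 per month
* The monthly cost of the `PutMetricData` calls, made 43,800 times a month ([once a minute](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_autoscaling.CfnAutoScalingGroup.MetricsCollectionProperty.html#granularity)): $0.44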
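The arithmetic behind those figures can be sketched as follows. The $0.30 storage price and the 43,800 calls come from the description above; the $0.01 per 1,000 requests rate and the 730-hour month are assumptions chosen because they reproduce the quoted $0.44 and 43,800, so check them against the linked calculator before relying on them.

```typescript
const METRIC_STORAGE_USD_PER_MONTH = 0.3; // from the PR description
const USD_PER_1000_API_REQUESTS = 0.01;   // assumption: implied by $0.44 for 43,800 calls
const HOURS_PER_MONTH = 730;              // assumption: 43,800 calls / 60 per hour
const CALLS_PER_MONTH = HOURS_PER_MONTH * 60; // one PutMetricData call a minute

// Monthly cost of storing one metric plus the API calls that publish it.
function monthlyCostPerMetric(): number {
  const apiCost = (CALLS_PER_MONTH / 1000) * USD_PER_1000_API_REQUESTS;
  return METRIC_STORAGE_USD_PER_MONTH + apiCost;
}

// Monthly cost for one ASG with the given number of group metrics enabled.
function monthlyCostPerAsg(metricCount: number): number {
  return metricCount * monthlyCostPerMetric();
}

const perAsgMonthly = monthlyCostPerAsg(2);
const perAsgYearly = perAsgMonthly * 12;

console.log(perAsgMonthly.toFixed(2)); // "1.48"
console.log(perAsgYearly.toFixed(0));  // "18"
```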

## Cost on example AWS account

The `media-service` account probably has an above-average number of ASGs: **40**. If all ASGs in the account had the 2 metrics enabled, the total cost for the account would be:

* per month: $59.20 ($1.48 * 40)

The current monthly spend in the `media-service` account is about $16,000, of which $360 is CloudWatch spend. Given that, I believe an additional $59.20 in CloudWatch spend for these metrics is not excessive.
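To put that figure in proportion, here is a small sketch using only the numbers quoted above (40 ASGs, $1.48 per ASG, $16,000 total spend, $360 CloudWatch spend). The constant and function names are illustrative only.

```typescript
const COST_PER_ASG_USD = 1.48;            // monthly, with 2 metrics enabled
const ASG_COUNT = 40;                     // ASGs in the media-service account
const ACCOUNT_MONTHLY_SPEND_USD = 16000;  // approximate total monthly spend
const CLOUDWATCH_MONTHLY_SPEND_USD = 360; // approximate CloudWatch monthly spend

// Expresses `part` as a percentage of `whole`.
function percentOf(part: number, whole: number): number {
  return (part / whole) * 100;
}

const addedMonthlyCost = COST_PER_ASG_USD * ASG_COUNT;
const shareOfAccountSpend = percentOf(addedMonthlyCost, ACCOUNT_MONTHLY_SPEND_USD);
const increaseInCloudWatchSpend = percentOf(addedMonthlyCost, CLOUDWATCH_MONTHLY_SPEND_USD);

console.log(addedMonthlyCost.toFixed(2));          // "59.20"
console.log(shareOfAccountSpend.toFixed(2));       // "0.37" (% of total account spend)
console.log(increaseInCloudWatchSpend.toFixed(1)); // "16.4" (% increase in CloudWatch spend)
```

So the change would add roughly 16% to this account's CloudWatch bill, but well under half a percent to its overall spend.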

changeset-bot bot commented Mar 15, 2024

🦋 Changeset detected

Latest commit: c8400c9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @guardian/cdk | Patch |

@rtyley changed the title from "Add important group metric for ASGs" to "Add the useful ASG IN_SERVICE_INSTANCES group metric by default" on Mar 15, 2024
@rtyley force-pushed the add-asg-group-metrics-by-default branch from db89fb4 to 726a0f2 on March 15, 2024 17:09
@rtyley changed the title from "Add the useful ASG IN_SERVICE_INSTANCES group metric by default" to "Add useful ASG group metrics (TOTAL_INSTANCES, etc) by default" on Mar 15, 2024
@rtyley changed the title from "Add useful ASG group metrics (TOTAL_INSTANCES, etc) by default" to "feat: Add useful ASG group metrics (TOTAL_INSTANCES, etc) by default" on Mar 15, 2024
@rtyley force-pushed the add-asg-group-metrics-by-default branch 4 times, most recently from 644b710 to 8f3006a, on March 15, 2024 17:32
@rtyley marked this pull request as ready for review on March 15, 2024 17:36
@rtyley requested a review from akash1810 on March 15, 2024 17:37
rtyley commented Apr 2, 2024

Darn - I came across another case where this setting would be useful today! Looking at https://github.com/guardian/ophan/issues/5970, I also want to know how big the Ophan Dashboard ASG was on 21st March - about 12 days ago. Without this PR, I'm struggling...!
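For context, once the group metrics are being recorded, a question like this one could be answered with a CloudWatch query along the following lines. This is only a sketch: it builds request parameters in the shape of CloudWatch's `GetMetricStatistics` API without calling AWS, the ASG name is a placeholder, the year 2024 is inferred from the comment date, and the 5-minute period and `Maximum` statistic are arbitrary choices.

```typescript
// Parameter shape modelled on CloudWatch's GetMetricStatistics request.
interface InstanceCountQuery {
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  StartTime: Date;
  EndTime: Date;
  Period: number; // seconds per datapoint
  Statistics: string[];
}

// Hypothetical helper: query for the in-service instance count over one UTC day.
function instanceCountQueryForDay(asgName: string, isoDay: string): InstanceCountQuery {
  const start = new Date(`${isoDay}T00:00:00Z`);
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  return {
    Namespace: "AWS/AutoScaling",
    MetricName: "GroupInServiceInstances",
    Dimensions: [{ Name: "AutoScalingGroupName", Value: asgName }],
    StartTime: start,
    EndTime: end,
    Period: 300,
    Statistics: ["Maximum"],
  };
}

// "ophan-dashboard-EXAMPLE" is a placeholder, not the real ASG name.
const query = instanceCountQueryForDay("ophan-dashboard-EXAMPLE", "2024-03-21");
```

A query like this only returns datapoints for periods when the metric was enabled, which is why having it on by default matters.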

@akash1810 akash1810 left a comment

LGTM.

> There's a cost associated with every CloudWatch metric recorded,

Worth adding some numbers for this to the release notes, and/or PR description, to help teams understand the impact?

cc @guardian/devx-reliability as this change also fits in their realm

@rtyley force-pushed the add-asg-group-metrics-by-default branch from 8f3006a to d758024 on April 12, 2024 15:58
rtyley commented Apr 12, 2024

> Worth adding some numbers for this to the release notes, and/or PR description, to help teams understand the impact?

Thanks @akash1810 - I've updated the PR description with the calculations of the cost - do those numbers look right to you?!

If they're correct, I think a monthly cost of $1.48 per ASG is probably reasonable?

This PR is stale because it has been open 30 days with no activity. Unless a comment is added or the “stale” label removed, this will be closed in 3 days

@github-actions bot added the Stale label on May 13, 2024
@rtyley force-pushed the add-asg-group-metrics-by-default branch from d758024 to 2871876 on May 14, 2024 09:15
@rtyley requested a review from a team as a code owner on May 14, 2024 09:15
@rtyley force-pushed the add-asg-group-metrics-by-default branch from 2871876 to c8400c9 on May 14, 2024 09:31
@rtyley merged commit 176c326 into main on May 14, 2024
2 checks passed
@rtyley deleted the add-asg-group-metrics-by-default branch on May 14, 2024 09:34