feat: Add useful ASG group metrics (`TOTAL_INSTANCES`, etc) by default #2255
🦋 Changeset detected. Latest commit: c8400c9. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Darn - I came across another case where this setting would be useful today! Looking at https://github.com/guardian/ophan/issues/5970, I also want to know how big the Ophan Dashboard ASG was on 21st March - about 12 days ago. Without this PR, I'm struggling...!
LGTM.

> There's a cost associated with every CloudWatch metric recorded,

Worth adding some numbers for this to the release notes, and/or PR description, to help teams understand the impact?

cc @guardian/devx-reliability as this change also fits in their realm
Thanks @akash1810 - I've updated the PR description with the calculations of the cost - do those numbers look right to you?! If they're correct, I think a monthly cost of $1.48 per ASG is probably reasonable?
This PR is stale because it has been open 30 days with no activity. Unless a comment is added or the “stale” label removed, this will be closed in 3 days.
When diagnosing performance issues with EC2 apps, it's really useful to know the historic context of how many EC2 instances have been present in the Auto-Scaling Group - like, [_"how long have we been running MAPI mobile-fronts with 29 instances?"_](guardian/mobile-apps-api#2865 (comment)) - unfortunately, by default, ASGs don't record these metrics and they need to be explicitly enabled.

There's a cost associated with every CloudWatch metric recorded, and there's always a pressure for costs to be kept down. But it's not really possible to know in advance when we're going to need these metrics, and when we need them, we really do need them - otherwise performance diagnosis is missing crucial parts of the puzzle.

To minimise the cost here, we're only adding `TOTAL_INSTANCES` & `IN_SERVICE_INSTANCES` here, rather than [all 8 ASG group metrics](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_autoscaling.GroupMetric.html#properties).

## Cost per ASG with 2 metrics enabled

https://calculator.aws/#/estimate?id=c593ec622774e2167c447f8d797ab4f04f9226be

* per month: $1.48
* per year: $18

There are two costs associated with any ASG metric being stored:

* The cost of the metric itself: $0.30 per month
* The monthly cost of the `PutMetricData` call made 43800 times ([once a minute](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_autoscaling.CfnAutoScalingGroup.MetricsCollectionProperty.html#granularity)): $0.44

## Cost on example AWS account

The `media-service` account is probably above-average in terms of the number of ASGs it has: **40**. If all ASGs in the account had the 2 metrics enabled, this would have a total cost for the account of:

* per month: $59.20 ($1.48 * 40)

The current monthly spend in the `media-service` account is about $16000, with $360 being CloudWatch spend. From this, my belief is that an additional $59.20 in CloudWatch spend for these metrics is not excessive.
When diagnosing performance issues with EC2 apps, it's really useful to know the historic context of how many EC2 instances have been present in the Auto-Scaling Group - like, "how long have we been running MAPI mobile-fronts with 29 instances?" - unfortunately, by default, ASGs don't record these metrics and they need to be explicitly enabled.
There's a cost associated with every CloudWatch metric recorded, and I understand there's always a pressure for costs to be kept down. But it's not really possible to know in advance when we're going to need these metrics, and when we need them, we really do need them - otherwise performance diagnosis is missing crucial parts of the puzzle.
To minimise the cost here, I'm only proposing that we add `TOTAL_INSTANCES` & `IN_SERVICE_INSTANCES` here, rather than all 8 ASG group metrics.

## Cost per ASG with 2 metrics enabled

https://calculator.aws/#/estimate?id=c593ec622774e2167c447f8d797ab4f04f9226be

* per month: $1.48
* per year: $18

There are two costs associated with any ASG metric being stored:

* The cost of the metric itself: $0.30 per month
* The monthly cost of the `PutMetricData` call made 43800 times (once a minute): $0.44

## Cost on example AWS account

The `media-service` account is probably above-average in terms of the number of ASGs it has: **40**. If all ASGs in the account had the 2 metrics enabled, this would have a total cost for the account of:

* per month: $59.20 ($1.48 * 40)

The current monthly spend in the `media-service` account is about $16000, with $360 being CloudWatch spend. From this, my belief is that an additional $59.20 in CloudWatch spend for these metrics is not excessive.
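
For anyone wanting to re-check the arithmetic, here is a minimal sketch of how the figures in this description fit together. It only uses the prices quoted here ($0.30 per metric and $0.44 of `PutMetricData` calls per metric per month); it does not query AWS pricing, and the function and constant names are hypothetical, chosen for illustration.

```typescript
// Figures quoted in the PR description (USD). These are assumptions taken
// from the description, not values fetched from AWS.
const METRIC_STORAGE_PER_MONTH = 0.3; // cost of storing one metric
const PUT_METRIC_DATA_PER_MONTH = 0.44; // 43800 PutMetricData calls for one metric
const METRICS_PER_ASG = 2; // TOTAL_INSTANCES and IN_SERVICE_INSTANCES

// Round to whole cents so floating-point noise does not leak into results.
function roundToCents(amount: number): number {
  return Math.round(amount * 100) / 100;
}

// Monthly cost for one ASG with `metricCount` group metrics enabled.
function monthlyCostPerAsg(metricCount: number = METRICS_PER_ASG): number {
  const perMetric = METRIC_STORAGE_PER_MONTH + PUT_METRIC_DATA_PER_MONTH;
  return roundToCents(metricCount * perMetric);
}

// Yearly cost for one ASG (12 months at the monthly rate).
function yearlyCostPerAsg(metricCount: number = METRICS_PER_ASG): number {
  return roundToCents(12 * monthlyCostPerAsg(metricCount));
}

// Monthly cost for an account with `asgCount` ASGs, all with metrics enabled.
function monthlyCostForAccount(asgCount: number, metricCount: number = METRICS_PER_ASG): number {
  return roundToCents(asgCount * monthlyCostPerAsg(metricCount));
}

console.log(monthlyCostPerAsg()); // 1.48
console.log(yearlyCostPerAsg()); // 17.76, which the description rounds to $18
console.log(monthlyCostForAccount(40)); // 59.2
```

With 40 ASGs, as in the `media-service` example, this gives $59.20 per month, matching the figure above.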