Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[plugin-aws-ec2-ebs] fix misuse of period #915

Merged
merged 2 commits into from Jul 29, 2022
Merged

[plugin-aws-ec2-ebs] fix misuse of period #915

merged 2 commits into from Jul 29, 2022

Conversation

lufia
Copy link
Member

@lufia lufia commented Jul 27, 2022

I fixed period-misuse bugs in mackerel-plugin-aws-ec2-ebs.

The plugin retrieves recent values of the metrics each minutes, and then it elects only a most recent value each metrics from them. Thus these metric values are aggregated by a minute.

In addition, some metric's graph definition such as ec2.ebs.bandwidth.#.read or ec2.ebs.throughput.#.read are designed with per-second units. Thus the plugin should convert these Sumed values to the values avarage per second, like val / 60.

resp, err := p.CloudWatch.GetMetricStatistics(&cloudwatch.GetMetricStatisticsInput{
Dimensions: []*cloudwatch.Dimension{
{
Name: aws.String("VolumeId"),
Value: vol.VolumeId,
},
},
StartTime: &start,
EndTime: &now,
MetricName: &metricName,
Period: aws.Int64(60),
Statistics: []*string{&statType},
Namespace: aws.String("AWS/EBS"),
})

However, in the current implementaton, that formula is wrong.
For instance, getLastPoint method returns a metric value and its period that are aggregated each period, but this period is metricPeriodDefault = 300 (but if not io1).

val, period, err := p.getLastPoint(vol, cloudwatchdef.MetricName, cloudwatchdef.Statistics)
if err != nil {
retErr := errors.New(volumeID + " " + err.Error() + ":" + cloudwatchdef.MetricName)
if err == errNoDataPoint {
// nop
} else {
return nil, retErr
}
} else {
stat[strings.Replace(metricKey, "#", volumeID, -1)] = cloudwatchdef.CalcFunc(val, float64(period))
}

Comment on lines 66 to 73
"ec2.ebs.bandwidth.#.read": cloudWatchSetting{
MetricName: "VolumeReadBytes", Statistics: "Sum",
CalcFunc: func(val float64, period float64) float64 { return val / period },
CalcFunc: valuePerSec,
},
"ec2.ebs.bandwidth.#.write": cloudWatchSetting{
MetricName: "VolumeWriteBytes", Statistics: "Sum",
CalcFunc: func(val float64, period float64) float64 { return val / period },
CalcFunc: valuePerSec,
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ec2.ebs.bandwidth.#": {
Label: "EBS Bandwidth",
Unit: "bytes/sec",
Metrics: []mp.Metrics{
{Name: "read", Label: "Read", Diff: false},
{Name: "write", Label: "Write", Diff: false},
},
},

Comment on lines 74 to 81
"ec2.ebs.throughput.#.read": cloudWatchSetting{
MetricName: "VolumeReadOps", Statistics: "Sum",
CalcFunc: func(val float64, period float64) float64 { return val / period },
CalcFunc: valuePerSec,
},
"ec2.ebs.throughput.#.write": cloudWatchSetting{
MetricName: "VolumeWriteOps", Statistics: "Sum",
CalcFunc: func(val float64, period float64) float64 { return val / period },
CalcFunc: valuePerSec,
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ec2.ebs.throughput.#": {
Label: "EBS Throughput (op/s)",
Unit: "iops",
Metrics: []mp.Metrics{
{Name: "read", Label: "Read", Diff: false},
{Name: "write", Label: "Write", Diff: false},
},
},

Comment on lines 90 to 97
"ec2.ebs.latency.#.read": cloudWatchSetting{
MetricName: "VolumeTotalReadTime", Statistics: "Average",
CalcFunc: func(val float64, period float64) float64 { return val * 1000 },
CalcFunc: sec2msec,
},
"ec2.ebs.latency.#.write": cloudWatchSetting{
MetricName: "VolumeTotalWriteTime", Statistics: "Average",
CalcFunc: func(val float64, period float64) float64 { return val * 1000 },
CalcFunc: sec2msec,
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VolumeTotalReadTime and VolumeTotalWriteTime is in seconds, but this plugin's graph defs is labeled (ms/op).

"ec2.ebs.latency.#": {
Label: "EBS Avg Latency (ms/op)",
Unit: "float",
Metrics: []mp.Metrics{
{Name: "read", Label: "Read", Diff: false},
{Name: "write", Label: "Write", Diff: false},
},
},

Comment on lines 102 to 105
"ec2.ebs.idle_time.#.idle_time": cloudWatchSetting{
MetricName: "VolumeIdleTime", Statistics: "Sum",
CalcFunc: func(val float64, period float64) float64 { return val / period * 100 },
CalcFunc: func(val float64) float64 { return val / aggregationPeriod * 100.0 },
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lufia lufia marked this pull request as ready for review July 27, 2022 05:20
@@ -266,7 +285,7 @@ func (p EBSPlugin) getLastPoint(vol *ec2.Volume, metricName string, statType str
}
}

return latestVal, period, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note] Maybe the simplest fix is returning 60 (or aggregationPeriod) here instead of period?

},
"ec2.ebs.consumed_ops.#.consumed_ops": cloudWatchSetting{
MetricName: "VolumeConsumedReadWriteOps", Statistics: "Sum",
CalcFunc: func(val float64, period float64) float64 { return val },
CalcFunc: value,
Copy link
Contributor

@susisu susisu Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[off-topic] I'm not sure, but because the statistics is Sum, this metric value is per-aggregationPeriod, right?
If so, to output per-minute value consistently, CalcFunc should be something like val / aggregationPeriod * 60?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm...

@@ -42,62 +46,74 @@ var io1Graphs = append([]string{
type cloudWatchSetting struct {
MetricName string
Statistics string
CalcFunc func(float64, float64) float64
CalcFunc func(float64) float64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second argument (period) was good, because each CalcFunc could be implemented without knowing how getLastPoint works.
Is there any intention you've omitted this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious is there valuable reason to specify other numbers to the period? If nothing, I think it is sufficient to refers aggregationPeriod directly in each CalcFunc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK!
I feel like it's a bit tightly-coupled and hard to understand (you can't tell why CalcFunc divides value by aggregationPeriod unless you read the whole program), but it will be no problem because the program itself is fairy small.

CloudWatch *cloudwatch.CloudWatch
Volumes []*ec2.Volume

// internal states
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@lufia
Copy link
Member Author

lufia commented Jul 29, 2022

Thanks for your reviewing!

@lufia lufia merged commit c30977b into master Jul 29, 2022
@lufia lufia deleted the fix-ec2-ebs branch July 29, 2022 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants