[azmetrics] Inconsistent Metric Values from azmetrics.QueryResources() with Batches of 4+ Resources #22757

Open
zmoog opened this issue Apr 17, 2024 · 12 comments
Assignees
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor Monitor, Monitor Ingestion, Monitor Query needs-team-attention This issue needs attention from Azure service team or SDK team Service Attention This issue is responsible by Azure service team. Service This issue points to a problem in the service.

Comments

@zmoog

zmoog commented Apr 17, 2024

Bug Report

  • when using github.com/Azure/azure-sdk-for-go/sdk/monitor/query/azmetrics v1.0.0
  • with Go v1.21.7 darwin/arm64

Context

I am collecting metrics for 10 Microsoft.KeyVault/vaults.

What happened?

If I call azmetrics.QueryResources() with a batch of 1-3 resources, I get the same data points I see on Azure Portal.

However, I don't get the same values I see in Azure Portal if I try to get metrics values for the same resource in a batch request with 4+ resources.

For example, with a batch of 1-3 resources, I always get two time series values for each resource. Starting with batches of 4+ resources, the number of time series values in the response varies at each request (0-2).

In the following example, I collect the metrics values in two ways:

  • Querying SINGLE resources: I make 10 API calls with one resource each.
  • Querying resources as a GROUP: I make 1 API call with 10 resources.

$ go run main.go

Ready to go!
----------------------------------------------------
Querying SINGLE resources
----------------------------------------------------
.../providers/Microsoft.KeyVault/vaults/kv1-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv2-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv3-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv4-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv5-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv6-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv7-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv8-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv9-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv10-oeefrt7tlykau
timeseries 2
----------------------------------------------------
Querying resources as a GROUP
----------------------------------------------------
.../providers/Microsoft.KeyVault/vaults/kv1-oeefrt7tlykau
timeseries 1
.../providers/Microsoft.KeyVault/vaults/kv2-oeefrt7tlykau
timeseries 1
.../providers/Microsoft.KeyVault/vaults/kv3-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv4-oeefrt7tlykau
timeseries 2
.../providers/Microsoft.KeyVault/vaults/kv5-oeefrt7tlykau
timeseries 0
.../providers/Microsoft.KeyVault/vaults/kv6-oeefrt7tlykau
timeseries 0
.../providers/Microsoft.KeyVault/vaults/kv7-oeefrt7tlykau
timeseries 0
.../providers/Microsoft.KeyVault/vaults/kv8-oeefrt7tlykau
timeseries 1
.../providers/Microsoft.KeyVault/vaults/kv9-oeefrt7tlykau
timeseries 1
.../providers/Microsoft.KeyVault/vaults/kv10-oeefrt7tlykau
timeseries 2

I get a different number of time series for the same resource, depending on whether it is queried in a batch of 1-3 or 4+ resources.

I repeated this test multiple times. The values for the "SINGLE" case never changed, while the values for the "GROUP" case changed on every call.
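The comparison described above can be automated with a small helper. The following is a minimal sketch, not code from the gist: `diffSeriesCounts` is a hypothetical function, the vault IDs are shortened for illustration, and the counts are copied from three of the vaults in the output above.

```go
package main

import (
	"fmt"
	"sort"
)

// diffSeriesCounts compares the number of time series returned per resource ID
// by individual queries (single) and by one batch query (batch). It returns a
// sorted list of human-readable mismatches; an empty result means they agree.
func diffSeriesCounts(single, batch map[string]int) []string {
	var mismatches []string
	for id, want := range single {
		// A resource missing from the batch result counts as zero series.
		if got := batch[id]; got != want {
			mismatches = append(mismatches, fmt.Sprintf("%s: single=%d batch=%d", id, want, got))
		}
	}
	sort.Strings(mismatches)
	return mismatches
}

func main() {
	// Shortened, illustrative IDs; counts taken from the run shown above.
	single := map[string]int{"kv1": 2, "kv3": 2, "kv5": 2}
	batch := map[string]int{"kv1": 1, "kv3": 2, "kv5": 0}
	for _, m := range diffSeriesCounts(single, batch) {
		fmt.Println(m)
	}
}
```

Running a check like this on every collection cycle would make the mismatch visible without comparing against the Azure Portal by hand.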

What did you expect or want to happen?

For the same resource, I expect azmetrics.QueryResources() to always return the same values, whether it's the only resource ID in the batch or one of the 50 supported resource IDs.

How can we reproduce it?

I created the gist https://gist.github.com/zmoog/fcede6fcbe5ba11f9275c40a58eea38d with:

  • Bicep file I used to create the test key vaults
  • Go code with individual and batch calls to azmetrics.QueryResources()

Anything we should know about your environment.

Additional info

I see the same behavior when calling the API endpoint using cURL. See zmoog/public-notes#81 for more details.

Questions:

  • Are my expectations correct?
  • Am I using the API the right way?
  • If the answer to the above questions is affirmative, is this a problem in the API and not the SDK?
@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. KeyVault Mgmt This issue is related to a management-plane library. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team. labels Apr 17, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jlichwa @RandalliLama @schaabs.

@jhendrixMSFT jhendrixMSFT added Monitor Monitor, Monitor Ingestion, Monitor Query and removed KeyVault Service Attention This issue is responsible by Azure service team. labels Apr 17, 2024
@github-actions github-actions bot added the needs-team-triage This issue needs the team to triage. label Apr 17, 2024
@jhendrixMSFT jhendrixMSFT removed the needs-team-triage This issue needs the team to triage. label Apr 17, 2024
@gracewilcox gracewilcox added Client This issue points to a problem in the data-plane of the library. and removed Mgmt This issue is related to a management-plane library. labels Apr 18, 2024
@gracewilcox gracewilcox added the Service This issue points to a problem in the service. label Apr 18, 2024
@gracewilcox
Member

Hi @zmoog! I think your issue will be fixed by setting the QueryResourcesOptions.Top field to a higher number. If a filter is specified, the service defaults to 10 records to retrieve per resource ID in the request. This is probably the cause of your throttling issues.

I reproduced your setup locally, and when I set Top to a higher number, the TimeSeries was consistent between the individual and group query.

options := azmetrics.QueryResourcesOptions{
	Aggregation: ptr("Count"),
	StartTime:   ptr("2024-04-16T07:18:13.001Z"),
	EndTime:     ptr("2024-04-16T07:19:13.001Z"),
	Filter:      ptr("ActivityType eq '*' AND ActivityName eq '*' AND StatusCode eq '*' AND StatusCodeClass eq '*'"),
	Interval:    ptr("PT1M"),
	Top:         to.Ptr(int32(50)),
}

@gracewilcox gracewilcox added issue-addressed The Azure SDK team member assisting with this issue believes it to be addressed and ready to close. and removed Service This issue points to a problem in the service. needs-team-attention This issue needs attention from Azure service team or SDK team labels Apr 19, 2024

Hi @zmoog. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@axw

axw commented Apr 21, 2024

@gracewilcox apologies if I'm missing something obvious, but:

  1. the docs state that the default for Top is 10, and that this applies per resource ID
  2. in the example there are fewer than 10 time series being returned per resource ID

So why should we need to increase Top if there are fewer than 10 time series per resource?
And given that Top should apply per resource, why does it matter if they're queried in bulk vs. one at a time?

@zmoog
Author

zmoog commented Apr 22, 2024

@gracewilcox, thank you for replying!

The QueryResourcesOptions.Top plays a big role in the number of records in the QueryResources() response.

As you said, the Top option documentation reports:

The maximum number of records to retrieve per resource ID in the request. Valid only if the filter is specified. Defaults to 10.

However, it seems more like an option that applies to the whole batch and not per resource ID.

In my test case, I have:

  • 10 resource IDs
  • 1 metric (ServiceApiResult)
  • 4 dimensions (ActivityType, ActivityName, StatusCode, StatusCodeClass)
  • 1 aggregation type (Count)

All these key vaults are unused (I create them for testing), so I only get two records per resource because we only have two unique combinations of dimension values:

[Screenshot: CleanShot 2024-04-22 at 10 11 44]

  • vaultget, vault, 200, 2xx
  • vaultput, vault, 200, 2xx
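To make the "unique combinations of dimension values" idea concrete, here is a minimal sketch that counts distinct combinations per resource. The `record` type and `countSeries` function are hypothetical simplifications and are not part of azmetrics; the dimension values are the two combinations listed above, with a shortened resource ID.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical, simplified view of one metric data point:
// the resource it belongs to plus its dimension values in a fixed order.
type record struct {
	ResourceID string
	Dimensions []string
}

// countSeries returns, per resource ID, the number of distinct dimension-value
// combinations, i.e. the number of time series expected for that resource.
func countSeries(records []record) map[string]int {
	seen := make(map[string]map[string]struct{})
	for _, r := range records {
		// Join with a NUL byte so that values cannot collide across positions.
		key := strings.Join(r.Dimensions, "\x00")
		if seen[r.ResourceID] == nil {
			seen[r.ResourceID] = make(map[string]struct{})
		}
		seen[r.ResourceID][key] = struct{}{}
	}
	counts := make(map[string]int, len(seen))
	for id, combos := range seen {
		counts[id] = len(combos)
	}
	return counts
}

func main() {
	records := []record{
		{"kv1", []string{"vaultget", "vault", "200", "2xx"}},
		{"kv1", []string{"vaultput", "vault", "200", "2xx"}},
		{"kv1", []string{"vaultget", "vault", "200", "2xx"}}, // repeat of the first combination
	}
	fmt.Println(countSeries(records)["kv1"]) // prints 2
}
```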

If I set Top = 2 I only get two records for the whole batch:

----------------------------------------------------
Querying resources as a GROUP
----------------------------------------------------
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv1-oeefrt7tlykau
timeseries 0
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv2-oeefrt7tlykau
timeseries 0
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv3-oeefrt7tlykau
timeseries 0
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv4-oeefrt7tlykau
timeseries 0
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv5-oeefrt7tlykau
timeseries 0
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv6-oeefrt7tlykau
timeseries 0
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv7-oeefrt7tlykau
timeseries 1
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv8-oeefrt7tlykau
timeseries 1
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv9-oeefrt7tlykau
timeseries 0
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv10-oeefrt7tlykau
timeseries 0

So I need at least Top = 20 to get all the records.
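The arithmetic behind that number can be sketched as follows. This assumes, as observed in this thread and contrary to the documentation, that Top caps the whole batch rather than each resource ID. `batchWideTop` is a hypothetical helper, not an SDK function. The simple sum also does not explain every observation in the report (a batch of four idle vaults totals eight series, below the default of 10, yet was already inconsistent), so treat the result as a lower bound.

```go
package main

import "fmt"

// batchWideTop returns the smallest Top value that would cover every time
// series in a batch, ASSUMING Top is applied to the batch as a whole.
// Each element of seriesPerResource is the number of time series expected
// for one resource ID in the batch.
func batchWideTop(seriesPerResource []int) int32 {
	var total int32
	for _, n := range seriesPerResource {
		total += int32(n)
	}
	return total
}

func main() {
	// Ten unused key vaults with two time series each.
	idle := []int{2, 2, 2, 2, 2, 2, 2, 2, 2, 2}
	fmt.Println(batchWideTop(idle)) // prints 20
}
```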

However, when the resources are used, the number of unique combinations of dimension values varies greatly.

For example, if I start using one of the key vaults, I get 4-5 records instead of 2 for each resource ID:

[Screenshot: CleanShot 2024-04-22 at 10 44 52]

----------------------------------------------------
Querying resources as a GROUP
----------------------------------------------------
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv1-oeefrt7tlykau
timeseries 5
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv2-oeefrt7tlykau
timeseries 4
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv3-oeefrt7tlykau
timeseries 4
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv4-oeefrt7tlykau
timeseries 4
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv5-oeefrt7tlykau
timeseries 4
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv6-oeefrt7tlykau
timeseries 4
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv7-oeefrt7tlykau
timeseries 4
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv8-oeefrt7tlykau
timeseries 4
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv9-oeefrt7tlykau
timeseries 4
/subscriptions/12cabcb4-86e8-404f-a3d2-1dc9982f45ca/resourceGroups/mbranca-azmetrics-test/providers/Microsoft.KeyVault/vaults/kv10-oeefrt7tlykau
timeseries 4

I guess it's impossible to calculate the exact number of records because it depends on the unique combinations of the dimensions, which vary depending on the Azure service.

@gracewilcox, what strategy do you recommend for setting the Top value, and what's the maximum number allowed?
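One possible stopgap while the Top behavior is unresolved is to split the resource IDs into smaller batches, since batches of 1-3 resources matched the Portal in the original report. This is only a sketch of that idea and not a recommendation from the SDK team. `chunk` is a hypothetical helper, the IDs are shortened for illustration, and the batch size of 3 is taken from the report rather than from any documented limit.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
// It returns nil when size is not positive.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	ids := make([]string, 10)
	for i := range ids {
		ids[i] = fmt.Sprintf("kv%d", i+1) // shortened, illustrative IDs
	}
	// Each batch would be passed to a separate QueryResources() call.
	for _, b := range chunk(ids, 3) {
		fmt.Println(len(b), b)
	}
}
```

The trade-off is more API calls per collection cycle, which gives up part of the benefit of the batch endpoint.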

@gracewilcox
Member

Hi @axw and @zmoog! Thank you for the detailed replies. The Top value is supposed to set the maximum number of records per resource ID, and as you discovered, it currently does not behave that way.

The service team is aware of the issue and is currently deploying a fix. Will let you know as soon as the bug is fixed. Thank you for your patience!


github-actions bot commented May 1, 2024

Hi @zmoog, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.

@github-actions github-actions bot closed this as completed May 1, 2024
@axw

axw commented May 1, 2024

/unresolve


github-actions bot commented May 1, 2024

Hi $axw, only the original author of the issue can ask that it be unresolved. Please open a new issue with your scenario and details if you would like to discuss this topic with the team.

@gracewilcox gracewilcox reopened this May 1, 2024
@gracewilcox gracewilcox added the Service This issue points to a problem in the service. label May 1, 2024
@gracewilcox gracewilcox added Service Attention This issue is responsible by Azure service team. bug This issue requires a change to an existing behavior in the product in order to be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Client This issue points to a problem in the data-plane of the library. issue-addressed The Azure SDK team member assisting with this issue believes it to be addressed and ready to close. labels May 2, 2024
@github-actions github-actions bot added the needs-team-attention This issue needs attention from Azure service team or SDK team label May 2, 2024

github-actions bot commented May 2, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @AzmonActionG @azmonALERTS @AzMonEssential @AzmonLogA @dadunl @sameergMS.

@zmoog
Author

zmoog commented May 3, 2024

If I run the tests at https://gist.github.com/zmoog/fcede6fcbe5ba11f9275c40a58eea38d I still get the same result.

@gracewilcox, was the service updated with the fix? Does the fix require a new api-version?

@gracewilcox
Member

@ToddKingMSFT, do you have guidance for this scenario?


5 participants