[BUG]: Azure Functions Deploy task failing repeatedly #19807

Open
4 of 7 tasks
karun-verghese opened this issue Apr 23, 2024 · 6 comments

Comments

karun-verghese commented Apr 23, 2024

New issue checklist

Task name

Azure Functions Deploy

Task version

2.238.1

Issue Description

Our CD pipeline that deploys our services and function apps is failing in one environment only. The environment happens to be in the Azure CN cloud. However, other deployments to the Azure CN cloud work fine; only this one environment fails repeatedly. The error message reads:
##[error]Error: Failed to get resource ID for resource type 'Microsoft.Web/Sites' and resource name '<my function app>'. Error: Could not fetch access token for Azure. Status code: endpoints_resolution_error, status message: Error: could not resolve endpoints. Please check network and try again. Detail: ClientAuthError: openid_config_error: Could not retrieve endpoints. Check your authority and verify the .well-known/openid-configuration endpoint returns the required endpoints. Attempted to retrieve endpoints from: https://login.partner.microsoftonline.cn/<tenantid>/v2.0/.well-known/openid-configuration

When I call the well known endpoint in my browser it works just fine.

Additionally, all my other deployments work fine, we are not configuring this environment any differently other than choosing the right service principal. The Service Principal itself was checked and is valid.
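
Because the browser check runs from a workstation while the task runs on the build agent, it may be worth requesting the same document from the agent machine itself. The following is a minimal sketch, assuming curl is installed on the agent; the function names are hypothetical and the tenant ID is a placeholder you supply yourself:

```shell
# Build the OpenID configuration URL quoted in the task's error message.
# $1 is the tenant ID (placeholder, not taken from the original report).
openid_config_url() {
  printf 'https://login.partner.microsoftonline.cn/%s/v2.0/.well-known/openid-configuration\n' "$1"
}

# Request the document and print only the HTTP status code.
# Intended to be run on the build agent, e.g. from a script step.
probe_openid_config() {
  curl --silent --show-error --max-time 30 \
       --output /dev/null --write-out '%{http_code}\n' \
       "$(openid_config_url "$1")"
}

# Example (not executed here):
#   probe_openid_config "$TENANT_ID"
```

If this prints 200 on the agent, the endpoint is reachable from there with curl and the failure is more likely inside the task's own HTTP client; a timeout or connection error would instead point at the network path from the agent.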

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

Ubuntu latest

Relevant log output

2024-04-23T06:15:22.5394220Z ##[section]Starting: Deploy Catalog Cache Function App
2024-04-23T06:15:22.5402680Z ==============================================================================
2024-04-23T06:15:22.5402913Z Task         : Azure Functions Deploy
2024-04-23T06:15:22.5403083Z Description  : Update a function app with .NET, Python, JavaScript, PowerShell, Java based web applications
2024-04-23T06:15:22.5403334Z Version      : 2.238.1
2024-04-23T06:15:22.5403491Z Author       : Microsoft Corporation
2024-04-23T06:15:22.5403656Z Help         : https://aka.ms/azurefunctiontroubleshooting
2024-04-23T06:15:22.5403827Z ==============================================================================
2024-04-23T06:15:25.0280162Z Got service connection details for Azure App Service:'biz-common-ecn-sand-catalogcache-func'
2024-04-23T06:15:35.2136506Z ##[error]Error: Failed to get resource ID for resource type 'Microsoft.Web/Sites' and resource name '<my function app>'. Error: Could not fetch access token for Azure. Status code: endpoints_resolution_error, status message: Error: could not resolve endpoints. Please check network and try again. Detail: ClientAuthError: openid_config_error: Could not retrieve endpoints. Check your authority and verify the .well-known/openid-configuration endpoint returns the required endpoints. Attempted to retrieve endpoints from: https://login.partner.microsoftonline.cn/<tenantid>/v2.0/.well-known/openid-configuration
2024-04-23T06:15:35.2205561Z ##[section]Finishing: Deploy Catalog Cache Function App

Full task logs with system.debug enabled

 [REPLACE THIS WITH YOUR INFORMATION] 

Repro steps

No response

karun-verghese commented Apr 23, 2024

Is there any documentation about what permissions are needed by the Service Connection? At the moment my service connection has a Contributor role at the resource group level, the same resource group under which the function app resides. This is the same as the service connections for my other deployments.

abagonhishead commented Apr 23, 2024

Also having this issue, but rather than Azure Functions, we're trying to deploy an app service to Azure China. Specifically it's the 'Azure App Service deploy' task v4 that we're using, although I also tried v3 and had the same problem.
We are getting this on both Azure-hosted agents and self-hosted agents.

I am fairly certain this isn't a permissions issue. I tested this with a manually configured ARM service principal service connection, and also set up a new ARM identity federation service connection according to this guide. Both service connections exhibit exactly the same issue with app service deployments, but work fine with everything else -- we have multiple Azure PowerShell release tasks using the same service connections, some of them doing very privileged things like deploying container apps, and they are all working fine.
Our two Azure App Service deploy tasks fail completely, however, with the following:

2024-04-23T13:25:40.3283519Z ##[section]Starting: Deploy: set app service to deployed image
2024-04-23T13:25:40.3293014Z ==============================================================================
2024-04-23T13:25:40.3293173Z Task         : Azure App Service deploy
2024-04-23T13:25:40.3293283Z Description  : Deploy to Azure App Service a web, mobile, or API app using Docker, Java, .NET, .NET Core, Node.js, PHP, Python, or Ruby
2024-04-23T13:25:40.3293471Z Version      : 4.238.1
2024-04-23T13:25:40.3293562Z Author       : Microsoft Corporation
2024-04-23T13:25:40.3293656Z Help         : https://aka.ms/azureappservicetroubleshooting
2024-04-23T13:25:40.3293786Z ==============================================================================
2024-04-23T13:25:44.0584652Z Got service connection details for Azure App Service:'***'
2024-04-23T13:25:53.2883895Z ##[error]Error: Failed to get resource ID for resource type 'Microsoft.Web/Sites' and resource name '***'. Error: Could not fetch access token for Azure. Status code: endpoints_resolution_error, status message: Error: could not resolve endpoints. Please check network and try again. Detail: ClientAuthError: openid_config_error: Could not retrieve endpoints. Check your authority and verify the .well-known/openid-configuration endpoint returns the required endpoints. Attempted to retrieve endpoints from: https://login.partner.microsoftonline.cn/***/v2.0/.well-known/openid-configuration
2024-04-23T13:25:53.2919412Z ##[section]Finishing: Deploy: set app service to deployed image

The important bit is:
Status code: endpoints_resolution_error, status message: Error: could not resolve endpoints. Please check network and try again. Detail: ClientAuthError: openid_config_error: Could not retrieve endpoints. Check your authority and verify the .well-known/openid-configuration endpoint returns the required endpoints. Attempted to retrieve endpoints from: https://login.partner.microsoftonline.cn/***/v2.0/.well-known/openid-configuration

Based on the error message, and the fact that routing the request through a proxy server first resolves the issue (see my workaround below), I think it might be related to this issue in azure/msal-node. Maybe there's a transparent proxy server somewhere along the route to China that the library doesn't like?

This is really frustrating and has taken me almost all day to work around. Not only do we now have to maintain a self-hosted agent purely for app service deployments into China, we're also unable to make use of our parallel jobs on our China deployment pipelines.

Any ideas on a fix, please?

EDIT: Not sure if this is important, but we're based in the UK, which may affect the location of Azure-hosted agents that are assigned to our organisation (and therefore the routing to Azure China).

Workaround

If you're able to set up your own self-hosted agent and use that, then there is a workaround. It worked for us, at least!

  • Set up and run a Linux self-hosted agent in the usual way (I haven't tested this with Windows agents)
  • On the same machine, run a Squid forward proxy server in a Docker container
    • Make sure you bind the listen port (3128) to 127.0.0.1, e.g. ... -p 127.0.0.1:1234:3128 ...
      • This is important! If you don't bind to 127.0.0.1 then Squid may be open to the Internet
  • Before starting the agent, set the environment variable VSTS_HTTP_PROXY to point at the Squid container, e.g. $ export VSTS_HTTP_PROXY=http://127.0.0.1:1234/
    • You may want to put this in the .env file in the agent root directory, since it looks like it's sourced when the agent starts
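
The steps above could be sketched roughly as follows. This is not the exact setup from the post: the image name `ubuntu/squid`, the container name, the agent directory, and both function names are assumptions chosen for illustration; only the port mapping and the VSTS_HTTP_PROXY variable come from the steps above.

```shell
# Host port from the example above; Squid listens on 3128 inside the container.
PROXY_PORT=1234
# Assumption: ubuntu/squid is one publicly available Squid image.
# The original post does not say which image was used.
SQUID_IMAGE="ubuntu/squid"

# Start Squid, publishing its port on the loopback interface only,
# so the proxy is not reachable from other machines.
start_squid() {
  docker run --detach --name squid --restart unless-stopped \
    --publish "127.0.0.1:${PROXY_PORT}:3128" "$SQUID_IMAGE"
}

# Append the proxy setting to the .env file in the agent root directory ($1).
write_proxy_env() {
  printf 'VSTS_HTTP_PROXY=http://127.0.0.1:%s/\n' "$PROXY_PORT" >> "$1/.env"
}

# Example (not executed here; "$HOME/myagent" is a hypothetical agent root):
#   start_squid
#   write_proxy_env "$HOME/myagent"
#   cd "$HOME/myagent" && ./run.sh
```

Depending on the image's default squid.conf, requests arriving through the Docker bridge may be denied until the proxy's access rules allow that network, so check the Squid logs if the agent cannot connect.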

@karun-verghese
Author

@abagonhishead Thanks for the information on the workaround. I'll take that back to our delivery infrastructure team and see if they can help me with that.

But yes, still hoping for a fix here :(

@karun-verghese
Author

It looks like more and more people face this issue as seen at the link below. Still waiting for a response here.
https://developercommunity.visualstudio.com/t/Deploying-to-Azure-China:-Could-not-fetc/10652428?viewtype=all

karun-verghese commented May 9, 2024

@abagonhishead I think I've found another workaround. I switched the agent from a Linux agent to a Windows agent, and the deployment worked fine on that agent. Did you already try this? It looks like this issue only affects Linux agents. Still, it's only a workaround.

nakah commented May 14, 2024

I've also run into the same issue recently. I managed to deploy the application by switching from the AzureRmWebAppDeployment@4 task (used for both Web App and Functions) to AzureFunctionApp@1 and AzureWebApp@1.
However, it now fails on AzureAppServiceManage@0 when starting the App Service.
It used to work perfectly, so I suspect a regression in the latest releases of these tasks.

FinVamp1 self-assigned this May 14, 2024
v-schhabra removed the Area:RM RM task team label May 16, 2024