feat(AzureDNS): Add support for Workload Identity #5570

Merged
Changes from 3 commits
65 changes: 64 additions & 1 deletion pkg/issuer/acme/dns/azuredns/azuredns.go
@@ -13,6 +13,7 @@ package azuredns
import (
	"context"
	"fmt"
	"os"
	"strings"

	"github.com/go-logr/logr"
@@ -71,6 +72,35 @@ func NewDNSProviderCredentials(environment, clientID, clientSecret, subscription
	}, nil
}

// getFederatedSPT prepares an SPT for a Workload Identity-enabled setup
func getFederatedSPT(env azure.Environment, options adal.ManagedIdentityOptions) (*adal.ServicePrincipalToken, error) {
	// NOTE: all related environment variables are described here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html
	oauthConfig, err := adal.NewOAuthConfig(env.ActiveDirectoryEndpoint, os.Getenv("AZURE_TENANT_ID"))
	if err != nil {
		return nil, fmt.Errorf("failed to retrieve OAuth config: %v", err)
	}

	jwt, err := os.ReadFile(os.Getenv("AZURE_FEDERATED_TOKEN_FILE"))
	if err != nil {
		return nil, fmt.Errorf("failed to read a file with a federated token: %v", err)
	}

	// AZURE_CLIENT_ID will be empty in case azure.workload.identity/client-id annotation is not set
	// Also, some users might want to use a different MSI for a particular DNS zone
	// Thus, it's important to offer optional ClientID overrides
	clientID := os.Getenv("AZURE_CLIENT_ID")
	if options.ClientID != "" {
		clientID = options.ClientID
	}

	token, err := adal.NewServicePrincipalTokenFromFederatedToken(*oauthConfig, clientID, string(jwt), env.ResourceManagerEndpoint)
	if err != nil {
		return nil, fmt.Errorf("failed to create a workload identity token: %v", err)
	}

	return token, nil
}

func getAuthorization(env azure.Environment, clientID, clientSecret, subscriptionID, tenantID string, ambient bool, managedIdentity *cmacme.AzureManagedIdentity) (*adal.ServicePrincipalToken, error) {
	if clientID != "" {
		logf.Log.V(logf.InfoLevel).Info("azuredns authenticating with clientID and secret key")
@@ -84,7 +114,7 @@ func getAuthorization(env azure.Environment, clientID, clientSecret, subscriptio
		}
		return spt, nil
	}
Member

It took me a while to realise that the clientID parameter is empty when ambient credentials are used, and that in this case the client ID may instead be supplied via the managedIdentity parameter.

Contributor Author

It's just the same process as for pod managed identities :)
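The credential-selection order discussed in this thread can be summarised as a small pure function. This is a hypothetical sketch: `selectAuthMethod`, `resolveClientID` and the `authMethod` labels are illustrative names rather than cert-manager code, and the real `getAuthorization` builds an `*adal.ServicePrincipalToken` instead of returning a label.

```go
package main

import (
	"errors"
	"fmt"
)

// authMethod labels the credential path that would be chosen (hypothetical, illustration only).
type authMethod string

const (
	clientSecretAuth     authMethod = "client-secret"
	workloadIdentityAuth authMethod = "workload-identity"
	managedIdentityAuth  authMethod = "managed-identity"
)

// selectAuthMethod mirrors the order used in getAuthorization: an explicit clientID
// selects client-secret auth; otherwise ambient credentials must be enabled, and
// Workload Identity is preferred over MSI when a federated token file is configured.
func selectAuthMethod(clientID string, ambient bool, federatedTokenFile string) (authMethod, error) {
	if clientID != "" {
		return clientSecretAuth, nil
	}
	if !ambient {
		return "", errors.New("ClientID is not set and ambient credentials are disabled")
	}
	if federatedTokenFile != "" {
		return workloadIdentityAuth, nil
	}
	return managedIdentityAuth, nil
}

// resolveClientID mirrors the override in getFederatedSPT: a ClientID from the
// managedIdentity section takes precedence over AZURE_CLIENT_ID.
func resolveClientID(envClientID, overrideClientID string) string {
	if overrideClientID != "" {
		return overrideClientID
	}
	return envClientID
}

func main() {
	method, err := selectAuthMethod("", true, "/var/run/secrets/azure/tokens/azure-identity-token")
	fmt.Println(method, err)
	fmt.Println(resolveClientID("fakeClientID", "anotherClientID"))
}
```

With ambient credentials enabled and a federated token file present, this selects `workload-identity`, and the `anotherClientID` override wins over the environment value, which is the behaviour the second subtest in this PR checks.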

	logf.Log.V(logf.InfoLevel).Info("No ClientID found: authenticating azuredns with managed identity (MSI)")
	logf.Log.V(logf.InfoLevel).Info("No ClientID found: attempting to authenticate with ambient credentials (Azure Workload Identity or Azure Managed Service Identity, in that order)")
	if !ambient {
		return nil, fmt.Errorf("ClientID is not set but neither `--cluster-issuer-ambient-credentials` nor `--issuer-ambient-credentials` are set. These are necessary to enable Azure Managed Identities")
	}
@@ -96,6 +126,39 @@ func getAuthorization(env azure.Environment, clientID, clientSecret, subscriptio
		opt.IdentityResourceID = managedIdentity.ResourceID
	}

	// Use Workload Identity if present
	if os.Getenv("AZURE_FEDERATED_TOKEN_FILE") != "" {
		spt, err := getFederatedSPT(env, opt)
		if err != nil {
			return nil, err
		}

		// adal does not offer methods to dynamically replace a federated token, thus we need to have a wrapper to make sure
		// we're using up-to-date secret while requesting an access token
Member

Please elaborate on this?
I looked at the default refresh function and it appears to use the RefreshToken that accompanies the latest AccessToken to update the AccessToken before it expires.
Why is that not desirable? Add some further explanation to the comment.

I also went looking at the azure-workload-identity msal-go example for inspiration and noticed a couple of things:

  1. They used to use autorest.NewBearerAuthorizerCallback with a callback to load the K8S service account token from the filesystem. Does that solve the problem of missing "methods to dynamically replace a federated token" more elegantly than this?
  2. That example has recently been updated to use the azidentity and azsecrets SDKs: "removing the dependency on autorest and switching to use track 2 sdks". Would it make sense to do that here too? I note that the autorest library is deprecated. In fact I think it's been suggested in the previous PR that we should do that first:

Contributor Author

No worries, thanks for the review! :)

> Please elaborate on this?
> I looked at the default refresh function and it appears to use the RefreshToken that accompanies the latest AccessToken to update the AccessToken before it expires.
> Why is that not desirable? Add some further explanation to the comment.

Well, as I understand it, there's no concept of a RefreshToken for Workload Identity: an app is always expected to exchange a federated token for an access token. Before working on this PR for cert-manager, I prepared a similar patch for external-dns (which, sadly, remains unreviewed), and while doing so I dumped both the ServicePrincipalToken (spt in the code) and the access token returned by newSPT.Token(). In both cases RefreshToken was empty.
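Because no refresh token is issued, every renewal is a fresh client-credentials request in which the projected Kubernetes JWT is presented as a client assertion. The sketch below builds such a request body for the Azure AD v1 `/oauth2/token` endpoint, the URL shape the test in this PR expects. `federatedTokenForm` and `tokenURL` are hypothetical helpers; the parameter names follow the OAuth 2.0 JWT client-assertion profile (RFC 7521/7523), and `client_assertion` and `client_id` are the fields the test server in this PR inspects. We assume, rather than quote, that adal sends an equivalent form.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// jwtBearerAssertionType is the client_assertion_type value defined by RFC 7523.
const jwtBearerAssertionType = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"

// federatedTokenForm builds a client-credentials token request in which the
// federated (Kubernetes service account) JWT replaces a client secret.
// Hypothetical helper for illustration only.
func federatedTokenForm(clientID, federatedJWT, resource string) url.Values {
	v := url.Values{}
	v.Set("grant_type", "client_credentials")
	v.Set("client_id", clientID)
	// The v1 endpoint takes a target resource, e.g. the Resource Manager endpoint.
	v.Set("resource", resource)
	v.Set("client_assertion_type", jwtBearerAssertionType)
	v.Set("client_assertion", federatedJWT)
	return v
}

// tokenURL returns the v1 token endpoint: <authority>/<tenant-ID>/oauth2/token?api-version=1.0
func tokenURL(authority, tenantID string) string {
	return strings.TrimSuffix(authority, "/") + "/" + tenantID + "/oauth2/token?api-version=1.0"
}

func main() {
	form := federatedTokenForm("fakeClientID", "header.payload.signature", "https://management.azure.com/")
	fmt.Println(tokenURL("https://login.microsoftonline.com/", "fakeTenantID"))
	// Encode produces the application/x-www-form-urlencoded body that would be POSTed.
	fmt.Println(form.Encode())
}
```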

> 1. They used to use autorest.NewBearerAuthorizerCallback with a callback to load the K8S service account token from the filesystem. Does that solve the problem of missing "methods to dynamically replace a federated token" more elegantly than this?

Just from reading the code, I'm not sure it offers renewal; that would need to be tested over more than 24h (the AAD token expires after a day).
I like my current code because it doesn't go too deep into the library's implementation internals; it acts as a simple wrapper. That said, I'm an SRE rather than a software engineer, so I'd trust your judgement if you suggest going in a different direction.

> 2. That example has recently been updated (Azure/azure-workload-identity#639): "removing the dependency on autorest and switching to use track 2 sdks". Would it make sense to do that here too? I note that the autorest library is deprecated. In fact I think it's been suggested in the previous PR that we should do that first:
> #5452

Indeed, the library will only be maintained until April next year, so in the long run cert-manager should definitely migrate away from it.
I saw that PR, but there's no estimate of when it will be merged, so I hoped an incremental change like this one would make it into the code sooner. Also, azure-sdk-for-go currently requires a workaround for Workload Identity as well, since native support will only reach beta in December or January: Azure/azure-sdk-for-go#15615

Contributor Author

Added more comments as requested.

		var refreshFunc adal.TokenRefresh = func(context context.Context, resource string) (*adal.Token, error) {
			newSPT, err := getFederatedSPT(env, opt)
			if err != nil {
				return nil, err
			}

			// Need to call Refresh(), otherwise .Token() will be empty
			err = newSPT.Refresh()
			if err != nil {
				return nil, err
			}
Member

Calling Refresh sends the K8S ServiceAccountToken (JWT) to Azure's OAuth endpoint and gets back an OAuth AccessToken and RefreshToken, right?

Maybe add an additional comment explaining that this results in a roundtrip to the Azure auth API.

Contributor Author

When a special label is in place (e.g. on a ServiceAccount: azure.workload.identity/use: "true"), a mutating webhook modifies the pod's spec to add a projected volume containing a federated token. The path to the token is always the same; it's just a concatenation of these two constants:

	TokenFilePathName             = "azure-identity-token"
	TokenFileMountPath            = "/var/run/secrets/azure/tokens" // #nosec

https://github.com/Azure/azure-workload-identity/blob/231c6be82969efb51f0539397e6ab268c1eb8eca/pkg/webhook/consts.go#L54
To avoid hardcoding, apps are expected to locate it through the AZURE_FEDERATED_TOKEN_FILE environment variable.
The token is updated once an hour by default.

This token is sent to the OAuth endpoint. The response contains only an AccessToken (valid for 24h); no RefreshToken is present.

Contributor Author

Added more comments as requested.


			accessToken := newSPT.Token()

			return &accessToken, nil
		}

		spt.SetCustomRefreshFunc(refreshFunc)

		return spt, nil
	}

	logf.Log.V(logf.InfoLevel).Info("No Azure Workload Identity found: attempting to authenticate with an Azure Managed Service Identity (MSI)")

	spt, err := adal.NewServicePrincipalTokenFromManagedIdentity(env.ServiceManagementEndpoint, &opt)
	if err != nil {
		return nil, fmt.Errorf("failed to create the managed service identity token: %v", err)
122 changes: 122 additions & 0 deletions pkg/issuer/acme/dns/azuredns/azuredns_test.go
@@ -9,10 +9,16 @@ this directory.
package azuredns

import (
	"encoding/json"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"testing"
	"time"

	"github.com/Azure/go-autorest/autorest/adal"
	"github.com/Azure/go-autorest/autorest/azure"
	v1 "github.com/cert-manager/cert-manager/pkg/apis/acme/v1"
	"github.com/cert-manager/cert-manager/pkg/issuer/acme/dns/util"
	"github.com/stretchr/testify/assert"
@@ -77,3 +83,119 @@ func TestInvalidAzureDns(t *testing.T) {
	_, err := NewDNSProviderCredentials("invalid env", "cid", "secret", "", "", "", "", util.RecursiveNameservers, false, &v1.AzureManagedIdentity{})
	assert.Error(t, err)
}

func populateFederatedToken(t *testing.T, filename string, content string) {
	t.Helper()

	f, err := os.Create(filename)
	if err != nil {
		assert.FailNow(t, err.Error())
	}

	if _, err := io.WriteString(f, content); err != nil {
		assert.FailNow(t, err.Error())
	}

	if err := f.Close(); err != nil {
		assert.FailNow(t, err.Error())
	}
}

func TestGetAuthorizationFederatedSPT(t *testing.T) {
	// Create a file that will be used to store a federated token
	f, err := os.CreateTemp("", "")
	if err != nil {
		assert.FailNow(t, err.Error())
	}
	defer os.Remove(f.Name())

	// Close the file to simplify logic within populateFederatedToken helper
	if err := f.Close(); err != nil {
		assert.FailNow(t, err.Error())
	}

	// The initial federated token is never used, so we don't care about the value yet
	// Though, it's a requirement from adal to have a non-empty value set
	populateFederatedToken(t, f.Name(), "random-jwt")

	// Prepare environment variables adal will rely on. Skip changes for some envs if they are already defined (=live environment)
	// Envs themselves are described here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html
	if os.Getenv("AZURE_TENANT_ID") == "" {
		t.Setenv("AZURE_TENANT_ID", "fakeTenantID")
	}

	if os.Getenv("AZURE_CLIENT_ID") == "" {
		t.Setenv("AZURE_CLIENT_ID", "fakeClientID")
	}

	t.Setenv("AZURE_FEDERATED_TOKEN_FILE", f.Name())

	t.Run("token refresh", func(t *testing.T) {
		// Basically, we want one token to be exchanged for the other (key and value respectively)
		tokens := map[string]string{
			"initialFederatedToken":   "initialAccessToken",
			"refreshedFederatedToken": "refreshedAccessToken",
		}

		ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if err := r.ParseForm(); err != nil {
				assert.FailNow(t, err.Error())
			}

			w.Header().Set("Content-Type", "application/json")
			receivedFederatedToken := r.FormValue("client_assertion")
			accessToken := adal.Token{AccessToken: tokens[receivedFederatedToken]}

			if err := json.NewEncoder(w).Encode(accessToken); err != nil {
				assert.FailNow(t, err.Error())
			}

			// Expected format: http://<server>/<tenant-ID>/oauth2/token?api-version=1.0
			assert.Contains(t, r.RequestURI, os.Getenv("AZURE_TENANT_ID"), "URI should contain the tenant ID exposed through env variable")

			assert.Equal(t, os.Getenv("AZURE_CLIENT_ID"), r.FormValue("client_id"), "client_id should match the value exposed through env variable")
		}))
		defer ts.Close()

		ambient := true
		env := azure.Environment{ActiveDirectoryEndpoint: ts.URL, ResourceManagerEndpoint: ts.URL}
		managedIdentity := &v1.AzureManagedIdentity{ClientID: ""}

		spt, err := getAuthorization(env, "", "", "", "", ambient, managedIdentity)
		assert.NoError(t, err)

		for federatedToken, accessToken := range tokens {
			populateFederatedToken(t, f.Name(), federatedToken)
			assert.NoError(t, spt.Refresh(), "Token refresh failed")
			assert.Equal(t, accessToken, spt.Token().AccessToken, "Access token should have been set to a value returned by the webserver")
		}
	})

	t.Run("clientID overrides through managedIdentity section", func(t *testing.T) {
		managedIdentity := &v1.AzureManagedIdentity{ClientID: "anotherClientID"}

		ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if err := r.ParseForm(); err != nil {
				assert.FailNow(t, err.Error())
			}

			w.Header().Set("Content-Type", "application/json")
			accessToken := adal.Token{AccessToken: "abc"}

			if err := json.NewEncoder(w).Encode(accessToken); err != nil {
				assert.FailNow(t, err.Error())
			}

			assert.Equal(t, managedIdentity.ClientID, r.FormValue("client_id"), "client_id should match the value passed through managedIdentity section")
		}))
		defer ts.Close()

		ambient := true
		env := azure.Environment{ActiveDirectoryEndpoint: ts.URL, ResourceManagerEndpoint: ts.URL}

		spt, err := getAuthorization(env, "", "", "", "", ambient, managedIdentity)
		assert.NoError(t, err)

		assert.NoError(t, spt.Refresh(), "Token refresh failed")
	})
}