-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removed DBToken. Made the components uses Managed Identity #3584
base: master
Are you sure you want to change the base?
Conversation
Please rebase pull request. |
4156bcd
to
f89ebdd
Compare
f89ebdd
to
4bd69b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also need to remove:
rm docs/dbtoken-service.md
- The load balancing rule to allow gateway to talk to the dbtoken service from
pkg/deploy
(should be in templates)
We probably don't need to peer the gateway and RP vnets anymore. We might be able to remove that peering as well. Which would be good for overall security posture.
pkg/database/billing.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
_, err = poller.PollUntilDone(ctx, nil) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error logic handling is changing here. Should it be? Should we only ignore errors of http.StatusConflict
from cosmosdb?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this method is to create/update so if there is already something there, it will just make an update which would be nothing if the triggers are same. I ran this multiple times on the same cosmosdb database and it didn't return http.StatusConflict even when the triggers were already there.
pkg/database/billing.go
Outdated
ctx, cancel := context.WithTimeout(ctx, time.Minute*5) | ||
defer cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an over-arching context assocaited with this function already? is the old method triggerc.Create(ctx, trigger)
blocking vs now we have to poll?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we used to get access to create/modify triggers using the keys, however now since we are using managed identities at cosmosdb scope, we cannot do any management operations on triggers(as it being management resource) because for that, the request must go through ARM and not directly to database.
https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-setup-rbac#permission-model
pkg/database/database.go
Outdated
@@ -71,6 +73,22 @@ func NewMasterKeyAuthorizer(ctx context.Context, log *logrus.Entry, token azcore | |||
return cosmosdb.NewMasterKeyAuthorizer(getDatabaseKey(keys, log)) | |||
} | |||
|
|||
func NewTokenAuthorizer(ctx context.Context, log *logrus.Entry, cred azcore.TokenCredential, databaseAccountName string, scopes []string) (cosmosdb.Authorizer, error) { | |||
scopes = append(scopes, fmt.Sprintf("https://%s.documents.azure.com/.default", databaseAccountName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scope is incorrect for government cloud (and any non-Azure Public Cloud)
We should read the url from the env struct we use everywhere to be cloud aware.
https://learn.microsoft.com/en-us/azure/azure-government/compare-azure-government-global-azure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! .. sorry for that, I'll correct it.
@@ -71,6 +73,22 @@ func NewMasterKeyAuthorizer(ctx context.Context, log *logrus.Entry, token azcore | |||
return cosmosdb.NewMasterKeyAuthorizer(getDatabaseKey(keys, log)) | |||
} | |||
|
|||
func NewTokenAuthorizer(ctx context.Context, log *logrus.Entry, cred azcore.TokenCredential, databaseAccountName string, scopes []string) (cosmosdb.Authorizer, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a problem inherently with this function.
The token we request from AAD with the scope is a short-lived credential (~12 hrs max I believe). That means that at some point, the token will expire both from AAD MSI perspective and cosmosdb perspective. As a result, we may need some refreshing of the actual database client in all places where it's used.
Previously, we used the primary database keys which never expire but are rotatable. That's why we didn't need to implement any database client reconstruction due to token expiration. Now we more than likely do.
The best way to test this would be to run an RP for a long time (or get the expiration of the token from this function) and see what happens when you get past the expiration time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The azcore.TokenCredential, it is indeed an object for tokens but it does not statically contains the token. It has a getToken()
method which gives a refreshed token every time we call the method.
There is a token refresh logic in place in this go-cosmosdb code in this PR which cache the token. Every time we make a request, it is check is the current token is expired or 5 mins left to expire, if so it calls getToken()
to get an updated token else use the cached ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible on quieter regions that we might not have a request for over 12 hours? Would that have any effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because everything a request goes to cosmosdb, there's a method Authorize()
which gets called, and this method has the logic to acquire the token if the token is expired. The same logic which we were discussing lately.
/azp run ci,e2e |
Pull request contains merge conflicts. |
Please rebase pull request. |
4bd69b9
to
20a5400
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small comments. Nothing critical. Seems like this PR really needs folks to manually test - is there instructions for how one might do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any changes in this file that aren't formatting? A quick review seemed like there wasn't any meaningful changes to this file which make me think it should be removed from the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I think prettier plugin changed this file. I will remove this changes.
I did remove this portion though:
az keyvault certificate import
--vault-name "$KEYVAULT_PREFIX-dbt"
--name dbtoken-server
--file secrets/localhost.pem >/dev/null
@@ -71,6 +73,22 @@ func NewMasterKeyAuthorizer(ctx context.Context, log *logrus.Entry, token azcore | |||
return cosmosdb.NewMasterKeyAuthorizer(getDatabaseKey(keys, log)) | |||
} | |||
|
|||
func NewTokenAuthorizer(ctx context.Context, log *logrus.Entry, cred azcore.TokenCredential, databaseAccountName string, scopes []string) (cosmosdb.Authorizer, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible on quieter regions that we might not have a request for over 12 hours? Would that have any effect?
globalaccounts storage.AccountsClient | ||
deployments features.DeploymentsClient | ||
groups features.ResourceGroupsClient | ||
// loadbalancers network.LoadBalancersClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just remove this entirely I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove what? line 52 // loadbalancers network.LoadBalancersClient
?
I was not sure about if this is affecting something or not, so I just commented it out for now. I will remove that as well once I have more peer reviews on this.
globalaccounts: storage.NewAccountsClient(_env.Environment(), *config.Configuration.GlobalSubscriptionID, authorizer), | ||
deployments: features.NewDeploymentsClient(_env.Environment(), config.SubscriptionID, authorizer), | ||
groups: features.NewResourceGroupsClient(_env.Environment(), config.SubscriptionID, authorizer), | ||
// loadbalancers: network.NewLoadBalancersClient(_env.Environment(), config.SubscriptionID, authorizer), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just remove this entirely I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing
/azp run ci,e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
f691b3b
to
d507229
Compare
d507229
to
5e3fecd
Compare
Regarding impact for billing service:
|
Which issue this PR addresses:
As part of the ARO-5512 , we needed to disable local auth for cosmosdb. In order to disable local authentication, this PR is to make our components rely on managed identities/entra id rather than keys for authentication/authorizing cosmosdb.
This is currently a draft PR as I will be testing it in the INT env first before going forward, however, reviews are appreciated.
I have manually changed some generated code here. That was because the PR I created to have the generated code changed is not merged yet. Once that is merged, I will generate the code from the go-cosmosdb repo itself just like how we usually do.
Fixes
ARO-7195
ARO-7314