Unable to use AAD Cluster with non-admin users #25

Closed
jorik opened this issue Mar 10, 2021 · 28 comments · Fixed by #32

@jorik

jorik commented Mar 10, 2021

Hello,

We are trying to use this action with a Service Principal that only has access to a single namespace in our AKS cluster, but we're getting these errors:

Run azure/aks-set-context@v1
Error: ***"error":***"code":"AuthorizationFailed","message":"The client 'xxxx' with object id 'xxxx' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action' over scope '/subscriptions/xxxx/resourceGroups/xxxx/providers/Microsoft.ContainerService/managedClusters/xxxx/accessProfiles/clusterAdmin' or the scope is invalid. If access was recently granted, please refresh your credentials."***

I can see in the code that the clusterAdmin role is hardcoded. Is there a way to make this configurable? Or is there an alternative action we can use to log in to AKS?

@thesattiraju
Contributor

An alternative, until this is fixed, is to use the azure/cli action with az aks get-credentials -n <cluster-name> -g <resource-group>.

Would you be willing to raise a PR to fix this, by the way? You could introduce a new input called admin, with a default value of true; if it's set to false, the action would get the clusterUser credentials instead.
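
The proposed admin input could look roughly like the following shell sketch. The action itself is written in TypeScript, so this only illustrates the idea; the function names access_profile and get_credentials_cmd are hypothetical, while --admin is the Azure CLI flag that requests cluster-admin credentials.

```shell
# Hypothetical helper: map an `admin` input (default "true") to the AKS
# access profile name that appears in the error message above.
access_profile() {
  local admin="${1:-true}"
  if [ "$admin" = "false" ]; then
    echo "clusterUser"
  else
    echo "clusterAdmin"
  fi
}

# Hypothetical helper: print (do not run) the matching Azure CLI command.
# Without --admin, `az aks get-credentials` fetches cluster user credentials.
get_credentials_cmd() {
  local cluster="$1" group="$2" admin="${3:-true}"
  local cmd="az aks get-credentials -n $cluster -g $group"
  if [ "$admin" != "false" ]; then
    cmd="$cmd --admin"
  fi
  echo "$cmd"
}
```

Defaulting to true keeps the current behaviour for existing workflows; only callers that pass false would switch to the clusterUser profile.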

@molinamelendezj

@jorik did what @DS-MS recommended there work for you? I'm running into the same issue, but the workaround doesn't work for me, as the azure/cli action runs in a container and the kube config context it sets doesn't seem to persist.

@thesattiraju
Contributor

Oh! If that's the case you could consider just using azure/login action and then running a script action.. i.e. run: az aks get-credentials -n <cluster-name> -g <resource-group>

@molinamelendezj

@DS-MS yea that is what I also tried. This is what I have tried so far.

- name: Azure Login
  uses: Azure/login@v1
  with:
    creds: "${{ secrets.AZURE_CREDENTIALS }}"
- name: Azure CLI Action
  uses: Azure/cli@1.0.4
  with:
    inlineScript: az aks get-credentials -n <cluster-name> -g <resource-group>

Which runs successfully and produces the following trace.

Run Azure/cli@1.0.4
Starting script execution via docker image mcr.microsoft.com/azure-cli:latest
Merged "<cluster-name>" as current context in /root/.kube/config

az script ran successfully.
cleaning up container...
MICROSOFT_AZURE_CLI_1621367983213_CONTAINER

But as you can see, it says cleaning up container..., so when the following action runs I get the error below. It's as if the context disappears (I think). I also tried running all of this within one bash script along with kubectl using the cli, but of course it hangs as it waits for user input to complete the authentication.

- uses: Azure/k8s-deploy@v1.4
  with:
    namespace: '<namespace>'
    manifests: |
        deployment.yaml
        service.yaml
    images: '<containerstore>.azurecr.io/<image>:v1.0.0'
    kubectl-version: 'latest'

Error: Error: Cluster context not set. Use k8ssetcontext action to set cluster context

I can't seem to find any Microsoft example of this scenario. All the examples I find assume that the deploying action's SP has admin rights to the cluster. As an enterprise, we are trying to restrict access to the namespace level, as @jorik mentioned too, so that actions from each product team deploy to their own namespaces and not to other product teams' namespaces. In practice, we are using one cluster as a common platform for a set of products.

You mention it might be simpler to just fix the issue in this action. Can you provide some more detail? I think we want to make use of this action.

@thesattiraju
Contributor

I meant more like :

- name: Azure Login
  uses: Azure/login@v1
  with:
    creds: "${{ secrets.AZURE_CREDENTIALS }}"

- run: az aks get-credentials -n <cluster-name> -g <resource-group>

- uses: Azure/k8s-deploy@v1.4
  with:
    namespace: '<namespace>'
    manifests: |
        deployment.yaml
        service.yaml
    images: '<containerstore>.azurecr.io/<image>:v1.0.0'
    kubectl-version: 'latest'

Also this is just a workaround, @Ganeshrockz is currently working on adding this. 😄

@molinamelendezj

@DS-MS yes, that is what I tried as well, and I get that context error. I opened #31 for feedback, but if @Ganeshrockz is already on it I can close it. Is there any other help you can give me as a workaround?

@dnovvak

dnovvak commented May 21, 2021

@molinamelendezj
Here is my workaround, which also adds the KUBECONFIG env variable required by the k8s-deploy action:

- name: Azure Login
  uses: azure/login@v1
  with:
    creds: "${{ secrets.AZURE_CREDENTIALS }}"

- name: Set KUBECONFIG env variable
  run: echo "KUBECONFIG=${RUNNER_TEMP}/kubeconfig-$(date +%s)" >> $GITHUB_ENV

- name: Set AKS cluster context
  run: |
    az aks get-credentials -n <cluster-name> -g <resource-group> -f ${KUBECONFIG}

Based on https://github.com/Azure/aks-set-context/blob/releases/v1/src/login.ts#L53.
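
For readers unfamiliar with $GITHUB_ENV: a line appended to that file becomes an environment variable for the following steps of the job, but not for the step that wrote it. A minimal sketch of the first workaround step as a reusable function (the name export_kubeconfig_path and the /tmp fallback are assumptions, not part of the workaround above):

```shell
# Append a KUBECONFIG assignment to the file named by $GITHUB_ENV so that
# later steps in the job see the variable. Prints the chosen path so the
# caller can also use it within the current step.
export_kubeconfig_path() {
  local dir="${RUNNER_TEMP:-/tmp}"   # /tmp fallback is an assumption for local runs
  local path="${dir}/kubeconfig-$(date +%s)"
  echo "KUBECONFIG=${path}" >> "${GITHUB_ENV:?GITHUB_ENV is not set}"
  echo "${path}"
}
```

Because the variable only reaches later steps, az aks get-credentials has to run in a separate step (as above) or be given the printed path explicitly via -f.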

@molinamelendezj

molinamelendezj commented May 21, 2021

@dnovvak your suggestions looked promising, but I get the following error:

W0521 18:21:30.181078    1612 loader.go:221] Config not found: /home/runner/work/_temp/kubeconfig_1621621288

I also adjusted your suggestion, as it seems it's a _ and not a - before the date, but got the same error. Then I ran a simple

      - name: List files in runner.temp
        working-directory: ${{ runner.temp }}
        run: find ./ -type f

And confirmed the kubeconfig file is not there. I think Azure/login@v1 doesn't do what azure/aks-set-context@v1's login does; it may also run in a container and have the admin scope issue, so this will not work. I'm not sure what else to do here. It seems @Ganeshrockz is putting in a fix, but I'm not sure it will work. We'll see.
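
When the kubeconfig seems to vanish between steps, a quick pre-flight check before the deploy step can narrow down whether KUBECONFIG was never exported or whether the file was never written. A minimal sketch, assuming a hypothetical helper named check_kubeconfig:

```shell
# Return 1 when KUBECONFIG is unset, 2 when the file it names is missing
# or empty, and 0 (with a confirmation message) otherwise.
check_kubeconfig() {
  if [ -z "${KUBECONFIG:-}" ]; then
    echo "KUBECONFIG is not set" >&2
    return 1
  fi
  if [ ! -s "${KUBECONFIG}" ]; then
    echo "kubeconfig missing or empty: ${KUBECONFIG}" >&2
    return 2
  fi
  echo "kubeconfig found: ${KUBECONFIG}"
}
```

A return code of 2 would point at the az aks get-credentials step (for example a missing -f ${KUBECONFIG}), while 1 would point at the step that writes to $GITHUB_ENV.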

@dnovvak

dnovvak commented May 24, 2021

That is really interesting, because it works for me.
Instead of an error, I see this message from the az aks get-credentials command:

Merged "<cluster-name>" as current context in /home/runner/work/_temp/kubeconfig-1621612871

The name of the kubeconfig file is not very important here, since we use KUBECONFIG as the single source of truth for it.

The kubeconfig file should definitely be created by az aks get-credentials (note that -f ${KUBECONFIG} is crucial here; is it missing from your step?).

I use a non-admin user for CI/CD and have the AKS cluster configured with AAD and Azure RBAC.

Here is the complete job definition for my case; maybe this will help:

  deploy-to-aks:
    name: Deploy to AKS
    needs: build-image
    runs-on: ubuntu-latest
    permissions:
      contents: read
    env:
      NAMESPACE: foo-ns
      SECRET: my-image-pull-secret
      APP: myApp
      REGISTRY: ghcr.io
    steps:
      - name: Check out the repo
        uses: actions/checkout@v2

      - name: Construct image ref
        run: echo "IMAGE_REF=${REGISTRY}/${GITHUB_REPOSITORY,,}/${APP}:$(cat ${APP}/VERSION)" >> $GITHUB_ENV

      # this step is required for me because of AAD and Azure RBAC integration
      - name: Set up kubelogin for non-interactive login
        run: |
          curl -LO https://github.com/Azure/kubelogin/releases/download/v0.0.9/kubelogin-linux-amd64.zip
          sudo unzip -j kubelogin-linux-amd64.zip -d /usr/local/bin
          rm -f kubelogin-linux-amd64.zip
          kubelogin --version

      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: "${{ secrets.AZURE_CREDENTIALS }}"
      
      - name: Set KUBECONFIG env variable
        run: echo "KUBECONFIG=${RUNNER_TEMP}/kubeconfig-$(date +%s)" >> $GITHUB_ENV

      - name: Set AKS cluster context
        run: |
          az aks get-credentials -n <cluster-name> -g <resource-group> -f ${KUBECONFIG}
          kubelogin convert-kubeconfig -l azurecli  # this is required for me because of AAD and Azure RBAC integration

      - name: Create image pull secret
        uses: azure/k8s-create-secret@v1
        with:
          container-registry-url: ${{ env.REGISTRY }}
          container-registry-username: ${{ secrets.REGISTRY_USER }}
          container-registry-password: ${{ secrets.REGISTRY_PASSWORD }}
          secret-name: ${{ env.SECRET }}
          namespace: ${{ env.NAMESPACE }}
          arguments: --force true

      - name: Deploy to AKS
        uses: azure/k8s-deploy@v1
        with:
          manifests: |
            ${{ env.APP }}/k8s/configmap.yaml
            ${{ env.APP }}/k8s/deployment.yaml
          images: ${{ env.IMAGE_REF }}
          imagepullsecrets: ${{ env.SECRET }}
          namespace: ${{ env.NAMESPACE }}

@molinamelendezj

@dnovvak! Thanks, man. This got further now, and we also use a non-admin user for CI/CD and have the AKS cluster configured with AAD and Azure RBAC, so it looks like you have helped even more than I expected 😄

I need to fix some access to my image store, but looks like the action is connecting and authin' now. you rock. Thanks!

@thesattiraju
Contributor

JFYI, @Ganeshrockz has raised this PR to fix this: #32

Until this is released officially, you can test it out by using the ref azure/aks-set-context@ganeshrockz/addUserRoleInput and passing an additional input, admin: false :)

@molinamelendezj

molinamelendezj commented May 25, 2021

@DS-MS yup, just tried that and it works as well (pretty cool, didn't realize I could target it like that!). @Ganeshrockz thanks for the fix, looking forward to its release. Cheers.

@DS-MS and @Ganeshrockz actually, I spoke too soon. It seems my aks-deploy hung. I think there is more to consider since I'm using AAD and RBAC like @dnovvak, and it looks like he had that other workaround using the non-interactive login.

thesattiraju added the enhancement label May 26, 2021

@Ganeshrockz
Contributor

@molinamelendezj Can you try rerunning the workflow again with the action targeting the same branch (azure/aks-set-context@ganeshrockz/addUserRoleInput)?

@molinamelendezj

> @molinamelendezj Can you try rerunning the workflow again with the action targeting the same branch (azure/aks-set-context@ganeshrockz/addUserRoleInput)?

@Ganeshrockz the action won't run now.

@thesattiraju
Contributor

@molinamelendezj could you dump the error log here?

@molinamelendezj

@DS-MS here it is below.

internal_error.txt

I had to turn on debugging for the workflow. I think this is an unhandled exception, which is why I can't see anything. Is there a missing permission we need on our SP for this to work as expected? Also, as I mentioned in #31, there is a new endpoint, and this Azure management API endpoint is being deprecated.

@thesattiraju
Contributor

@molinamelendezj I was just testing things with the newer API which AKS team has provided; things seem to work seamlessly for me.

I noticed in your earlier logs, kubectl requested you to explicitly authenticate using a device code. Is your cluster integrated with AAD?
If so, I suppose it is working as expected: when you use clusterUser in automation it demands an explicit login, but if you use clusterAdmin it'll bypass that.

One alternative to that is you could provide admin privileges to the SPN you are using within that cluster to help you use it in automation.

@molinamelendezj

@DS-MS yes, we are using AAD. We do not want to use admin privileges on the SP. We are using the cluster as a shared platform within the enterprise, and we don't want product teams to be able to deploy to other namespaces, either by mistake or on purpose.

@dnovvak's workaround above works just fine; it would be nice to fix the action to handle the AAD case.

@KatieKoslosky

Thank you, thank you, thank you @dnovvak for taking the time to share your solution, it really helped me.

It would be nice if the action could handle this case.

thesattiraju changed the title from "Unable to use with non-admin users" to "Unable to use AAD Cluster with non-admin users" on Jun 15, 2021

@github-actions

github-actions bot commented Jul 6, 2021

This issue is idle because it has been open for 14 days with no activity.

github-actions bot added the idle (Inactive for 14 days) label Jul 6, 2021

@preetishmadalia

Hi, is this feature merged in v1? I am getting the message below:
Warning: Unexpected input(s) 'admin', valid inputs are ['creds', 'resource-group', 'cluster-name']

github-actions bot removed the idle (Inactive for 14 days) label Feb 3, 2022

@OliverMKing
Collaborator

I updated this action to use this technique for v2.0.

> Hi, is this feature merged in v1? I am getting below message : Warning: Unexpected input(s) 'admin', valid inputs are ['creds', 'resource-group', 'cluster-name']

It was never merged into v1; v2.0 should be used instead.

@preetishmadalia

Thanks @OliverMKing. Will try with v2.0. 👍

@bahramr

bahramr commented Feb 11, 2022

The action "azure/aks-set-context@v2.0" runs ok with a non-admin service principal but the action that follows "Azure/k8s-deploy@v1.4" fails as it seems like it is trying to do an interactive login:

"(https://github.com/Azure/aks-baseline-automation/runs/5162795080?check_suite_focus=true#step:5:33)31 1658 azure.go:163] Failed to acquire a token: failed acquiring new token: waiting for device code authentication to complete: autorest/adal/devicetoken: Error while retrieving OAuth token: Code Expired
Unable to connect to the server: acquiring a token for authorization header: failed acquiring new token: waiting for device code authentication to complete: autorest/adal/devicetoken: Error while retrieving OAuth token: Code Expired (Client.Timeout exceeded while awaiting headers)
Error: Error: To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code xxxxx to authenticate."

Do we have to use kubelogin somehow to get around this issue: https://docs.microsoft.com/en-us/azure/aks/managed-aad#non-interactive-sign-in-with-kubelogin ?
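
One way to tell whether a kubeconfig will trigger the device-code prompt is to look at how the user entry authenticates. The sketch below is a heuristic, assuming the kubeconfig from an AAD-enabled cluster contains an auth-provider entry until kubelogin convert-kubeconfig rewrites it to an exec plugin with command: kubelogin; the function name needs_kubelogin_conversion is hypothetical.

```shell
# Heuristic check (assumption: unconverted AAD kubeconfigs carry an
# `auth-provider:` key, converted ones carry `command: kubelogin`).
# Returns 0 when conversion looks necessary, 1 when it does not,
# and 2 when the file cannot be read.
needs_kubelogin_conversion() {
  local file="${1:-${KUBECONFIG:-$HOME/.kube/config}}"
  [ -r "$file" ] || return 2
  if grep -q 'command: kubelogin' "$file"; then
    return 1
  fi
  grep -q 'auth-provider:' "$file"
}
```

In a workflow step this could gate the conversion: if needs_kubelogin_conversion; then kubelogin convert-kubeconfig -l azurecli; fi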

@OliverMKing
Collaborator

OliverMKing commented Feb 24, 2022

@bahramr try adding the following steps from @dnovvak.

      - name: Set up kubelogin for non-interactive login
        run: |
          curl -LO https://github.com/Azure/kubelogin/releases/download/v0.0.9/kubelogin-linux-amd64.zip
          sudo unzip -j kubelogin-linux-amd64.zip -d /usr/local/bin
          rm -f kubelogin-linux-amd64.zip
          kubelogin --version
      - name: Convert kubeconfig
        run: |
          kubelogin convert-kubeconfig -l azurecli
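
The install step can be wrapped so that it is skipped when kubelogin is already on the PATH and so that the version is pinned in one place. A sketch, assuming other releases keep the kubelogin-linux-amd64.zip asset name used above; KUBELOGIN_VERSION, kubelogin_release_url, and install_kubelogin are hypothetical names.

```shell
# Pinned version; v0.0.9 is the release used in the steps above.
KUBELOGIN_VERSION="${KUBELOGIN_VERSION:-v0.0.9}"

# Build the download URL for a given release tag (asset name is assumed
# to be the same for every release).
kubelogin_release_url() {
  local version="${1:-$KUBELOGIN_VERSION}"
  echo "https://github.com/Azure/kubelogin/releases/download/${version}/kubelogin-linux-amd64.zip"
}

# Download and unpack kubelogin into /usr/local/bin unless it is already
# available. Requires curl, unzip, and sudo on the runner.
install_kubelogin() {
  if command -v kubelogin >/dev/null 2>&1; then
    echo "kubelogin already installed"
    return 0
  fi
  local zip
  zip="$(mktemp)"
  curl -fsSL -o "$zip" "$(kubelogin_release_url "$@")"
  sudo unzip -j -o "$zip" -d /usr/local/bin
  rm -f "$zip"
}
```

A step could then run install_kubelogin followed by kubelogin convert-kubeconfig -l azurecli.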

@github-actions

This issue is idle because it has been open for 14 days with no activity.

@tjcorr

tjcorr commented Jun 2, 2022

The most recent changes from @tbarnes94 should fix this, but we'll need a new release cut to publish it. The README will need updating as well, since the new parameters aren't in v2.0.

github-actions bot removed the idle (Inactive for 14 days) label Jun 3, 2022

@OliverMKing
Collaborator

Created a new release. V2.2, V2 (latest in V2), and subsequent releases have this fix.
