feat(security): add SecurityContext to recordings #1188

andrewazores · 2022-11-01T15:24:55Z

Related to #760
Fixes #1409

Depends on #1338

github-actions · 2022-11-11T05:13:53Z

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-16682bada52d2d2485548838b272f7c5e35ca6c7 sh smoketest.sh

github-actions · 2022-11-14T16:32:06Z

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-77c1b14338fa5f6fbb470ebafd1ab055103877e4 sh smoketest.sh

github-actions · 2022-12-01T20:36:42Z

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-41d6dd23f9025551495cbf635018692c697f9735 sh smoketest.sh

andrewazores · 2022-12-01T21:59:50Z

Current issues:

Trying to download an active recording seems to reliably produce a 0-byte file on the first attempt, and then the proper file on subsequent attempts.
Redeploying Cryostat while a sample application is still live the whole time results in security context errors when trying to perform API requests on the sample application. Restarting the sample application fixes it. Seems like the reverse lookup is probably failing.

Haven't been able to construct a scenario where there are actually users and targets across different namespaces and with different permissions to the namespaces, but for the current single-namespace deployment scenario this looks like a transparent change (as expected) other than the minor bugs listed above.

andrewazores · 2022-12-01T22:10:54Z

@tthvo @maxcao13 this will be ready for proper review soon I think. This needs to also be tested with cryostatio/cryostat-web#707 . There are some basic instructions on that PR about how to do so, but it's pretty standard development workflow.

Please give it a try and see if you can catch any more broken cases that I haven't run into yet. I can also talk through the high-level design here and how it is intended to solve the multi-namespace/multi-tenant security concerns.

I'll fix the unit tests soon as well, I broke them in one of the last commits where I made things fail faster if the correct request security context could not be determined.

github-actions · 2022-12-02T20:33:42Z

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-25c4b2d4ecf38866f61c5879bfe5a6c7f57f9311 sh smoketest.sh

andrewazores · 2022-12-02T20:37:31Z

Tests seem flakey, it looks like something might not be working properly or cleaning up properly perhaps. There is also at least one itest that I have @Disabled for now since it's broken in a way that I haven't figured out yet.

maxcao13 · 2022-12-03T00:00:36Z

Manually testing the PR on ocp and it seems I can't find anything so far in a normal workflow that's been broken, but I will continue to do so after the weekend.

Looks good so far, I think @tthvo and I would definitely benefit from a walkthrough or writeup on how your changes are able to handle multi-namespace security concerns.

andrewazores · 2022-12-05T15:41:01Z

So the concept is centered around that new SecurityContext interface I added. It's an interface because the actual implementation may be different for different platforms, much like the AuthManager. The explanation below mostly assumes deployment in OpenShift for brevity, but at the end I'll tie back how this defaults back to essentially the current behaviour for other scenarios.

The idea is that currently, on OpenShift, Cryostat only deploys into a single namespace, and every user who accesses Cryostat must already have some level of access to that namespace and the other resources (deployments, pods, endpoints) within. This has been baked into Cryostat and its integration with OpenShift RBAC for a long time. For other deployments that don't use the OpenShiftAuthManager then there is currently no notion of a thing like a namespace, but the implementation should be easily extensible enough to support this in the future or for other platform implementations.

So, the goal is to extend the OpenShift platform implementation to support deployment into one namespace but talking to targets across other namespaces. If Cryostat is deployed into Namespace Z and there are target applications in Namespaces A and B, then recordings captured from Target A1 should somehow only be accessible to users who have permissions in Namespace A. For active recordings on presently-visible targets this is pretty easy - we know about the target, we know which Namespace we have observed it in, and so we can ask OpenShift if the auth token associated with the request has the correct permissions in Namespace A. This is already well-supported in Cryostat today.

The difficulty comes from archived recordings. Here we take the data out of Target A and write it to a file on the PVC that Cryostat controls. Before this PR, this operation loses some critical information. We still know the JMX service URL of the target that this data was collected from, but we don't know the Namespace that the target was located in. If we are lucky that information may actually be embedded within the hostname of the JMX service URL but we can't necessarily rely on that. The Security Context is meant to be a small additional data packet that can contain this kind of information. In the OpenShift case, the required information is just that Namespace name, so the OpenShiftSecurityContext implementation internally has a Map but the only key-value pair actually used at the moment is to record the namespace. The AuthManager interface has the new contextFor(AbstractNode | ServiceRef) methods which produce a Security Context for either of those two types. Either way, these two things must be located somewhere in the deployment tree. We also know that the KubeApiPlatformClient already tags ServiceRefs with the AnnotationKey.NAMESPACE within the annotations.cryostat map when it discovers targets, so to determine which Namespace a given ServiceRef is in we can just check that property. To determine the Namespace of some AbstractNode there is a simple trick. If it's a TargetNode then there is an embedded ServiceRef that we can check directly. If it is an EnvironmentNode we have a choice - either ascend back up the tree toward the root until we encounter a Namespace node (which may be the node we're already looking at), or descend down the tree until we encounter a TargetNode and then check that same property. This last way works because if the node we were given is "in between" the Namespace and the target, ex. it's a Deployment, then we know that every possible target leaf node we find below this node must be under the same Namespace. This does break if the given node is between the root and a Namespace, ex. it's a plugin realm or it's the whole universe (the root), but in that case there is no reasonable Namespace to select for permissions modelling. Currently the implementation defaults to using Cryostat's own deployment Namespace in that case.

Okay, all that said, we already had a way to determine the Namespace that a target belongs to, and now we have a Security Context that can hold that information. That Security Context is added as a field of the recording Metadata object. I applied a Gson trick so that this field gets included when this metadata is serialized and we are going to write it to a file for preserving the metadata on disk, but excluded when we are serializing it to include in an API response. The metadata already gets copied and preserved along with the recording JFR data when an active recording is copied to the archives, so now the Security Context also follows the recording data around.

Now we have that information attached to both active and archived recordings. The last thing that needs to be done is to make sure we actually check the requesting user's permissions against that Security Context. This is again implemented by the AuthManager. It's pretty simple and slots in well with the existing system. We already know how to check if the given user token has permissions to Cryostat's Namespace, so now we just need to check instead if the user token has permissions to the Namespace recorded in the Security Context instead. API requests (v1, v2, and GraphQL fetchers/mutators) now all implement an additional method that returns the correct Security Context for the request given the current request parameters/context. So for a mutator that starts a new recording on a target, the security context is the context for that target's ServiceRef. For a handler that streams the JFR data of an archived recording, the security context is the security context stored on disk within that archived recording's metadata. These are the security contexts that are checked by the AuthManager to ensure the currently requesting user has the right permissions (already implemented) within the required namespace.

So, what about API requests that don't touch any target or archived recording? Those don't have any explicit namespacing. Things like simply pinging Cryostat on /health/liveness, or downloading web-client assets to display in the browser, or adding JMX credentials to the keyring. This is what SecurityContext.DEFAULT represents - in OpenShift it represents the Namespace that Cryostat itself is deployed into. To add new JMX credentials the user must be able to CREATE_CREDENTIALS within this Namespace, for example, since it's Cryostat itself that the user is operating on. The other AuthManager implementations, like BasicAuthManager, always return SecurityContext.DEFAULT on the contextFor calls, and simply do a no-op/always-pass on any Security Context. This ends up behaving the same as things are now without the Security Contexts at all.

Lastly, there is one security hole in this implementation that I just figured out while typing this. Automated Rules allow permissions leaking across Namespaces which could end up looking like a privilege escalation. Users need the permissions CREATE_RULE, READ_TARGET, CREATE_RECORDING, UPDATE_RECORDING, READ_TEMPLATE to create rules, but right now these are given the SecurityContext.DEFAULT. So a user may create a rule and only have these permissions within Cryostat's deployment Namespace and nowhere else. But, when Cryostat processes this rule, it will use its own permissions to act upon the rule and start recordings which may include targets in other Namespaces where the user does not have their own permissions.

I will think and experiment with this some more. I'm not sure what to do about this at face value. Rules should probably be able to apply across namespaces. We could include a list of namespaces along with the rule definition and check that the user has permissions in all of those namespaces at rule creation time, but it's possible that the user loses those permissions later on in the external RBAC system and then the rule "should" also lose permissions, but it would not. Maybe that's OK and we require the admin user (whoever removed the first user's permissions) to also manually go and clean up the rules. Or, since we already have a specific CREATE_RULE permission, maybe this specific permission should be mapped to something cluster-scoped in the OpenShift RBAC so that a user must have a sort of superuser-level of permissions in order to create any rules at all. Any thoughts on this @ebaron? I think the cluster-scoped CREATE_RULE mapping probably works well and is pretty easy to implement all around. Maybe there is something we can do to have privileged and unprivileged rules where privileged rules require cluster-level permissions and apply to any namespace, whereas unprivileged rules must be scoped to a single namespace at rule creation time.

On the bright side, the targets and recordings still fall into the Security Context system described above, and so users would still need the appropriate permissions within the correct namespaces to actually be able to access any of the JFR data that would result from the rule. But, an unprivileged user should not be able to use rules to start recordings on targets, especially since recordings can incur performance overhead penalties depending on the selected event template.

src/main/java/io/cryostat/MainModule.java

src/main/java/io/cryostat/discovery/DiscoveryStorage.java

src/main/java/io/cryostat/net/openshift/OpenShiftAuthManager.java

maxcao13 · 2022-12-05T19:26:26Z

src/main/java/io/cryostat/net/openshift/OpenShiftAuthManager.java

+        } else {
+            ns = ((OpenShiftSecurityContext) securityContext).getNamespace();
+        }
+        // FIXME remove


remove? Or maybe logger.trace?

Maybe setting it to trace makes sense. I'll leave it as-is for now since it's handy during development.

src/main/java/io/cryostat/net/openshift/OpenShiftSecurityContext.java

maxcao13 · 2022-12-05T19:45:45Z

src/main/java/io/cryostat/net/openshift/OpenShiftSecurityContextDeserializer.java

+            JsonElement json, Type typeOfT, JsonDeserializationContext context)
+            throws JsonParseException {
+        if (!(auth.get() instanceof OpenShiftAuthManager)) {
+            // FIXME actually deserialize, don't make this assumption


Should these be fixed?

Yes, I'll get around to that later. I think I'll need to add some logic or a migration script somewhere to upgrade existing stored metadata to include security contexts too.

src/main/java/io/cryostat/net/web/http/api/v2/TargetRecordingOptionsListGetHandler.java

src/main/java/io/cryostat/net/web/http/api/v2/graph/AllArchivedRecordingsFetcher.java

…n resource claim to determine security context

…s determined from resource claim URL

Signed-off-by: Andrew Azores <aazores@redhat.com>

github-actions · 2023-03-21T22:04:45Z

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-2cf3796b698507ee23e92672197ab2acc032c293 sh smoketest.sh

ebaron · 2023-03-24T18:09:53Z

I've been able to track down why the OAuth grant is failing for multiple namespaces.

Here is the OAuthClient derived from the Service Account in namespace c:

metadata:
  name: system:serviceaccount:c:clustercryostat-sample
additionalSecrets:
  - <service_account_token>
redirectURIs:
  - https://clustercryostat-sample-c.apps.example.com
grantMethod: prompt
scopeRestrictions:
  - literals:
      - user:info
      - user:check-access
      - user:list-scoped-projects
      - user:list-projects
  - clusterRole:
      roleNames:
        - '*'
      namespaces:
        - c
      allowEscalation: true

This is the error from ValidateScopeRestrictions when trying to create a scoped token for namespaces a, b, and c:

error: '[role:cryostat-operator-oauth-client:a not found in [user:info user:check-access user:list-scoped-projects user:list-projects], role:cryostat-operator-oauth-client:a does not use an approved namespace, role:cryostat-operator-oauth-client:b not found in [user:info user:check-access user:list-scoped-projects user:list-projects], role:cryostat-operator-oauth-client:b does not use an approved namespace]'
errorCauses:
  - error: '[role:cryostat-operator-oauth-client:a not found in [user:info user:check-access user:list-scoped-projects user:list-projects], role:cryostat-operator-oauth-client:a does not use an approved namespace]'
    errorCauses:
      - error: role:cryostat-operator-oauth-client:a not found in [user:info user:check-access user:list-scoped-projects user:list-projects]
      - error: role:cryostat-operator-oauth-client:a does not use an approved namespace
  - error: '[role:cryostat-operator-oauth-client:b not found in [user:info user:check-access user:list-scoped-projects user:list-projects], role:cryostat-operator-oauth-client:b does not use an approved namespace]'
    errorCauses:
      - error: role:cryostat-operator-oauth-client:b not found in [user:info user:check-access user:list-scoped-projects user:list-projects]
      - error: role:cryostat-operator-oauth-client:b does not use an approved namespace

Looking at the OAuthClient, it has a scope restriction requiring the role scopes be in c. The docs indicate this is by design:

You can use a service account as a constrained form of OAuth client. Service accounts can request only a subset of scopes that allow access to some basic user information and role-based power inside of the service account’s own namespace

It seems like in order to get a scoped token to work for multiple namespaces, we can't use the service account as an OAuth client. We would have to manage our own OAuthClient(s): https://docs.openshift.com/container-platform/4.12/authentication/configuring-oauth-clients.html#oauth-register-additional-client_configuring-oauth-clients.

andrewazores · 2023-03-26T21:15:10Z

Nice digging, that's great information to have on hand. I think that definitely pushes me over the edge of saying that this PR and #1409 will be pushed to 2.4.0, since it expands the scope of work required to get this functionality properly working. I will put this PR on the backburner again until after 2.3.0 is out the door.

andrewazores added the feat New feature or request label Nov 1, 2022

mergify bot added the safe-to-test label Nov 1, 2022

andrewazores force-pushed the security-context branch 3 times, most recently from d4f6023 to a3033a4 Compare November 2, 2022 13:51

andrewazores changed the title ~~[WIP] feat(security): add SecurityContext to recordings~~ feat(security): add SecurityContext to recordings Nov 2, 2022

andrewazores force-pushed the security-context branch 6 times, most recently from d0d0814 to 16682ba Compare November 11, 2022 04:59

andrewazores force-pushed the security-context branch from 16682ba to 77c1b14 Compare November 14, 2022 16:16

andrewazores force-pushed the security-context branch 2 times, most recently from ccd276c to 41d6dd2 Compare December 1, 2022 20:21

andrewazores mentioned this pull request Dec 1, 2022

feat(api): include sourceTarget details in download requests cryostatio/cryostat-web#707

Closed

7 tasks

andrewazores force-pushed the security-context branch 2 times, most recently from c3a3c73 to 25c4b2d Compare December 2, 2022 19:57

andrewazores force-pushed the security-context branch from 25c4b2d to bfadf54 Compare December 5, 2022 16:15

maxcao13 reviewed Dec 5, 2022

View reviewed changes

andrewazores force-pushed the security-context branch from bfadf54 to c913f6a Compare December 5, 2022 21:31

andrewazores added 25 commits March 21, 2023 17:36

expect username 'anonymous'

5ffdcd3

handle required 'contexts' field when creating rules

5817108

ensure rules are validated even when created from JSON

e25bf25

fix handling of security context for custom targets

7a31456

add TODO

4b0059e

rename test

460ba45

add a vertx hack to reverse lookup a requesthandler from an auth toke…

5b33f6c

…n resource claim to determine security context

itest fixup

459ad9c

remove extra params from token request form again, now that context i…

a98b1d2

…s determined from resource claim URL

cleanup

0e36922

fixup archived downloads

7ef8a8b

set log level to debug

54d15cd

rebase fix

470bbea

partial implementation of context-filtered queries

1236a46

typo fix

c997a01

rebase fix

24c9965

Signed-off-by: Andrew Azores <aazores@redhat.com>

remove unused field

89a38f5

attempting to request oauth scopes for all target namespaces

903a27d

remove duplicate adapter registration

b57e49c

remove redundant injection

9486bf7

Signed-off-by: Andrew Azores <aazores@redhat.com>

revert to use ApiException

4407755

Signed-off-by: Andrew Azores <aazores@redhat.com>

minor refactor

0bab57e

rebase fixup formatting

1d48a39

fix up MatchExpression handlers with securitycontexts

aaa7ff2

remove unnecessary stubbing

2cf3796

Signed-off-by: Andrew Azores <aazores@redhat.com>

andrewazores force-pushed the security-context branch from 8fc929b to 2cf3796 Compare March 21, 2023 21:50

ebaron mentioned this pull request May 5, 2023

[Story] Manage our own OAuthClient cryostatio/cryostat-operator#579

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(security): add SecurityContext to recordings #1188

feat(security): add SecurityContext to recordings #1188

andrewazores commented Nov 1, 2022 •

edited

github-actions bot commented Nov 11, 2022

github-actions bot commented Nov 14, 2022

github-actions bot commented Dec 1, 2022

andrewazores commented Dec 1, 2022

andrewazores commented Dec 1, 2022 •

edited

github-actions bot commented Dec 2, 2022

andrewazores commented Dec 2, 2022

maxcao13 commented Dec 3, 2022

andrewazores commented Dec 5, 2022 •

edited

maxcao13 Dec 5, 2022

andrewazores Dec 5, 2022

maxcao13 Dec 5, 2022

andrewazores Dec 5, 2022

github-actions bot commented Mar 21, 2023

ebaron commented Mar 24, 2023 •

edited

andrewazores commented Mar 26, 2023

feat(security): add SecurityContext to recordings #1188

Are you sure you want to change the base?

feat(security): add SecurityContext to recordings #1188

Conversation

andrewazores commented Nov 1, 2022 • edited

github-actions bot commented Nov 11, 2022

github-actions bot commented Nov 14, 2022

github-actions bot commented Dec 1, 2022

andrewazores commented Dec 1, 2022

andrewazores commented Dec 1, 2022 • edited

github-actions bot commented Dec 2, 2022

andrewazores commented Dec 2, 2022

maxcao13 commented Dec 3, 2022

andrewazores commented Dec 5, 2022 • edited

maxcao13 Dec 5, 2022

Choose a reason for hiding this comment

andrewazores Dec 5, 2022

Choose a reason for hiding this comment

maxcao13 Dec 5, 2022

Choose a reason for hiding this comment

andrewazores Dec 5, 2022

Choose a reason for hiding this comment

github-actions bot commented Mar 21, 2023

ebaron commented Mar 24, 2023 • edited

andrewazores commented Mar 26, 2023

andrewazores commented Nov 1, 2022 •

edited

andrewazores commented Dec 1, 2022 •

edited

andrewazores commented Dec 5, 2022 •

edited

ebaron commented Mar 24, 2023 •

edited