
provide a portable mechanism for processes within a container to obtain their image and container ids #1105

Open
lpgc opened this issue Apr 29, 2021 · 35 comments


@lpgc

lpgc commented Apr 29, 2021

There are a number of use cases where the ability to portably obtain a container's image identity and its instance identity would be valuable.

The most obvious is logging: in larger systems with a significant number of container instances and image versions, it is useful for processes therein to be able to "tag" their log output with both identities, which as a tuple uniquely identify a particular instance among all of its peers.

Other use cases involve platform runtime serviceability components (e.g. in the Java platform), where this information can also be used by the platform to correlate multiple forensic artifacts that may be generated over the lifetime of particular container versions and instances.

This could simply be provided by requiring that the runtime expose those identifiers to processes therein by establishing a pair of standard environment variables such as OCI_IMAGE_ID and OCI_INSTANCE_ID, or similar.
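For illustration, a minimal sketch of how a process might consume such variables for log tagging. OCI_IMAGE_ID and OCI_INSTANCE_ID are the hypothetical names from this proposal; no runtime sets them today.

```go
// Minimal sketch, assuming the proposed (hypothetical) OCI_IMAGE_ID and
// OCI_INSTANCE_ID variables were injected by the runtime.
package main

import (
	"log"
	"os"
)

func main() {
	imageID := os.Getenv("OCI_IMAGE_ID")       // e.g. the image config digest
	instanceID := os.Getenv("OCI_INSTANCE_ID") // e.g. the runtime container ID

	// Tag every log line with the (image, instance) tuple so that one
	// instance can be told apart from its peers once logs are aggregated.
	logger := log.New(os.Stdout,
		"image="+imageID+" instance="+instanceID+" ", log.LstdFlags)
	logger.Println("service started")
}
```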

@mtrmac

mtrmac commented May 13, 2021

OTOH this goes against the idea of isolating containers from the system. A common unprivileged container has no business knowing where on the host its files are located (or even if they are located on the host), and the easiest definitions of image/container IDs would probably point to specific host paths for the image store and per-container mounts.

My impression of the logging use cases is that they both can, and for better security should, be handled by a node-level log forwarder/collector, not by each container annotating its own output. Doing this in a collector is:

  • more consistent (does not require container/application changes)
  • more secure (a compromised container can not affect the value)
  • potentially more efficient (a collector could annotate the full identity along with other metadata on container creation, and then only reference it, e.g. using only a container ID, on subsequent log entries)

If this needs to exist at all, I’d prefer standardizing the format and semantic guarantees (e.g. is the ID required to be stable / required to be different across reboots? across hosts in what kind of domain?), but only making them available if the user deploying the container explicitly opts in.

@lpgc
Author

lpgc commented May 13, 2021

"A common unprivileged container has no business knowing where on the host its files are located (or even if they are located on the host), and the easiest definitions of image/container IDs would probably point to specific host paths for the image store and per-container mounts."

I completely agree with your remark regarding container isolation; however, I am not proposing exposing any location information whatsoever. I am only proposing exposing the image digest value and the container instance ID, not any host (or other) path information.

Where the container filesystem is located on a host is not useful for any of the use cases; the identity of the image and of an instance of it are. While an external logger could potentially inject these values, that makes the task of gathering all the possible artifacts that a containerized runtime generates extremely onerous and costly, and it is not practical for a platform such as Java (or any other language runtime, for that matter) when the container ecosystem could simply expose the image digest and instance ID to the container itself, in the same way other environmental information is injectable into containers today.

@mtrmac

mtrmac commented May 13, 2021

At least in one implementation an image (config) digest directly points to a node path /var/lib/containers/storage/overlay-images/$digest/, and the container ID directly points to /var/lib/containers/storage/overlay-containers/$digest.

Of course actually exploiting that would require a sandbox breakout, but it’s a piece of the puzzle.

@lpgc
Author

lpgc commented May 13, 2021

Just to be clear, I am proposing exposing the values of:

  • the "digest" object found in the OCI image manifest config component
  • the "id" object described in the runtime spec container "state" definition...

@sudo-bmitch

Another option, instead of environment variables, would be mounting a file inside the container. That file could be built in tmpfs, mounted read-only to a well known location like /run/oci-metadata, and use a format like json that allows more values to be added in the future without breaking existing functionality. The other field that I'd find useful would be labels on the image or container, which allows other build and run time information to be propagated into the container.

And I think we all agree, anything we do to implement this should only be on the container side (and associated image), no host level details like paths, hostname, or IP address of the host running the container should be visible from inside the container.
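As a rough sketch of what a consumer of such a file could look like: the path /run/oci-metadata and the field names below are hypothetical, taken from the suggestion above; nothing in any spec defines them today.

```go
// Sketch of a consumer for the suggested read-only metadata file.
// The path /run/oci-metadata and the JSON field names are hypothetical.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type ociMetadata struct {
	ImageDigest string            `json:"imageDigest"` // image config digest
	ContainerID string            `json:"containerID"` // runtime container ID
	Labels      map[string]string `json:"labels"`      // image/container labels
}

func main() {
	data, err := os.ReadFile("/run/oci-metadata")
	if err != nil {
		// The file would presumably be optional (opt-in), so absence is not fatal.
		fmt.Println("no container metadata available:", err)
		return
	}
	var meta ociMetadata
	if err := json.Unmarshal(data, &meta); err != nil {
		fmt.Println("malformed metadata:", err)
		return
	}
	fmt.Printf("running image %s as container %s\n", meta.ImageDigest, meta.ContainerID)
}
```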

@lpgc
Author

lpgc commented May 13, 2021

Another option, instead of environment variables, would be mounting a file inside the container. That file could be built in tmpfs, mounted read-only to a well known location like /run/oci-metadata, and use a format like json that allows more values to be added in the future without breaking existing functionality. The other field that I'd find useful would be labels on the image or container, which allows other build and run time information to be propagated into the container.

And I think we all agree, anything we do to implement this should only be on the container side (and associated image), no host level details like paths, hostname, or IP address of the host running the container should be visible from inside the container.

I really like this proposal (I think it's also "compatible" with the mechanism employed by the k8s downward API).

I would also +1 your suggestion regarding adding the image tags to this metadata, since those are also known to, and significant for, the components and tools that form the container ecosystem; I had actually wondered about those as well.

@sudo-bmitch
Copy link

Related is this old issue from Docker: moby/moby#8427

@mtrmac

mtrmac commented May 19, 2021

Of course actually exploiting that would require a sandbox breakout, but it’s a piece of the puzzle.

Compare GHSA-c3xm-pvg7-gh7r where an attacker benefits from knowing the pod ID in Kubernetes.

@XSAM

XSAM commented Dec 30, 2021

Another option, instead of environment variables, would be mounting a file inside the container. That file could be built in tmpfs, mounted read-only to a well known location like /run/oci-metadata, and use a format like json that allows more values to be added in the future without breaking existing functionality. The other field that I'd find useful would be labels on the image or container, which allows other build and run time information to be propagated into the container.

I like this one.

I am working on observability. Sometimes it is hard to solve a problem in the container infrastructure, say containerd, without finding the buggy container first. Having a container ID is really helpful for identifying the container. And we can correlate a service with the underlying infrastructure, so we know exactly where the errors come from, even within a massively distributed system such as a Kubernetes cluster.

Kubernetes has the downward API, which can expose a Pod's metadata as container environment variables, but it cannot expose a container ID; it can only expose fields such as the pod name. However, the pod name is of limited help in identifying the container, since it is not unique and may be reused on k8s (for example, for pods created by a StatefulSet). And to map a container name to a container ID, we need to search the pod scheduling logs or watch the k8s API. There are other ways, such as deploying a sidecar that can access the k8s API or the container daemon to fetch the container ID, but that wastes resources (we just want a container ID) and brings security concerns. Therefore, these approaches are not perfect and only work on k8s. What about containers running on bare metal?

I did find a way to fetch the container ID from within a container by reading the cgroup file /proc/self/cgroup. However, it is undocumented behavior of the container runtime; it is unstable and may change in the future, and this approach is not going to work with cgroup v2. Using an undefined and unstable method to fetch the container ID makes me feel unsafe and upset. open-telemetry/opentelemetry-go#2418
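For reference, the cgroup v1 hack looks roughly like the sketch below. It depends entirely on undocumented runtime behavior: common runtimes happen to place a 64-hex-character container ID at the end of each cgroup path, but nothing guarantees this, and under cgroup v2 /proc/self/cgroup typically contains only "0::/".

```go
// Rough illustration of the /proc/self/cgroup hack (cgroup v1 only).
// Relies on the unguaranteed convention that cgroup paths end in a
// 64-hex-character container ID.
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

var idPattern = regexp.MustCompile(`[0-9a-f]{64}`)

func containerIDFromCgroup() (string, error) {
	f, err := os.Open("/proc/self/cgroup")
	if err != nil {
		return "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each line looks like "hierarchy-ID:controllers:path"; look for a
		// 64-hex-character token anywhere in the path.
		if id := idPattern.FindString(scanner.Text()); id != "" {
			return id, nil
		}
	}
	return "", fmt.Errorf("no container ID found in /proc/self/cgroup")
}

func main() {
	id, err := containerIDFromCgroup()
	if err != nil {
		fmt.Println("could not determine container ID:", err)
		return
	}
	fmt.Println("container ID:", id)
}
```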

I would like to help settle on a standard way to fetch the container ID from within the container, but I am not sure whether anyone from the community is working on this. Is there anything I can do to make it happen?

@imliuda

imliuda commented Feb 18, 2022

We are facing the same problem: we want to collect core dumps from containers, along with some JVM logs and metrics, and we need to distinguish the different lifecycles of a single container. We need to know the container ID before starting our business process, so that we can write JVM logs and metrics to a path suffixed with the container ID.

@sudo-bmitch

@opencontainers/runtime-spec-maintainers PTAL

@AkihiroSuda
Member

As there is no standard for the "image ID" and the "container ID", I'd suggest using the Kubernetes downward API: https://kubernetes.io/docs/concepts/workloads/pods/downward-api/

The downward API doesn't seem to support the image ID, though, but I guess that is open to negotiation if somebody needs it.
Anyway, Mutating Admission Webhooks should work for injecting the image ID into a custom annotation that is accessible via the downward API.

Non-Kubernetes engines may also opt in to follow a similar convention.

@mitar

mitar commented Jan 19, 2023

Related issue on Kubernetes side: kubernetes/kubernetes#80346

@AkihiroSuda
Member

There is also a patch to define the "container ID" on the kernel side for auditing.

@tianon
Member

tianon commented Jan 19, 2023

To clarify a little bit the hesitation from the runtime spec maintainers here (hopefully if I'm off-base the maintainers who disagree with me will pipe up!), the spec today does not currently have any concept of image ID or even an image at all (it's closer to a very souped-up chroot spec -- we only have a rootfs and it's assumed to be unpacked, layered, etc outside the scope of the runtime-spec itself). Similarly, we also don't have any consistent concept of a "container ID" (for the same reasons).

To put this in more practical terms, the runtime spec implementation is runc. The data being requested here comes from higher-level orchestrators like containerd which help set up the spec bundle for passing to runc, but they're the place this data lives, not runc itself. So, for us to implement something like this in the spec, we're essentially expanding the scope of the spec to include constraints on how containerd and friends are expected to use runc (and which data of their own they're expected to provide to it, even if they don't have anything that maps nicely to said data -- having an image-less container orchestration platform that's still built on top of runc is not very far-fetched).

@svrnm

svrnm commented Jan 19, 2023

@sudo-bmitch thanks for your guidance during the call:-)

@AkihiroSuda, thanks for your fast response. Using the Kubernetes downward API is another option I am pursuing right now; @mitar raised a KEP a while back for the imageID (although I am more interested in the containerID) and I want to get that conversation going again. However, this would only fix the issue for Kubernetes and not for all the other setups out there.

To add a little bit of context: We are currently trying to get container.id detection available across OpenTelemetry SDKs to enable correlation between infra and application telemetry, some PRs/issues on that:

Overall there are 2 "hacks" right now, depending on the cgroup version:

  • Read from /proc/self/cgroup
  • Read from /proc/self/mountinfo

This works for some, but of course not all, container runtimes, and @XSAM figured out that in k8s those IDs might be "incorrect" coming from a prior pause container. So it's a hack, and I was hoping for a reliable way.

To be honest, I was not aware that there is no concept of an image ID / container ID; I just assumed this was a "given". Thanks for the clarification @tianon & @AkihiroSuda, it helps me understand why this is a complicated issue.

Again, the end goal I have in mind is a reliable way to connect application telemetry & container(+other infrastructure) telemetry eventually.

@larry-cable

@tianon thanks, that makes sense, and I would agree that expanding the spec's scope is not to be undertaken lightly, if at all.

Having said that, this is an issue that affects both managed runtimes running in containers, such as the JVM (and others), and simple applications as well.

Longer term I believe this needs to be solved in a portable manner, such that orchestrators are able to communicate this to containers without the containers having to determine which orchestrator is orchestrating them.


@svrnm

svrnm commented Jan 19, 2023

Without knowing all the inner workings of the runtime spec I am wondering if something like the following works:

  • Write it in a manner that does not require a specific format for providing identifying details.
  • Use MAY language, so that having this information provided is not mandatory and consumers cannot expect it to be there.

E.g. there MAY be an environment variable OCI_ID that may hold details to identify the container uniquely.

Or there MAY be a file /etc/container-id (just making that up for Linux, well aware that this is probably not the right place) holding details that allow identification of the runtime.

Again, apologies, I am not the expert (yet) on choosing the right words, but I hope I can get my idea across.

Edit:

This would provide, as a minimum, a canonical place where this data may be dropped off by the container engine and where a monitoring/observability solution can find it. Much better than the two places we have right now as a hack.
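A sketch of what a consumer might look like under such MAY semantics: both sources are optional, so their absence has to be tolerated. OCI_ID and /etc/container-id are the placeholder names from this comment, not anything an engine provides today.

```go
// Sketch of a consumer under the proposed MAY semantics: the hypothetical
// OCI_ID variable and /etc/container-id file are both optional.
package main

import (
	"fmt"
	"os"
	"strings"
)

func containerIdentity() (string, bool) {
	if id := os.Getenv("OCI_ID"); id != "" {
		return id, true
	}
	if data, err := os.ReadFile("/etc/container-id"); err == nil {
		if id := strings.TrimSpace(string(data)); id != "" {
			return id, true
		}
	}
	return "", false // neither source present; telemetry simply omits the field
}

func main() {
	if id, ok := containerIdentity(); ok {
		fmt.Println("container identity:", id)
	} else {
		fmt.Println("container identity unavailable (engine did not opt in)")
	}
}
```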

@svrnm

svrnm commented Jan 24, 2023

@AkihiroSuda @tianon @sudo-bmitch any thoughts on my last comment? Would there be a way to have such an optional feature in the runtime spec, so that implementations are not forced to provide it, but those who do have a fixed place to put it, and consumers (observability/monitoring) can look for it but will still work without it?

@giuseppe
Member

I think it should be addressed at a higher level. It shouldn't end up in the runtime-spec, which deals only with the lower-level stuff. From the runtime-spec PoV it is either an ENV variable or a bind mount.

@svrnm

svrnm commented Jan 24, 2023

I think it should be addressed at a higher level. It shouldn't end up in the runtime-spec, which deals only with the lower-level stuff. From the runtime-spec PoV it is either an ENV variable or a bind mount.

thanks. When you say "higher level", what are the potential candidates from your point of view?

@giuseppe
Member

I think this kind of logic should go into the container engine, or really anything that calls the OCI runtime (Podman, Docker, containerd, CRI-O...)

@thaJeztah
Member

I agree (and agree with most of what's been said); I don't think this should be defined as part of the runtime spec. I can see the use case(s) for having more information available from within the container, but care should be taken:

  • Defining it as part of the runtime spec would be expanding the spec's scope. As outlined by some above, there are many possible ways a container can be created, and there's no general concept of an "id".
  • While there are (undocumented, but well-known) approaches to discover if a process is containerized (and for some cases, discovering an "id"), it's not desirable in all situations to have this information present (either from a security perspective, or from a "conceptual" perspective; should processes be aware they're containerised? and if so: how much information?)
  • From some perspectives, processes should not (have to) be aware that they're containerised (formalizing this in the spec could become a bit of a slippery slope)
  • If this is formalized somewhere, at least this should be an "opt-in" feature.

That said, I can think of (and know of) various use cases where (some amount of) information is useful, and currently there's no formalized / portable way (there are options, such as the aforementioned Kubernetes downward API, but they depend on the ecosystem). From that perspective, I could see value in some specification for introspection if there's common ground (orchestrated/non-orchestrated), and if such a spec would be flexible / modular / portable enough to be useful for different ecosystems.

I should probably also mention that various discussions have led to "responsibility of higher level runtimes", which currently isn't a formal concept; at times this feels like a gap in the existing specifications, which makes me wonder if there would be room for a specification around that (may be hard though to find the common ground on that, but perhaps?).

@svrnm

svrnm commented Jan 24, 2023

I should probably also mention that various discussions have led to "responsibility of higher level runtimes", which currently isn't a formal concept;

This is exactly the reason why I was hoping to make it part of the runtime spec: changing it here would be a one-time thing, whereas changing it at higher levels would require me to chase down each and every engine or orchestrator or ... and propose that they implement something like that (sure, it will get easier the moment the "big" ones are on board).

Monitoring/observability (and everything adjacent to it, like auditing, verifiability, etc.) always comes with breaking the design principle of separation of concerns (the OpenTelemetry spec states exactly that), because what I want to know is which of my applications, running in which container, on which cluster, on which hardware, in which datacenter, is the troublemaker.

So, yes, ideally a process should not be aware that it is containerised, but if I want to do monitoring I have no other choice than to have this information present somehow. Of course there are also ways to enrich this information later, but they come with their own (security) issues.

I get that this is expanding the scope of the runtime spec, and I understand that it is a slippery slope, so, as said initially, I was hoping to make it possible here, but I am aware that the answer is probably a "No" and this needs to be solved somewhere else.

@notcool11

I'd like this expanded to provide everything for the OpenTelemetry container spec. Each otel vendor does data correlation differently, so best to have all the data to be certain it works.

@larry-cable

I'd like this expanded to provide everything for the OpenTelemetry container spec. Each otel vendor does data correlation differently, so best to have all the data to be certain it works.

+1

@svrnm

svrnm commented Feb 6, 2023

@thaJeztah @giuseppe @tianon @AkihiroSuda any final call on this? It looks like the answer is "out of scope", so I will try to find a way to get this accomplished somewhere at a higher level.

@ohads-MSFT

Related issue on Kubernetes side: kubernetes/kubernetes#80346

That issue is talking about image ID; for container ID I think you meant:
kubernetes/kubernetes#50309

@svrnm

svrnm commented Feb 27, 2023

Related issue on Kubernetes side: kubernetes/kubernetes#80346

That issue is talking about image ID; for container ID I think you meant: kubernetes/kubernetes#50309

You're right, I quoted the one for imageID because my initial assumption was that both could be treated equally, but I might be wrong here, since imageID is available before the container is created.

@ohads-MSFT

ohads-MSFT commented Feb 27, 2023

You're right, I quoted the one for imageID because my initial assumption was that both could be treated equally, but I might be wrong here, since imageID is available before the container is created.

Image ID is also far less interesting, because you can just inject this value yourself - it's part of the pod's static metadata (e.g. kubernetes/kubernetes#80346 (comment))

The container ID on the other hand is completely dynamic, and changes every time the container restarts - which can happen multiple times during the lifetime of the pod (OOM, liveness failure, etc).

@mitar

mitar commented Feb 27, 2023

Image ID is also far less interesting, because you can just inject this value yourself - it's part of the pod's static metadata (e.g. kubernetes/kubernetes#80346 (comment))

That is not true. If you use anything except a digest-based image ID in your pod specification, then you cannot really know which version of the image was fetched to run your container. A label/tag can point to different images at different times. For debugging it is critical to know which exact version was running.

@ohads-MSFT

That is not true. If you use anything except a digest-based image ID in your pod specification, then you cannot really know which version of the image was fetched to run your container. A label/tag can point to different images at different times. For debugging it is critical to know which exact version was running.

Fair point, but this is a very good reason to do as we do and always use digest-based image IDs :)

@mitar

mitar commented Feb 27, 2023

Fair point, but this is a very good reason to do as we do and always use digest-based image IDs :)

If it works for your workflow, sure. But sometimes it is OK to pick the latest image, whichever it is; you just want to know which one was picked.

@svrnm

svrnm commented Mar 1, 2023

FYI, I followed the suggestion to raise this with container engines and opened containerd/containerd#8185 with the containerd project
