
nsenter: overwrite glibc's internal tid cache on clone() #4247

Open: wants to merge 1 commit into main
Commits on Apr 30, 2024

  1. nsenter: overwrite glibc's internal tid cache on clone()

    Since glibc 2.25, the thread-local cache of the current TID is no longer
    updated in the child when calling clone(2). This results in very
    unfortunate behaviour when Go makes pthread calls using pthread_self(),
    because the cached TID is still the parent's TID rather than the child's.
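    To make the problem concrete, here is a minimal sketch (not code from this PR) that reads glibc's cached TID inside a clone(2) child and compares it with gettid(2). It assumes glibc, a kernel that supports PR_GET_TID_ADDRESS, and that the address glibc registers with set_tid_address(2) at startup is &THREAD_SELF->tid. check_cache_child() and cached_tid_stale_after_clone() are hypothetical names. Because CLONE_VM is not passed, the pointer looked up in the parent refers to the child's private copy of the cache.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_GET_TID_ADDRESS
#define PR_GET_TID_ADDRESS 40 /* Linux 3.5+, needs CONFIG_CHECKPOINT_RESTORE=y */
#endif

#define CHILD_STACK_SIZE (64 * 1024)

/* Runs in the clone(2) child: exit 1 if glibc's cached TID differs from
 * the kernel's TID for this process, 0 if it matches. */
static int check_cache_child(void *arg)
{
	const pid_t *cached = arg;

	return *cached != (pid_t) syscall(SYS_gettid);
}

/* Returns 1 if the cached TID is stale in a clone(2) child, 0 if it is
 * correct, or -1 if this could not be determined. */
static int cached_tid_stale_after_clone(void)
{
	pid_t *cached = NULL;
	char *stack;
	pid_t pid;
	int status, ret = -1;

	/* Assumption: under glibc this is &THREAD_SELF->tid. */
	if (prctl(PR_GET_TID_ADDRESS, (unsigned long) &cached, 0, 0, 0) < 0 || cached == NULL)
		return -1;

	stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	/* No CLONE_VM: the child works on a copy of our memory, so "cached"
	 * points at the child's own copy of the TID cache. The stack grows
	 * down on common architectures, so pass the top of the buffer. */
	pid = clone(check_cache_child, stack + CHILD_STACK_SIZE, SIGCHLD, cached);
	if (pid >= 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		ret = WEXITSTATUS(status);

	free(stack);
	return ret;
}
```

    On glibc 2.25 or later this would be expected to return 1; -1 only means the kernel (or a sandbox) would not report the address.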
    
    The "simple" solution is to forcefully overwrite this cached value.
    Unfortunately (and unsurprisingly), the layout of "struct pthread" is
    strictly private and could change without warning.
    
    Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks (with
    the child_tid set to the cached &PTHREAD_SELF->tid), meaning that as
    long as runc is using glibc, when "runc init" is spawned the child
    process will have a pointer directly to the cached value we want to
    change. With CONFIG_CHECKPOINT_RESTORE=y kernels on Linux 3.5 and later,
    we can simply use prctl(PR_GET_TID_ADDRESS).
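    A minimal sketch of the PR_GET_TID_ADDRESS path, under the same assumptions (glibc, Linux 3.5+ with CONFIG_CHECKPOINT_RESTORE=y). get_tid_address(), fix_cached_tid(), fixed_child() and run_fixed_child() are hypothetical names, not runc's. One detail matters: for a plain clone(2) without CLONE_CHILD_CLEARTID the kernel sets the child's clear_child_tid to NULL, so this sketch looks the address up in the parent and passes it to the child, where the copied address space keeps it valid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_GET_TID_ADDRESS
#define PR_GET_TID_ADDRESS 40 /* Linux 3.5+, needs CONFIG_CHECKPOINT_RESTORE=y */
#endif

#define CHILD_STACK_SIZE (64 * 1024)

/* Ask the kernel for this thread's clear_child_tid address (assumed to be
 * glibc's cached TID). Returns NULL if the kernel cannot report it. */
static pid_t *get_tid_address(void)
{
	pid_t *addr = NULL;

	if (prctl(PR_GET_TID_ADDRESS, (unsigned long) &addr, 0, 0, 0) < 0)
		return NULL;
	return addr;
}

/* Overwrite the cached TID with the real one. Meant to be called in the
 * child before anything (e.g. the Go runtime) relies on pthread_self(). */
static int fix_cached_tid(pid_t *cached)
{
#ifdef __GLIBC__
	if (cached == NULL)
		return -1;
	*cached = (pid_t) syscall(SYS_gettid);
	return 0;
#else
	/* Other libcs keep the TID elsewhere; do not write to this address. */
	(void) cached;
	return -1;
#endif
}

/* Child body: 0 if the cache is correct after fixing, 1 if not, 2 on error. */
static int fixed_child(void *arg)
{
	pid_t *cached = arg;

	if (fix_cached_tid(cached) < 0)
		return 2;
	return *cached != (pid_t) syscall(SYS_gettid);
}

/* Look the address up in the parent, then clone a child that repairs it.
 * Returns the child's exit status, or -1 if the check could not run. */
static int run_fixed_child(void)
{
	pid_t *cached = get_tid_address();
	char *stack;
	pid_t pid;
	int status, ret = -1;

	if (cached == NULL)
		return -1;
	stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	pid = clone(fixed_child, stack + CHILD_STACK_SIZE, SIGCHLD, cached);
	if (pid >= 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		ret = WEXITSTATUS(status);

	free(stack);
	return ret;
}
```

    The write is guarded by __GLIBC__ because, as noted below, other libcs such as musl keep the TID at a different address.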
    
    For older kernels we need to scan the TLS structure in memory
    (pthread_self() is a pointer to the head of the TLS structure). However,
    to avoid false positives we first try known-correct offsets based on the
    current structure layouts. If that fails, we scan the first 1 KiB of the
    structure for any field that might match. When doing the scan, we assume
    that the first field we find that contains the actual TID of the current
    process is the field we want.
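    The fallback scan might look roughly like this, assuming glibc and that the structure behind pthread_self() extends at least 1 KiB. find_cached_tid(), tid_at() and the candidate_offsets parameter are hypothetical; this sketch does not encode glibc's real field offsets, so the caller supplies the "known-correct" guesses. Since it searches for the current TID, the scan has to run while the cached value is still correct (for example in the parent before clone(2)).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Scan window from the commit message: the first 1 KiB after pthread_self(). */
#define TID_SCAN_LIMIT 1024

/* Return a pointer to the pid_t at "off" if it holds "tid", else NULL.
 * Offsets that are misaligned or outside the window are rejected. */
static pid_t *tid_at(unsigned char *base, size_t off, pid_t tid)
{
	pid_t val;

	if (off % sizeof(pid_t) != 0 || off + sizeof(pid_t) > TID_SCAN_LIMIT)
		return NULL;
	memcpy(&val, base + off, sizeof(val));
	return val == tid ? (pid_t *) (base + off) : NULL;
}

/* Hypothetical fallback for kernels without PR_GET_TID_ADDRESS: try the
 * caller's known-good offsets first, then take the first aligned field in
 * the scan window that holds the current TID. */
static pid_t *find_cached_tid(const size_t *candidate_offsets, size_t n_candidates)
{
	unsigned char *self = (unsigned char *) (uintptr_t) pthread_self();
	pid_t tid = (pid_t) syscall(SYS_gettid);
	pid_t *hit;
	size_t i, off;

	for (i = 0; i < n_candidates; i++) {
		hit = tid_at(self, candidate_offsets[i], tid);
		if (hit != NULL)
			return hit;
	}
	for (off = 0; off + sizeof(pid_t) <= TID_SCAN_LIMIT; off += sizeof(pid_t)) {
		hit = tid_at(self, off, tid);
		if (hit != NULL)
			return hit;
	}
	return NULL;
}
```

    The pointer it returns could then be handed to the cloned child and overwritten there, just like the address obtained from prctl(PR_GET_TID_ADDRESS).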
    
    Obviously this is all very horrific, and if you are reading this in the
    future, it almost certainly has caused some horrific bug that I did not
    foresee. Sorry about that. As far as I can tell, there is no other
    workable solution that doesn't also depend on the CLONE_CHILD_CLEARTID
    behaviour of glibc in some way. We cannot "just" do a re-exec after
    clone(2) for security reasons.
    
    Sadly, this is all glibc-specific. musl doesn't even allow you to use
    CLONE_CHILD_CLEARTID (and they use a different address for the TID
    anyway). We could do the memory scan and manually overwrite the address
    after clone(2), but we can deal with that in the future if it turns out
    people use non-glibc builds and need this fix.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Apr 30, 2024
    e10daeb