Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

contenthash: implement proper Linux symlink semantics #4896

Merged
merged 5 commits into from
Jun 6, 2024

Commits on Jun 6, 2024

  1. contenthash: switch recursive rootPath implementation to be iterative

    This is based on the github.com/cyphar/filepath-securejoin
    implementation. Since we need to resolve all components when doing
    lookups, doing it recursively serves little purpose (other than
    complicating the implementation).
    
    There was also a bug in how symlinks containing "/../" in the target
    name were handled, though this is a more specific issue than the general
    issues with ".." that the rest of the contenthash code has.
    
    Ref: 9bf5431 ("contenthash: fix issues on symlinks on parent paths")
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jun 6, 2024
    Configuration menu
    Copy the full SHA
    531f71a View commit details
    Browse the repository at this point in the history
  2. contenthash: implement proper Linux symlink semantics for getFollowLinks

    This patch is part of a series which fixes the symlink resolution
    semantics within BuildKit.
    
    You cannot implement symlink resolution on Linux naively using
    path.Join. A correct implementation requires tracking the current path
    and applying each new component individually. This implementation is
    loosely based on github.com/cyphar/filepath-securejoin.
    
    Things to note:
    
     * The previous implementation of getFollowLinks actually only resolved
       symlinks in parent components of the path, leading to some callers to
       have to implement resolution manually (and incorrectly) by calling
       getFollowLinks several times. In addition to being incorrect and
       somewhat difficult to follow, it also lead to the ELOOP limit being
       much higher than 255 (while some callers used getFollowLinksWalk,
       most used getFollowLinks which reset the limit for each iteration).
    
       So, add getFollowParentLinks to allow for callers to decide which
       behaviour they need. getFollowLinks now follows all links
       (correctly).
    
     * The trailing-slash-is-significant behaviour in the cache (dir vs
       dir header) needs to be handled specially because on Linux there is
       no distinction between "a/" and "a" (assuming a is not a symlink,
       that is) and so filepath-securejoin's implementation didn't care
       about trailing slashes. The previous implementation hid the trailing
       path behaviour purely in the splitKey() implementation, making the
       need for this quite subtle.
    
     * The previous implementation was recursive, which in theory would
       allow you to find some paths slightly more quickly (if you find a
       valid ancestor you don't need to check above it) at the cost of
       making some lookups more expensive (a path with an invalid ancestor
       very early on in the path).
    
       However, implementing the correct lookup algorithm recursively proved
       to be quite difficult. It is possible to implement a similar
       optimisation (try to find the first non-symlink parent component and
       iterate from there), this complicates the implementation a fair
       amount and it doesn't seem clear that the performance tradeoff is a
       benefit in general.
    
       Ultimately, cache lookups are quite fast and so there probably isn't
       a practical performance difference between approaches.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jun 6, 2024
    Configuration menu
    Copy the full SHA
    2659945 View commit details
    Browse the repository at this point in the history
  3. contenthash: implement proper Linux symlink semantics for needsScan

    This patch is part of a series which fixes the symlink resolution
    semantics within BuildKit.
    
    Since we now have a working implementation of symlink resolution in
    getFollowLinks, we can add a callback after each component is resolved
    so that we can track which components were and were not found during the
    resolution. Because getFollowLinks resolves the components in-order, we
    can use this to tell needsScan whether the resolver found a real
    (non-symlink) ancestor of the requested path in the cache.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jun 6, 2024
    Configuration menu
    Copy the full SHA
    f724d6f View commit details
    Browse the repository at this point in the history
  4. contenthash: unify "follow" and trailing-symlink handling for Checksum

    This patch is part of a series which fixes the symlink resolution
    semantics within BuildKit.
    
    Previously, the concept of the follow flag had different meanings in
    various parts of the checksum codepath. FollowLinks is effectively
    O_NOFOLLOW, but the implementation in getFollowLinks was actually more
    like RESOLVE_NO_SYMLINKS. This was masked by the fact that
    checksumFollow would implement the O_NOFOLLOW behaviour (incorrectly),
    but checksumFollow would call checksumNoFollow (which would follow
    symlinks in path components by setting follow=true for getFollowLinks).
    
    It is much easier to simply remove these layers of indirection and unify
    the meaning of FollowLinks across all of the code. This means that the
    old follow flag is no longer needed.
    
    This also means that we can now remove the incorrect symlink resolution
    logic in (*cacheContext).checksumFollow() and move the followTrailing
    logic to (*cacheContext).checksum(), as well as removing
    getFollowParentLinks(). Since this removes some redundant re-checksum
    loops, we need to add followTrailing logic to scanPath() so that final
    symlink components result in the correct directory being scanned
    properly.
    
    The only user of (*cacheContext).checksum(follow=false) was
    (*cacheContext).includedPaths() which appeared to be simply using this
    as an optimisation (since the path being walked already had its parent
    path resolved). Having two easily-confused boolean flags for an
    optimisation that is probably not necessary (getFollowLinks already does
    a fast check to see if the original path is in the cache) seemed
    unnecessary, so just keep followTrailing.
    
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jun 6, 2024
    Configuration menu
    Copy the full SHA
    e5171cd View commit details
    Browse the repository at this point in the history
  5. contenthash: add tests for non-lexical symlinks

    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Jun 6, 2024
    Configuration menu
    Copy the full SHA
    ee43730 View commit details
    Browse the repository at this point in the history