Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch an edge case in expand._assert_local() #3595

Merged
merged 2 commits into from Oct 14, 2022
Merged

Catch an edge case in expand._assert_local() #3595

merged 2 commits into from Oct 14, 2022

Conversation

mssalvatore
Copy link
Contributor

@mssalvatore mssalvatore commented Sep 18, 2022

Summary of changes

Using str.startswith() for sandboxing file access has an edge case where someone can access files outside the root directory. For example, consider the case where the root directory is "/home/user/my-package" but some secrets are stored in "/home/user/my-package-secrets". Evaluating a check that "/home/user/my-package-secrets".startswith("/home/user/my-package") will return True, but the statement's intention is that no file outside of "/home/user/my-package" can be accessed.

Using pathlib.Path.resolve() and pathlib.Path.parents eliminates this edge case.

See https://salvatoresecurity.com/preventing-directory-traversal-vulnerabilities-in-python/ for more details.

Caveats

  1. If there is an infinite loop in the directory structure, pathlib.Path.resolve() will raise a RuntimeError.
  2. pathlib.Path.resolve() resolves symlinks, whereas the old code using os.pathlib.abspath() did not.

I am not intimately familiar with setuptools and could use some guidance as to whether or not either of these two caveats would cause an issue. My guess would be that the first would not impact users and the second might, but probably won't. If either of the above caveats is an issue, I can provide an alternate implementation using os.path.abspath(), though that comes with its own caveats.

Pull Request Checklist

  • Changes have tests
  • News fragment added in changelog.d/. - This PR should not change the way users interact with setuptools, with the possible exception of the caveats above.

Using str.startswith() has an edge case where someone can access files
outside the root directory. For example, consider the case where the
root directory is "/home/user/my-package" but some secrets are stored in
"/home/user/my-package-secrets". Evaluating a check that
"/home/user/my-package-secrets".startswith("/home/user/my-package") will
return True, but the statement's intention is that no file outside of
"/home/user/my-package" can be accessed.

Using pathlib.Path.resolve() and pathlib.Path.parents eliminates this
edge case.
@abravalheri
Copy link
Contributor

Thank you very much for working on this, @mssalvatore.

I can see that you have used resolve instead of abspath. Is there a rationale behind this choice?
I believe that this changes a little bit the semantics of the operation right? This would mean that developers cannot use symbolic links in their projects, which could mean that the change is not backwards compatible (it also introduces file-system traversing, instead of string operation, which you noted in your comments can result in recursion problems) ...

@mssalvatore
Copy link
Contributor Author

@abravalheri Usingresolve instead of abspath does indeed change the semantics of the operation. resolve will canonicalize the path (resolving symlinks) and abspath won't. The docstrings for read_files() and a few other functions say that the "function is sandboxed and won't reach anything outside rootdir". For that to be true, strictly speaking, you need resolve instead of abspath.

If it's acceptable or desirable to allow symlinks to be traversed, I can reimplement this using abspath. Just LMK.

@abravalheri
Copy link
Contributor

Hi @mssalvatore thank you very much for having a look on this.

I think it is fair to assume that if the user has a symbolic link, they want it to be considered as a local file. I am also concerned about the backward compatibility...

My opinion is that it would be better to go with a loose interpretation and use abspath in this particular scenario. Unless there is a good reason to do otherwise.

4249da1 uses `pathlib.Path.resolve()` instead of `os.path.abspath()`
to canonicalize path names. `resolve()` resolves symlinks, whereas
`abspath()` does not. `resolve()` can also raise a `RuntimeError` if
infinite loops are discovered while resolving the path. There is some
concern that using `resolve()` would not be backwards compatible. This
commit switches back to `abspath()` but still uses `Path.parents` to
avoid the edge case. See PR #3595 for more details.
@mssalvatore
Copy link
Contributor Author

@abravalheri I've pushed a commit to use abspath() and still handle the edge case.

@abravalheri abravalheri merged commit a55b024 into pypa:main Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants