Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use file locking to protect selfcheck state file #6954

Closed
chrahunt opened this issue Aug 31, 2019 · 3 comments · Fixed by #6879
Closed

Don't use file locking to protect selfcheck state file #6954

chrahunt opened this issue Aug 31, 2019 · 3 comments · Fixed by #6879
Labels
auto-locked Outdated issues that have been locked by automation type: enhancement Improvements to functionality

Comments

@chrahunt
Copy link
Member

chrahunt commented Aug 31, 2019

What's the problem this feature will solve?

There are several issues around file locking that have been filed over the years, specifically related to:

  1. Underlying OS/filesystem does not support hardlinks as used by the file lock (Pip Issues on Windows + ReFS #2993, infinite loop after pip install on Android API 24 #5322, pip does not work on Haiku #6761)
  2. Lingering lock files and/or lock files in an inconsistent state can cause pip to hang when attempting to acquire the lock (some of Infinite loop on pip when lockfile can't acquire a lock #3532, Pip locks up after failing to download a package due to network error. #5034)
  3. lockfile uses hostname when creating its unique name, which can result in invalid paths when hostname includes a / (pip hangs, lockfile can't acquire a lock, forward-slash in hostname #6938)

Describe the solution you'd like

  1. Write a selfcheck state file per-prefix, to remove the need to read and then write the file within a lock
  2. Write the file atomically (write to a separate tmp file and then move into place) to avoid partial writes if the process is killed

This will satisfy the linked issues and help us progress on #4766 to remove lockfile entirely.

Alternative Solutions

  1. Switch to MkdirLockFile as currently used in the HTTP cache - the downside of this approach is that it is not backwards-compatible, so we would need to use a separate file to track the information for modern pip versions. If we would need to use a separate file anyway, we might as well go one step further to progress Moving off lockfile #4766.

Additional context

@pfmoore
Copy link
Member

pfmoore commented Aug 31, 2019

Write a selfcheck state file per-prefix

Where would this go? If it's within site-packages, it will have problems with permissions (the system site-packages is often read-only) but if it's elsewhere, how will it get cleared up? For example, I routinely use pew mktmpenv to create throwaway virtual environments - I would not want selfcheck files remaining on my system for every one of these.

@chrahunt
Copy link
Member Author

For context, moving to per-prefix selfcheck state files is implemented by #6855 (I have added this to the original issue).

Where would this go?

In #6855, it is <cache dir>/selfcheck/<sha224 of absolute prefix path>

  1. The total path length is exactly the same as for any of our HTTP cache files
  2. The prefix path itself is now in the contained JSON object so we can still tell where it comes from.

how will it get cleared up?

Great question. There is nothing for it in #6855, but we could envision either an ad-hoc activity supported as part of #4685, or a task that removes any selfcheck state files where the prefix path no longer exists. The task could be done by pip when it is updating its state file. I would prefer to implement this after we get rid of the file locking.

@pfmoore
Copy link
Member

pfmoore commented Aug 31, 2019

I would prefer to implement this after we get rid of the file locking.

I guess this is reasonable, as it's in the cache directory (which is already essentially a repository of opaque "clutter" from the POV of the user. It's not like the state files are a space issue. I've never really thought about it, but I guess we don't have any form of cache housekeeping at the moment, so this is really just a further case of that.

I raised #6956 to track the question of whether we need some form of cache housekeeping.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Oct 15, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants