process_iter(): no longer check whether PIDs have been reused #2396

Open
giampaolo opened this issue Apr 8, 2024 · 0 comments

Summary

  • OS: all
  • Type: performance

Description

For every process yielded by psutil.process_iter(), we internally check whether the process PID has been reused, in which case we return a "fresh" Process instance. To check for PID reuse we are forced to create a new Process instance, retrieve its create_time() and compare it with that of the original process. Performance-wise, it turns out this has a huge (and exponential) cost. This is particularly relevant because process_iter() is typically used to write task-manager-like apps, where the full process list is retrieved every second. I realized this at work, while writing a process monitor agent that runs on small hardware (a cleaning robot).
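The check described above can be modelled with a small, library-independent sketch. It is not psutil's actual implementation: CachedProc, iter_with_reuse_check and the get_create_time callable are hypothetical names introduced here for illustration. The point it shows is that every PID costs one extra creation-time lookup on every pass, even when nothing has changed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass
class CachedProc:
    """Hypothetical stand-in for a cached Process instance."""
    pid: int
    create_time: float


def iter_with_reuse_check(
    pids: Iterable[int],
    cache: Dict[int, CachedProc],
    get_create_time: Callable[[int], float],
) -> Iterator[CachedProc]:
    """Yield one entry per PID, replacing entries whose PID was reused."""
    for pid in pids:
        # The extra per-PID lookup that the issue proposes to drop.
        fresh_ctime = get_create_time(pid)
        cached = cache.get(pid)
        if cached is None or cached.create_time != fresh_ctime:
            # Either a new PID, or the same PID now owned by another process.
            cached = cache[pid] = CachedProc(pid, fresh_ctime)
        yield cached
```

With psutil installed, one possible get_create_time would be `lambda pid: psutil.Process(pid).create_time()`, which mirrors the "create a new Process instance and compare" step described above. The sketch does not purge dead PIDs from the cache.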

By removing the PID reuse check I get a 21x speedup on a Linux OS with 481 running PIDs:

import time

import psutil

started = time.monotonic()
for _ in range(1000):
    list(psutil.process_iter())
elapsed = time.monotonic() - started
print(f"Number of pids: {len(psutil.pids())}. Completed in {elapsed:.4f} secs")

Current master:
Number of pids: 481. Completed in 5.1079 secs

With PID reuse check removed:
Number of pids: 481. Completed in 0.2419 secs

Repercussions

  • PID reuse is already pre-emptively checked for "write" Process APIs such as kill(), terminate(), nice() (set), etc., so in that sense it won't make any difference and we'll remain safe.
  • Some Process APIs are cached: exe(), create_time() and name() (Windows only). For these, if the PID has been reused, the Process instance will keep returning the old value. This doesn't happen with the current (slow) implementation, since process_iter() returns a brand new Process instance.
  • We may clear the Process cache on is_running(), but we cannot clear create_time()'s cache, as the old value is necessary to detect PID reuse. This basically means a PID-reused Process instance should just be discarded by process_iter() somehow (but how?).
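The trade-off in the points above can be illustrated with a toy model. This is a sketch under stated assumptions, not psutil code: GuardedProc, PidReusedError, get_create_time and send_signal are all hypothetical names. It shows a "read" API that keeps returning its cached value after the PID is reused, next to a "write" API that verifies the process identity before acting.

```python
from typing import Callable


class PidReusedError(Exception):
    """Hypothetical error for this sketch; not a psutil exception."""


class GuardedProc:
    """Toy process handle with a cached creation time."""

    def __init__(self, pid: int, get_create_time: Callable[[int], float]):
        self.pid = pid
        self._get_create_time = get_create_time
        # Cached once; the old value is what makes reuse detectable later.
        self._create_time = get_create_time(pid)

    def create_time(self) -> float:
        # "Read" API: returns the cached value, even if the PID was reused.
        return self._create_time

    def pid_reused(self) -> bool:
        # Compare the current creation time with the cached one.
        return self._get_create_time(self.pid) != self._create_time

    def kill(self, send_signal: Callable[[int], None]) -> None:
        # "Write" API: check identity first, so a reused PID is never hit.
        if self.pid_reused():
            raise PidReusedError(f"PID {self.pid} now belongs to another process")
        send_signal(self.pid)
```

In this model the write path stays safe without any check in the iteration loop, while create_time() goes stale once the PID is reused, which is the open question raised in the last point.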