Skip to content

Commit

Permalink
Speedup process iter (don't check for PID reuse) (#2404)
Browse files Browse the repository at this point in the history
No longer make process_iter() check whether PID has been reused. This makes it around 20x times faster on Linux. Also changed Process.is_running() so that it will automatically remove the reused PID from process_iter() internal cache. In addition, also add a new process_iter.cache_clear() API.
  • Loading branch information
giampaolo committed Apr 17, 2024
1 parent 5a3d56b commit 7556e5d
Show file tree
Hide file tree
Showing 5 changed files with 120 additions and 87 deletions.
6 changes: 5 additions & 1 deletion HISTORY.rst
Original file line number Diff line number Diff line change
@@ -1,12 +1,16 @@
*Bug tracker at https://github.com/giampaolo/psutil/issues*

5.9.9 (IN DEVELOPMENT)
6.0.0 (IN DEVELOPMENT)
======================

**Enhancements**

- 2366_, [Windows]: log debug message when using slower process APIs.
- 2375_, [macOS]: provide arm64 wheels. (patch by Matthieu Darbois)
- 2396_: `process_iter()`_ no longer pre-emptively checks whether PIDs have
been reused. This makes `process_iter()`_ around 20x times faster.
- 2396_: a new ``psutil.process_iter.cache_clear()`` API can be used the clear
`process_iter()`_ internal cache.

**Bug fixes**

Expand Down
38 changes: 25 additions & 13 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -928,12 +928,12 @@ Functions

Return an iterator yielding a :class:`Process` class instance for all running
processes on the local machine.
This should be preferred over :func:`psutil.pids()` to iterate over processes
as it's safe from race condition.
This should be preferred over :func:`psutil.pids()` to iterate over
processes, as retrieving info is safe from race conditions.

Every :class:`Process` instance is only created once, and then cached for the
next time :func:`psutil.process_iter()` is called (if PID is still alive).
Also it makes sure process PIDs are not reused.
Cache can optionally be cleared via ``process_iter.clear_cache()``.

*attrs* and *ad_value* have the same meaning as in :meth:`Process.as_dict()`.
If *attrs* is specified :meth:`Process.as_dict()` result will be stored as a
Expand Down Expand Up @@ -963,9 +963,19 @@ Functions
3: {'name': 'ksoftirqd/0', 'username': 'root'},
...}

Clear internal cache::

>>> psutil.process_iter.cache_clear()

.. versionchanged::
5.3.0 added "attrs" and "ad_value" parameters.

.. versionchanged::
6.0.0 no longer checks whether each yielded process PID has been reused.

.. versionchanged::
6.0.0 added ``psutil.process_iter.cache_clear()`` API.

.. function:: pid_exists(pid)

Check whether the given PID exists in the current process list. This is
Expand Down Expand Up @@ -1071,11 +1081,12 @@ Process class

.. note::

the way this class is bound to a process is uniquely via its **PID**.
the way this class is bound to a process is via its **PID**.
That means that if the process terminates and the OS reuses its PID you may
end up interacting with another process.
The only exceptions for which process identity is preemptively checked
(via PID + creation time) is for the following methods:
inadvertently end up querying another process. To prevent this problem
you can use :meth:`is_running()` first.
The only methods which preemptively check whether PID has been reused
(via PID + creation time) are:
:meth:`nice` (set),
:meth:`ionice` (set),
:meth:`cpu_affinity` (set),
Expand All @@ -1087,13 +1098,8 @@ Process class
:meth:`suspend`
:meth:`resume`,
:meth:`send_signal`,
:meth:`terminate`
:meth:`terminate` and
:meth:`kill`.
To prevent this problem for all other methods you can use
:meth:`is_running()` before querying the process or
:func:`process_iter()` in case you're iterating over all processes.
It must be noted though that unless you deal with very "old" (inactive)
:class:`Process` instances this will hardly represent a problem.

.. method:: oneshot()

Expand Down Expand Up @@ -1979,11 +1985,17 @@ Process class
This is reliable also in case the process is gone and its PID reused by
another process, therefore it must be preferred over doing
``psutil.pid_exists(p.pid)``.
If PID has been reused this method will also remove the process from
:func:`process_iter()` internal cache.

.. note::
this will return ``True`` also if the process is a zombie
(``p.status() == psutil.STATUS_ZOMBIE``).

.. versionchanged:: 6.0.0 : automatically remove process from
:func:`process_iter()` internal cache if PID has been reused by another
process.

.. method:: send_signal(signal)

Send a signal to process (see `signal module`_ constants) preemptively
Expand Down
102 changes: 42 additions & 60 deletions psutil/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,7 @@
AF_LINK = _psplatform.AF_LINK

__author__ = "Giampaolo Rodola'"
__version__ = "5.9.9"
__version__ = "6.0.0"
version_info = tuple([int(num) for num in __version__.split('.')])

_timer = getattr(time, 'monotonic', time.time)
Expand Down Expand Up @@ -291,11 +291,9 @@ class Process(object): # noqa: UP004
If PID is omitted current process PID (os.getpid()) is used.
Raise NoSuchProcess if PID does not exist.
Note that most of the methods of this class do not make sure
the PID of the process being queried has been reused over time.
That means you might end up retrieving an information referring
to another process in case the original one this instance
refers to is gone in the meantime.
Note that most of the methods of this class do not make sure that
the PID of the process being queried has been reused. That means
that you may end up retrieving information for another process.
The only exceptions for which process identity is pre-emptively
checked and guaranteed are:
Expand All @@ -312,11 +310,8 @@ class Process(object): # noqa: UP004
- terminate()
- kill()
To prevent this problem for all other methods you can:
- use is_running() before querying the process
- if you're continuously iterating over a set of Process
instances use process_iter() which pre-emptively checks
process identity for every yielded instance
To prevent this problem for all other methods you can use
is_running() before querying the process.
"""

def __init__(self, pid=None):
Expand Down Expand Up @@ -384,19 +379,24 @@ def __str__(self):
if self._name:
info['name'] = self._name
with self.oneshot():
try:
info["name"] = self.name()
info["status"] = self.status()
except ZombieProcess:
info["status"] = "zombie"
except NoSuchProcess:
info["status"] = "terminated"
except AccessDenied:
pass
if self._pid_reused:
info["status"] = "terminated + PID reused"
else:
try:
info["name"] = self.name()
info["status"] = self.status()
except ZombieProcess:
info["status"] = "zombie"
except NoSuchProcess:
info["status"] = "terminated"
except AccessDenied:
pass

if self._exitcode not in (_SENTINEL, None):
info["exitcode"] = self._exitcode
if self._create_time is not None:
info['started'] = _pprint_secs(self._create_time)

return "%s.%s(%s)" % (
self.__class__.__module__,
self.__class__.__name__,
Expand Down Expand Up @@ -436,7 +436,7 @@ def __hash__(self):

def _raise_if_pid_reused(self):
"""Raises NoSuchProcess in case process PID has been reused."""
if not self.is_running() and self._pid_reused:
if self._pid_reused or (not self.is_running() and self._pid_reused):
# We may directly raise NSP in here already if PID is just
# not running, but I prefer NSP to be raised naturally by
# the actual Process API call. This way unit tests will tell
Expand Down Expand Up @@ -599,19 +599,24 @@ def parents(self):

def is_running(self):
"""Return whether this process is running.
It also checks if PID has been reused by another process in
which case return False.
It also checks if PID has been reused by another process, in
which case it will remove the process from `process_iter()`
internal cache and return False.
"""
if self._gone or self._pid_reused:
return False
try:
# Checking if PID is alive is not enough as the PID might
# have been reused by another process: we also want to
# verify process identity.
# Process identity / uniqueness over time is guaranteed by
# (PID + creation time) and that is verified in __eq__.
# have been reused by another process. Process identity /
# uniqueness over time is guaranteed by (PID + creation
# time) and that is verified in __eq__.
self._pid_reused = self != Process(self.pid)
return not self._pid_reused
if self._pid_reused:
# remove this PID from `process_iter()` internal cache
_pmap.pop(self.pid, None)
raise NoSuchProcess(self.pid)
return True
except ZombieProcess:
# We should never get here as it's already handled in
# Process.__init__; here just for extra safety.
Expand Down Expand Up @@ -1463,10 +1468,7 @@ def process_iter(attrs=None, ad_value=None):
Every new Process instance is only created once and then cached
into an internal table which is updated every time this is used.
Cached Process instances are checked for identity so that you're
safe in case a PID has been reused by another process, in which
case the cached instance is updated.
Cache can optionally be cleared via `process_iter.clear_cache()`.
The sorting order in which processes are yielded is based on
their PIDs.
Expand All @@ -1482,8 +1484,6 @@ def process_iter(attrs=None, ad_value=None):

def add(pid):
proc = Process(pid)
if attrs is not None:
proc.info = proc.as_dict(attrs=attrs, ad_value=ad_value)
pmap[proc.pid] = proc
return proc

Expand All @@ -1502,38 +1502,20 @@ def remove(pid):
for pid, proc in ls:
try:
if proc is None: # new process
yield add(pid)
else:
# use is_running() to check whether PID has been
# reused by another process in which case yield a
# new Process instance
if proc.is_running():
if attrs is not None:
proc.info = proc.as_dict(
attrs=attrs, ad_value=ad_value
)
yield proc
else:
yield add(pid)
proc = add(pid)
if attrs is not None:
proc.info = proc.as_dict(attrs=attrs, ad_value=ad_value)
yield proc
except NoSuchProcess:
remove(pid)
except AccessDenied:
# Process creation time can't be determined hence there's
# no way to tell whether the pid of the cached process
# has been reused. Just return the cached version.
if proc is None and pid in pmap:
try:
yield pmap[pid]
except KeyError:
# If we get here it is likely that 2 threads were
# using process_iter().
pass
else:
raise
finally:
_pmap = pmap


process_iter.cache_clear = lambda: _pmap.clear() # noqa
process_iter.cache_clear.__doc__ = "Clear process_iter() internal cache."


def wait_procs(procs, timeout=None, callback=None):
"""Convenience function which waits for a list of processes to
terminate.
Expand Down
11 changes: 11 additions & 0 deletions psutil/tests/test_process.py
Original file line number Diff line number Diff line change
Expand Up @@ -1375,13 +1375,24 @@ def test_reused_pid(self):
subp = self.spawn_testproc()
p = psutil.Process(subp.pid)
p._ident = (p.pid, p.create_time() + 100)

list(psutil.process_iter())
self.assertIn(p.pid, psutil._pmap)
assert not p.is_running()
# make sure is_running() removed PID from process_iter()
# internal cache
self.assertNotIn(p.pid, psutil._pmap)

assert p != psutil.Process(subp.pid)
msg = "process no longer exists and its PID has been reused"
ns = process_namespace(p)
for fun, name in ns.iter(ns.setters + ns.killers, clear_cache=False):
with self.subTest(name=name):
self.assertRaisesRegex(psutil.NoSuchProcess, msg, fun)

self.assertIn("terminated + PID reused", str(p))
self.assertIn("terminated + PID reused", repr(p))

self.assertRaisesRegex(psutil.NoSuchProcess, msg, p.ppid)
self.assertRaisesRegex(psutil.NoSuchProcess, msg, p.parent)
self.assertRaisesRegex(psutil.NoSuchProcess, msg, p.parents)
Expand Down
50 changes: 37 additions & 13 deletions psutil/tests/test_system.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,8 @@
# ===================================================================


class TestProcessAPIs(PsutilTestCase):
def test_process_iter(self):
class TestProcessIter(PsutilTestCase):
def test_pid_presence(self):
self.assertIn(os.getpid(), [x.pid for x in psutil.process_iter()])
sproc = self.spawn_testproc()
self.assertIn(sproc.pid, [x.pid for x in psutil.process_iter()])
Expand All @@ -71,24 +71,40 @@ def test_process_iter(self):
p.wait()
self.assertNotIn(sproc.pid, [x.pid for x in psutil.process_iter()])

# assert there are no duplicates
def test_no_duplicates(self):
ls = [x for x in psutil.process_iter()]
self.assertEqual(
sorted(ls, key=lambda x: x.pid),
sorted(set(ls), key=lambda x: x.pid),
)

with mock.patch(
'psutil.Process', side_effect=psutil.NoSuchProcess(os.getpid())
):
self.assertEqual(list(psutil.process_iter()), [])
with mock.patch(
'psutil.Process', side_effect=psutil.AccessDenied(os.getpid())
):
with self.assertRaises(psutil.AccessDenied):
list(psutil.process_iter())
def test_emulate_nsp(self):
list(psutil.process_iter()) # populate cache
for x in range(2):
with mock.patch(
'psutil.Process.as_dict',
side_effect=psutil.NoSuchProcess(os.getpid()),
):
self.assertEqual(
list(psutil.process_iter(attrs=["cpu_times"])), []
)
psutil.process_iter.cache_clear() # repeat test without cache

def test_process_iter_w_attrs(self):
def test_emulate_access_denied(self):
list(psutil.process_iter()) # populate cache
for x in range(2):
with mock.patch(
'psutil.Process.as_dict',
side_effect=psutil.AccessDenied(os.getpid()),
):
with self.assertRaises(psutil.AccessDenied):
list(psutil.process_iter(attrs=["cpu_times"]))
psutil.process_iter.cache_clear() # repeat test without cache

def test_attrs(self):
for p in psutil.process_iter(attrs=['pid']):
self.assertEqual(list(p.info.keys()), ['pid'])
# yield again
for p in psutil.process_iter(attrs=['pid']):
self.assertEqual(list(p.info.keys()), ['pid'])
with self.assertRaises(ValueError):
Expand All @@ -113,6 +129,14 @@ def test_process_iter_w_attrs(self):
self.assertGreaterEqual(p.info['pid'], 0)
assert m.called

def test_cache_clear(self):
list(psutil.process_iter()) # populate cache
assert psutil._pmap
psutil.process_iter.cache_clear()
assert not psutil._pmap


class TestProcessAPIs(PsutilTestCase):
@unittest.skipIf(
PYPY and WINDOWS, "spawn_testproc() unreliable on PYPY + WINDOWS"
)
Expand Down

0 comments on commit 7556e5d

Please sign in to comment.