Fix iterable length handling in contrib.concurrent #1473

svank · 2023-05-12T18:10:16Z

contrib.concurrent has functions for running map-like functions in parallel with a progress bar. This PR makes three fixes to how the lengths of the iterables are detected.

If the function provided to process_map or thread_map accepts N arguments, these map functions accept N iterables, with each iterable providing values for one argument. The map function keeps calling the provided function until reaching the end of the shortest iterable.

process_map raises a warning if a long iterable has been provided but a chunksize has not been set. It was looking at the length of the longest iterable, but it should look at the shortest length, as that's what sets the number of iterations.

Prior behavior:

In [7]: from tqdm.contrib.concurrent import process_map
   ...: from time import sleep
   ...: from random import random
   ...: def f(*args):
   ...:     sleep(random() * 3)
   ...: process_map(f, range(2000), range(2))
<ipython-input-7-cc99b55bb53a>:6: TqdmWarning: Iterable length 2000 > 1000 but `chunksize` is not set. This may seriously degrade multiprocess performance. Set `chunksize=1` or more.
  process_map(f, range(2000), range(2))
  0%|                                          | 2/2000 [00:01<26:27,  1.26it/s]
Out[7]: [None, None]

Note the warning about a length-2000 iterable, despite only running two iterations. With this PR, the warning is not raised in this instance.

Note also that the progress bar shows an expected total of 2000 iterations, despite ending after 2. This is because _executor_map was only looking at the first iterable to set the expected total. This PR changes this behavior to use the length of the shortest iterable.

New behavior:

In [1]: from tqdm.contrib.concurrent import process_map
   ...: from time import sleep
   ...: from random import random
   ...: def f(*args):
   ...:     sleep(random() * 3)
   ...: process_map(f, range(2000), range(2))
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.23s/it]
Out[1]: [None, None]

This PR also improves how these map functions handle iterables with no length. (One pattern would be having one variable and one fixed argument, and calling process_map(my_function, range(10), itertools.repeat(fixed_value)).) Because of the default length of zero returned by operator.length_hint (and with this PR's emphasis on the shortest length), no-length iterables caused the expected total to be 0. The second commit in this PR ignores those lengths when calculating the expected total.

Before:

In [20]: from tqdm.contrib.concurrent import process_map
    ...: from time import sleep
    ...: from random import random
   ...: from itertools import repeat
    ...: def f(*args):
    ...:     sleep(random() * 3)
    ...: process_map(f, range(2000), range(2), repeat(1))
2it [00:02,  1.11s/it]
Out[20]: [None, None]

After:

In [2]: from tqdm.contrib.concurrent import process_map
   ...: from time import sleep
   ...: from random import random
   ...: from itertools import repeat
   ...: def f(*args):
   ...:     sleep(random() * 3)
   ...: process_map(f, range(2000), range(2), repeat(1))
100%|█████████████████████████████████████████████| 2/2 [00:01<00:00,  1.95it/s]
Out[2]: [None, None]

svank added 2 commits May 12, 2023 11:29

Fix length detection in contrib.concurrent

4997c59

Handle no-length iterables in contrib.concurrent

dd37e6c

svank requested a review from casperdcl as a code owner May 12, 2023 18:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix iterable length handling in contrib.concurrent #1473

Fix iterable length handling in contrib.concurrent #1473

svank commented May 12, 2023

Fix iterable length handling in contrib.concurrent #1473

Are you sure you want to change the base?

Fix iterable length handling in contrib.concurrent #1473

Conversation

svank commented May 12, 2023