Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix iterable length handling in contrib.concurrent #1473

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

svank
Copy link

@svank svank commented May 12, 2023

contrib.concurrent has functions for running map-like functions in parallel with a progress bar. This PR makes three fixes to how the lengths of the iterables are detected.

If the function provided to process_map or thread_map accepts N arguments, these map functions accept N iterables, with each iterable providing values for one argument. The map function keeps calling the provided function until reaching the end of the shortest iterable.

process_map raises a warning if a long iterable has been provided but a chunksize has not been set. It was looking at the length of the longest iterable, but it should look at the shortest length, as that's what sets the number of iterations.

Prior behavior:

In [7]: from tqdm.contrib.concurrent import process_map
   ...: from time import sleep
   ...: from random import random
   ...: def f(*args):
   ...:     sleep(random() * 3)
   ...: process_map(f, range(2000), range(2))
<ipython-input-7-cc99b55bb53a>:6: TqdmWarning: Iterable length 2000 > 1000 but `chunksize` is not set. This may seriously degrade multiprocess performance. Set `chunksize=1` or more.
  process_map(f, range(2000), range(2))
  0%|                                          | 2/2000 [00:01<26:27,  1.26it/s]
Out[7]: [None, None]

Note the warning about a length-2000 iterable, despite only running two iterations. With this PR, the warning is not raised in this instance.

Note also that the progress bar shows an expected total of 2000 iterations, despite ending after 2. This is because _executor_map was only looking at the first iterable to set the expected total. This PR changes this behavior to use the length of the shortest iterable.

New behavior:

In [1]: from tqdm.contrib.concurrent import process_map
   ...: from time import sleep
   ...: from random import random
   ...: def f(*args):
   ...:     sleep(random() * 3)
   ...: process_map(f, range(2000), range(2))
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.23s/it]
Out[1]: [None, None]

This PR also improves how these map functions handle iterables with no length. (One pattern would be having one variable and one fixed argument, and calling process_map(my_function, range(10), itertools.repeat(fixed_value)).) Because of the default length of zero returned by operator.length_hint (and with this PR's emphasis on the shortest length), no-length iterables caused the expected total to be 0. The second commit in this PR ignores those lengths when calculating the expected total.

Before:

In [20]: from tqdm.contrib.concurrent import process_map
    ...: from time import sleep
    ...: from random import random
   ...: from itertools import repeat
    ...: def f(*args):
    ...:     sleep(random() * 3)
    ...: process_map(f, range(2000), range(2), repeat(1))
2it [00:02,  1.11s/it]
Out[20]: [None, None]

After:

In [2]: from tqdm.contrib.concurrent import process_map
   ...: from time import sleep
   ...: from random import random
   ...: from itertools import repeat
   ...: def f(*args):
   ...:     sleep(random() * 3)
   ...: process_map(f, range(2000), range(2), repeat(1))
100%|█████████████████████████████████████████████| 2/2 [00:01<00:00,  1.95it/s]
Out[2]: [None, None]

@svank svank requested a review from casperdcl as a code owner May 12, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant