How to update single progress bar in multiprocessing map() #1121
Replies: 23 comments
-
import time
import random
from multiprocessing import Pool
import tqdm

def myfunc(a):
    time.sleep(random.random())
    pbar.update(1)  # note: pbar is not defined in the worker processes
    return a ** 2

pool = Pool(2)
for _ in tqdm.tqdm(pool.imap_unordered(myfunc, range(100)), total=100):
    pass
pool.close()
pool.join()
pbar.close()
-
interesting. As @Akhail mentioned you could move the progress logic to outside:

import time
import random
from multiprocessing import Pool
import tqdm

def myfunc(a):
    time.sleep(random.random())
    return a ** 2

pool = Pool(2)
for _ in tqdm.tqdm(pool.imap_unordered(myfunc, range(100)), total=100):
    pass
pool.close()
pool.join()

100%|████████████████████████████| 100/100 [00:23<00:00,  4.20it/s]

but this isn't ideal. There must be a simpler solution.
-
but how about if the tasks are submitted like this:

for k, task in enumerate(tasks):
    pool.apply_async(task_runner, args=(), callback=task_callback)
pool.close()
pool.join()
-
of course...

import time
import random
from multiprocessing import Pool
from tqdm import tqdm

def myfunc(a):
    time.sleep(random.random())
    return a ** 2

pool = Pool(2)
'''
for _ in tqdm(pool.imap_unordered(myfunc, range(100)), total=100):
    pass
'''
pbar = tqdm(total=100)

def update(*a):
    pbar.update()
    # tqdm.write(str(a))

for i in range(pbar.total):
    pool.apply_async(myfunc, args=(i,), callback=update)
    # tqdm.write('scheduled')
pool.close()
pool.join()
-
@casperdcl thanks very much
thanks
-
scheduled (0,)
(1,)
(9,)
(4,)
...
(9216,)
(9025,)
(9604,)
(9409,)
(9801,)
100%|████████████████████████████| 100/100 [00:22<00:00,  3.48it/s]
-
I used the following in my multiprocessing solution that parsed multiple files at once:

import os
from multiprocessing import Pool, Process, Manager
from tqdm import tqdm

def process_file(q, fname):
    last_pos = 0
    with open(fname) as infile:
        # iterate via readline() so that tell() stays usable
        # (Python 3 disables tell() during direct file iteration)
        for line in iter(infile.readline, ''):
            # ... do something with the line
            q.put(infile.tell() - last_pos)
            last_pos = infile.tell()

def show_prog(q, total_bytes):
    prog = tqdm(total=total_bytes, desc="Total", unit='B', unit_scale=True)
    while True:
        try:
            to_add = q.get(timeout=1)
            prog.n += to_add
            prog.update(0)  # refresh the bar without incrementing
            if prog.n >= total_bytes:
                break
        except Exception:
            continue

pool = Pool(processes=5)
manager = Manager()
queue = manager.Queue()
total_bytes = 0
for f in files:  # `files` is the list of paths to parse
    total_bytes += os.stat(f).st_size
    pool.apply_async(process_file, args=(queue, f))
progress = Process(target=show_prog, args=(queue, total_bytes))
progress.start()
pool.close()
pool.join()
progress.join()
-
To answer the original question,

from multiprocessing import Pool

def myfunc(a):
    return a ** 2

N = 100
pool = Pool(2)
res = pool.map(myfunc, range(N))
pool.close()
pool.join()

becomes:

from multiprocessing import Pool
from tqdm import tqdm

def myfunc(a):
    return a ** 2

N = 100
pbar = tqdm(total=N)
res = [None] * N  # result list of correct size

def wrapMyFunc(arg):
    return arg, myfunc(arg)

def update(result):
    # note: input comes from async `wrapMyFunc`
    # (tuple unpacking in the signature is Python 2 only)
    i, ans = result
    res[i] = ans  # put answer into correct index of result list
    pbar.update()

pool = Pool(2)
for i in range(N):
    pool.apply_async(wrapMyFunc, args=(i,), callback=update)
pool.close()
pool.join()
pbar.close()
-
@casperdcl, I would like to reopen this issue, because I'm still seeing the same issue, but in a slightly different situation. I have a need to parallelise calling a function that can't be directly imported (because it's defined on-the-fly, for example). This means (AFAIK) that I can't use a multiprocessing Pool. See the following (somewhat contrived) example:

#!/usr/bin/env python
import time
import random
from operator import itemgetter
from multiprocessing import (Process, Queue)
import tqdm

NINPUTS = 100
NPROC = 4

def process(func, inq, outq, bar):
    """Read from input `Queue`, and put result of `func` into output `Queue`
    """
    while True:
        idx, arg = inq.get()
        if idx is None:
            break
        outq.put((idx, func(arg)))
        bar.update()

def run():
    """Parallelise inline function
    """
    inputs = list(range(NINPUTS))
    bar = tqdm.tqdm(total=NINPUTS)

    # function to be parallelised
    def myfunc(a):
        time.sleep(random.random())
        return a ** 2

    inq = Queue()
    outq = Queue()
    # create processing pool
    pool = [Process(target=process, args=(myfunc, inq, outq, bar))
            for _ in range(NPROC)]
    for proc in pool:
        proc.daemon = True
        proc.start()
    for x in enumerate(inputs):  # fill queue with inputs
        inq.put(x, block=False)
    for _ in range(NPROC):  # sentinel to signal empty
        inq.put((None, None))
    # get results and close pool
    results = [outq.get() for _ in range(NINPUTS)]
    for proc in pool:
        proc.join()
    bar.close()
    print([x for i, x in sorted(results, key=itemgetter(0))])

run()

I end up with an output like:

Can you (or anyone else) help me out for this case as well? [Thanks for the previous solution, which was very helpful]
-
You may consider only using tqdm where you collect the results.
-
Or change it to imap_unordered with a chunksize defined.
-
@chengs, can you elaborate on how this would work?
-
@chengs, never mind that last request, the suggestion to add the
-
Yes, in this case

results = [outq.get() for _ in range(NINPUTS)]

should be

from tqdm import trange
results = [outq.get() for _ in trange(NINPUTS)]

And remove the
-
from https://stackoverflow.com/questions/41920124/multiprocessing-use-tqdm-to-display-a-progress-bar

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    with Pool(2) as p:
        r = list(tqdm.tqdm(p.imap(_foo, range(30)), total=30))
-
After investigating lots of methods, I wrote a package to handle it; here is the usage:

import threading
from concurrent.futures import ThreadPoolExecutor
import time
from tqdm_multi_thread import TqdmMultiThreadFactory

def demo(factory, position, total):
    with factory.create(position, total) as progress:
        for _ in range(0, total, 5):
            progress.update(5)
            time.sleep(0.001 * (position % 5 + 1))

with ThreadPoolExecutor(max_workers=20) as executor:
    tasks = range(100)
    lock = threading.Lock()
    multi_thread_factory = TqdmMultiThreadFactory()
    for i, url in enumerate(tasks, 1):
        executor.submit(demo, multi_thread_factory, i, 100)

more details: tqdm_multi_thread
-
Another repo and approach here https://github.com/swansonk14/p_tqdm |
-
Since this is still the top search result, I will add yet another option. I think the suggestion from @epruesse is probably the best option all around, since it gives in-order results and also updates the progress bar elegantly while the process pool is working. Where it is less ideal is work with highly variable runtime: the progress bar will stall while the slow work is being done, then suddenly jump far ahead. In that case, if it's desired to update the progress bar as the work runs, it's possible to update it manually:

import time
import multiprocessing as mp
from ctypes import c_int32
import tqdm

def f(p):
    time.sleep(min(p, 1))
    with counter_lock:
        counter.value += 1
    return p

counter = mp.Value(c_int32)
counter_lock = mp.Lock()
params = [i for i in range(10)]
with tqdm.tqdm(total=len(params)) as pbar:
    with mp.Pool() as pool:
        future = pool.map_async(f, params)
        while not future.ready():
            if counter.value != 0:
                with counter_lock:
                    increment = counter.value
                    counter.value = 0
                pbar.update(n=increment)
            time.sleep(1)
        result = future.get()
-
Yet another approach, using concurrent.futures:

from concurrent.futures import as_completed, ProcessPoolExecutor
from tqdm import tqdm

def process_param(param):
    return param

params = range(100000)
executor = ProcessPoolExecutor()
jobs = [executor.submit(process_param, param) for param in params]
results = []
for job in tqdm(as_completed(jobs), total=len(jobs)):
    results.append(job.result())
-
note there's a Feel free to PR and improve it and/or add alternative approaches.
-
Just a quick note that I wasn't able to get Because I also need to handle uncaught exceptions in the parent process, I can't actually use So, for now, @mzur's approach is best. I just wrap the calls to
-
The simplest way would probably be to apply tqdm() around the inputs, rather than the mapping function. For example:
-
[macOS 10.13.1, python 2.7.14 (macports), tqdm 4.19.4 (macports)]
I am struggling to work out how to get a single progress bar to update on every completion from a multiprocessed, mapped function. Consider the following example:
The idea is to have a single progress bar that updates every time a call to myfunc completes anywhere in the pool. However, what I see is that in each process the bar seems to have an independent counter, so I only get to 50% (or thereabouts, because of random()) when the job finishes.

Is there a clean way to implement a single tqdm progress bar that I can then update from inside one of many child processes? Apologies if this is already asked/solved.