Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally display progress bar in print_status #602

Closed
joaander opened this issue Jan 11, 2022 · 14 comments
Closed

Optionally display progress bar in print_status #602

joaander opened this issue Jan 11, 2022 · 14 comments
Assignees
Labels
good first issue Good for newcomers

Comments

@joaander
Copy link
Member

Feature description

I would like to hide the progress bar output generated by FlowProject.print_status.

Proposed solution

Add the argument progress to the signature of FlowProject.print_status with a default value of False. Users should opt-in to additional output when desired.

Additional context

The proposed solution follows that in FlowProject.run which has the default argument prorgress=False.

The progress bar is a significant amount of the total output for small workspaces and provides no benefit to users for small workspaces. Additionally, using print_status in Jupyter notebooks requires special configuration or else print_status fails with an exception. New users running tutorials for the first time may be frustrated troubleshooting issues like this.

@bdice
Copy link
Member

bdice commented Jan 11, 2022

@joaander Thanks for the issue. Disabling the progress bar might lead to several seconds or even minutes of no output in the terminal when dealing with status checks over large workspaces. I believe we found the lack of output confused users and caused them to kill the process in early prototypes (or not realize that they had written expensive status checks). Would it be okay to leave the option enabled by default and allow users to opt out with --no-progress? The progress bar is disabled by default when running operations because it may interleave a progress bar between any output generated by the operations, especially because running operations is often done in parallel. The status fetching doesn’t have that constraint.

edit: I was mostly speaking about the CLI above but you referred to the Python API - please feel free to clarify if you see or expect any differences in the behavior between these two.

Can you also give more context about the Jupyter exception you’re seeing and what special configuration is needed to resolve it?

@joaander
Copy link
Member Author

I'm fine with either option as a default, though I would find it confusing if run and print_status had different defaults.

When I first tried print_status in a Jupyter Lab instance, I got the below exception. Resolving it required navigating to the indicated webpage to learn that I needed to install the pypi package ipywidgets which doesn't appear to be listed as a dependency (required or optional) of signac-flow, signac, or even tqdm. This notebook will be a tutorial for HOOMD, so I would expect that users running the tutorial themselves will have the same issue.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/var/folders/r7/rt2r4c5n4n59m913k6rls7gw00vxwm/T/ipykernel_22801/4177291619.py in <module>
      1 project = flow.FlowProject()
----> 2 project.print_status(overview=False, detailed=True, parameters=['volume_fraction'])

~/venv/lib/python3.9/site-packages/flow/project.py in print_status(self, jobs, overview, overview_max_lines, detailed, parameters, param_max_width, expand, all_ops, only_incomplete, dump_json, unroll, compact, pretty, file, err, ignore_errors, template, profile, eligible_jobs_max_lines, output_format)
   2766 
   2767         else:
-> 2768             status_results, job_labels, individual_jobs = self._fetch_status(
   2769                 aggregates=aggregates,
   2770                 err=err,

~/venv/lib/python3.9/site-packages/flow/project.py in _fetch_status(self, aggregates, err, ignore_errors, status_parallelization)
   2516                 )
   2517             )
-> 2518             status_results = parallel_executor(
   2519                 compute_status,
   2520                 aggregate_groups,

~/venv/lib/python3.9/site-packages/flow/util/misc.py in parallel_executor(func, iterable, **kwargs)
    375                 # Chunk size only applies to thread/process parallel executors
    376                 del kwargs["chunksize"]
--> 377             return list(tmap(func, iterable, **kwargs))
    378 
    379     return parallel_executor

~/venv/lib/python3.9/site-packages/tqdm/contrib/__init__.py in tmap(function, *sequences, **tqdm_kwargs)
     95     tqdm_class  : [default: tqdm.auto.tqdm].
     96     """
---> 97     for i in tzip(*sequences, **tqdm_kwargs):
     98         yield function(*i)

~/venv/lib/python3.9/site-packages/tqdm/contrib/__init__.py in tzip(iter1, *iter2plus, **tqdm_kwargs)
     82     kwargs = tqdm_kwargs.copy()
     83     tqdm_class = kwargs.pop("tqdm_class", tqdm_auto)
---> 84     for i in zip(tqdm_class(iter1, **kwargs), *iter2plus):
     85         yield i
     86 

~/venv/lib/python3.9/site-packages/tqdm/notebook.py in __init__(self, *args, **kwargs)
    240         unit_scale = 1 if self.unit_scale is True else self.unit_scale or 1
    241         total = self.total * unit_scale if self.total else self.total
--> 242         self.container = self.status_printer(self.fp, total, self.desc, self.ncols)
    243         self.container.pbar = proxy(self)
    244         self.displayed = False

~/venv/lib/python3.9/site-packages/tqdm/notebook.py in status_printer(_, total, desc, ncols)
    113         # Prepare IPython progress bar
    114         if IProgress is None:  # #187 #451 #558 #872
--> 115             raise ImportError(
    116                 "IProgress not found. Please update jupyter and ipywidgets."
    117                 " See https://ipywidgets.readthedocs.io/en/stable"

ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html

@bdice
Copy link
Member

bdice commented Jan 11, 2022

The notebook issue is partially caused by #371. tqdm has an automatic mode that detects if it is in a notebook, and shows an HTML progress bar instead of a console one (the console format doesn't render very well in the notebook, I have sometimes seen the \r fail to clear the current line and instead output lots of lines with progress bars). This issue has been reported to tqdm as tqdm/tqdm#1082 and there is a PR tqdm/tqdm#1218 to fix it by falling back to the console progress bar.

@kidrahahjo
Copy link
Member

Given the state of this issue, I believe we can try to fix this behaviour by defaulting to not showing a progress bar when people use the Python API. What do you suggest?

@vyasr
Copy link
Contributor

vyasr commented Mar 3, 2022

Summarizing the discussion so far to make sure that we're all on the same page:

  • We are all in favor of adding a progress (or no_progress) option to print_status.
  • We're not sure whether progress should be enabled or disabled by default.
  • We would like print_status and run to have the same default.
  • Hardik's last comment raises the question, do Python and CLI users have the same expectations? I could see an argument for default progress bars on the command line but default no progress bars in Python since the CLI is much more of a black box.

@bdice
Copy link
Member

bdice commented Mar 4, 2022

I agree with that overview. On the last point, I’d say progress on by default in CLI and progress off by default in the Python API would match my expectations as a user for each interface.

CLI users also have more direct control of stdout and stderr than Python users, if we differentiate progress bars (stderr) and program output (stdout). I believe that’s how the output works currently but I’m not 100% sure.

@kidrahahjo
Copy link
Member

By default, we don't show progress in run but we do while printing status. So, do we need to do a deprecation cycle for just changing the default behaviour?

@bdice
Copy link
Member

bdice commented Mar 12, 2022

I don't think we need to do any deprecation cycles to change the default behavior of a progress bar. The addition of a deprecation message would be a more disruptive change than just adding/removing the progress bar in my opinion.

@vyasr vyasr added the good first issue Good for newcomers label Mar 14, 2022
@vyasr
Copy link
Contributor

vyasr commented Mar 14, 2022

Definitely no need for a deprecation cycle.

@vyasr
Copy link
Contributor

vyasr commented Mar 14, 2022

In light of the above discussions, I'm proposing the following as the action items to close this issue:

  • Add a progress=False option to print_status.
  • Add a CLI parameter --no-progress to the command line interface for status.
  • Ensure that run and print_status have the same defaults for both Python and command line invocations.

@joaander
Copy link
Member Author

joaander commented Sep 1, 2022

What line(s) in project.py generate the progress bar print_status?

I see

signac-flow/flow/project.py

Lines 2258 to 2265 in b4d870e

if tqdm_kwargs is not None:
return tqdm(
aggregates,
total=len(aggregates),
**tqdm_kwargs,
)
else:
return aggregates

but I don't see where this is called with tqdm_kwargs != None.

@vyasr
Copy link
Contributor

vyasr commented Sep 19, 2022

@joaander I think what you're looking for is this line: https://github.com/glotzerlab/signac-flow/blob/master/flow/project.py#L2671. The parallel_executor is selected via this function that returns tqdm's own progressbar-based wrappers around map.

@joaander
Copy link
Member Author

Thanks. It appears from this code that making progress bars optional is not a trivial change.

@vyasr
Copy link
Contributor

vyasr commented Sep 19, 2022

I think it would require the following:

  1. Add a progress=True argument to _get_parallel_executor.
  2. When progress=False, that function should return a concurrent.futures Executor (selecting the correct one based on the parallelization) instead of one of tqdm's wrappers around those executors.
  3. Add the same progress arguments to print_status and _fetch_status (called by print_status) and forward that argument along.
  4. Possibly changing the print_status call in _main_status as well depending on whether the default verbosity from the CLI is desired to be different from the default for the Python API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants