Swifter "progress_bar" Not Working #176

Open
lightonphiri opened this issue Mar 16, 2022 · 14 comments

Comments

@lightonphiri

I just started experimenting with Swifter a few minutes ago and have been struggling to get the progress bar to show.

I have the code snippet below, which I adapted from the example code provided.

Why is the progress_bar(enable=True) option not working? Is there something wrong with my code?

var_unza_dspace_dataframe["subjectMistakes"] = var_unza_dspace_dataframe["subject"].str.split("=").swifter.allow_dask_on_strings(enable=True).progress_bar(
    enable=True, desc='Subjects Mistakes'
).apply(fxn_subject_spellchecker)
@jmcarpenter2
Owner

jmcarpenter2 commented Mar 16, 2022

Hey @lightonphiri,

Quick question: how fast does this apply run?

Context: Swifter first tries to vectorize your function. If the apply completes almost instantaneously with no progress bar, that is because swifter can't provide a progress bar for a vectorized operation.

Otherwise, let me look into how the progress bar interacts with a pd.Series. Perhaps the .str call is changing the type of the series and causing the progress bar not to show.
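
To illustrate the point about vectorization, here is a toy sketch in plain Python. It is not swifter's implementation: `Vec` and `apply_vectorize_first` are hypothetical names invented for this example, and `Vec` is only a stand-in for a pandas Series. The idea it shows is that a vectorized call is a single operation over the whole collection, so there are no per-element steps for a progress bar to count, whereas an element-wise fallback advances once per item.

```python
class Vec:
    """Tiny stand-in for a Series: arithmetic applies to all items at once."""

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, k):
        return Vec(x * k for x in self.items)

    def __add__(self, k):
        return Vec(x + k for x in self.items)


def apply_vectorize_first(func, vec):
    """Try one whole-collection call; fall back to a per-element loop.

    Returns (results, path_taken, progress_ticks).
    """
    try:
        out = func(vec)  # a single call, so there is nothing to tick
        if isinstance(out, Vec) and len(out.items) == len(vec.items):
            return out.items, "vectorized", 0
    except Exception:
        # Broad on purpose for this toy: any failure means "not vectorizable".
        pass

    results = []
    ticks = 0
    for item in vec.items:
        results.append(func(item))
        ticks += 1  # this is where a progress bar would advance
    return results, "element-wise", ticks


# Arithmetic works on the whole Vec, so the vectorized path is taken.
print(apply_vectorize_first(lambda x: x * 2 + 1, Vec([1, 2, 3])))

# str.join cannot consume a Vec, so the per-element fallback runs instead.
print(apply_vectorize_first(lambda x: "-".join(x), Vec([["A", "STR", "SERIES"]])))
```

In this toy, the first call reports zero ticks even though every element was processed, which mirrors why a near-instant vectorized apply would finish without a visible bar.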

What is your environment (OS, Python version, swifter version, pandas version, etc.)?
Perhaps use the example from #162 so we can compare for similarities.
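
One way to gather the environment details requested above is a short standard-library script. This is a minimal sketch: `collect_environment` is a hypothetical helper name, and the default package list is an assumption that can be extended (for example with `jupyterlab` or `notebook`). It reads installed versions from package metadata, so it does not need to import swifter or pandas.

```python
import platform
from importlib import metadata


def collect_environment(packages=("swifter", "pandas")):
    """Return OS, Python version, and installed versions of the given packages."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this interpreter's environment.
            report[name] = "not installed"
    return report


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```

Pasting the printed lines into a comment gives everyone the same fields to compare across machines.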

@jmcarpenter2
Owner

jmcarpenter2 commented Mar 16, 2022

I did a quick test and got a progress bar, so I'm thinking the .str hypothesis above is wrong.

import pandas as pd
import swifter  # importing swifter registers the .swifter accessor

my_string_series = pd.Series(["A STR SERIES"])
my_string_series.str.split(" ").swifter.apply(lambda x: "-".join(x))
Pandas Apply: 100%|██████████████████████████████| 1/1 [00:00<00:00, 750.86it/s]

@lightonphiri
Author

lightonphiri commented Mar 17, 2022

Thank you for the quick response @jmcarpenter2

> Quick question: how fast does this apply run?

It takes a very long time, which is why I ended up discovering Swifter. I have a total of about 6k records to process, and processing a single record takes ~18 secs. Running this on a single CPU would theoretically take ~27 hours to finish.
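
As a rough check on the estimate above, serial runtime is the record count times the per-record time. The sketch below uses a hypothetical helper, `estimated_hours`, and assumes ideal linear scaling across workers, which real workloads rarely achieve. The worker count of 4 is an arbitrary illustration, not a figure from this thread. With exactly 6,000 records at 18 s each the serial estimate is 30 hours; the ~27 hours quoted corresponds to roughly 5,400 records, which is consistent with "about 6k".

```python
def estimated_hours(n_records, seconds_per_record, n_workers=1):
    """Idealized wall-clock estimate in hours, assuming perfect parallel scaling."""
    return n_records * seconds_per_record / n_workers / 3600


print(estimated_hours(6000, 18))               # serial, exactly 6,000 records
print(estimated_hours(5400, 18))               # serial, record count matching ~27 hours
print(estimated_hours(6000, 18, n_workers=4))  # assumed 4 workers, ideal scaling
```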

> What is your environment?

  • OS: Ubuntu 20.04.4 LTS,
  • Python version: Python 3.8.10,
  • Swifter version: 1.1.2,
  • Pandas version: 1.4.0

Incidentally, while frantically looking for alternatives, I came across Pandarallel and was able to get it to visualise what was going on (see image below). I wanted to be certain that multiple cores are being used when I run processes.

[Image: img-unza22-csc5741-efficiency_techniques-swifter_issue]

@jiahe224

jiahe224 commented Apr 2, 2022

my_string_series = pd.Series(["A STR SERIES"])
my_string_series.str.split(" ").swifter.apply(lambda x: "-".join(x))

I ran the same test and no progress bar was displayed.

[Image]

  • OS: Windows 10 21H1 64-bit
  • CPU: AMD Ryzen 5 PRO 4650U with Radeon Graphics
  • swifter: 1.1.2
  • pandas: 1.4.1
  • python: 3.9.7
  • jupyterlab: 3.3.2
  • notebook: 6.4.10

@jiahe224

jiahe224 commented Apr 2, 2022

> Thank you for the quick response @jmcarpenter2
>
> > Quick question: how fast does this apply run?
>
> It takes a very long time, which is why I ended up discovering Swifter. I have a total of about 6k records to process, and processing a single record takes ~18 secs. Running this on a single CPU would theoretically take ~27 hours to finish.
>
> > What is your environment?
>
>   • OS: Ubuntu 20.04.4 LTS,
>   • Python version: Python 3.8.10,
>   • Swifter version: 1.1.2,
>   • Pandas version: 1.4.0
>
> Incidentally, while frantically looking for alternatives, I came across Pandarallel and was able to get it to visualise what was going on (see image below). I wanted to be certain that multiple cores are being used when I run processes.
>
> [Image: img-unza22-csc5741-efficiency_techniques-swifter_issue]

Pandarallel is good, but it doesn't support Windows.

@jmcarpenter2
Owner

jmcarpenter2 commented Apr 5, 2022

Hey @jiahe224, isn't this the progress bar? Or did it just never fill out?
[Image: 161374074-bccf3d97-ec05-43f8-a5ad-1f39e97da768]

@jiahe224

jiahe224 commented Apr 6, 2022

The task has finished running, but the progress is shown as 0%, whereas the example you gave shows 100% progress. @jmcarpenter2

@jmcarpenter2
Owner

Very interesting. That's a very good insight. I'll look into it.

@jiahe224

jiahe224 commented Apr 7, 2022

Looking forward to your fix, thank you so much for doing this and making my job so much easier!

@PeikaiLi

same issue~

@davera-017

same issue

@jn21

jn21 commented Feb 3, 2023

Same issue as above: the progress bar shows up but never gets filled out, even though the apply operation runs successfully.

@wq624915051

same issue

@jmcarpenter2
Owner

jmcarpenter2 commented Mar 24, 2023

Trying to narrow this one down.

Just a quick poll: can you give me a 👍 if you are experiencing this on a Windows machine, and a 👎 if on Linux/macOS?

CC: @PeikaiLi @davera-017 @jn21 @wq624915051


7 participants