PIL crashes with a segmentation fault when saving a large-scale image in multiple threads #235

Closed
lizhihao6 opened this issue Jan 24, 2022 · 2 comments
@lizhihao6

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug
Saving a large-scale image (4032x3024) with torchvision's save_image, which writes the PNG through PIL, inside MMGenVisualizationHook crashes training with a segmentation fault.

Reproduction

  1. What command or script did you run?
    I ran my own code; the crash happened when MMGenVisualizationHook saved a large-scale image (4032x3024) to a storage server.

It can be reproduced with a modified MMGenVisualizationHook (a standalone sketch is given after this list):

        img_cat = torch.randn([2, 3, 3024, 4032])
        save_image(
            img_cat,
            osp.join(self._out_dir, filename),
            nrow=self.nrow,
            padding=self.padding)
python tools/train.py config/myconfig
  2. Did you make any modifications to the code or config? Did you understand what you modified?
    Yes.
  3. What dataset did you use?
    My own dataset.
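
A minimal standalone sketch of the failing call, run outside the training loop, might look like the following; the output directory, file name, and the nrow/padding values are placeholders standing in for self._out_dir, filename, self.nrow and self.padding in the hook:

    # Hypothetical standalone reproduction of the failing save call.
    import os
    import os.path as osp

    import torch
    from torchvision.utils import save_image

    out_dir = './work_dirs/vis'   # placeholder output directory
    filename = 'repro.png'        # placeholder file name
    os.makedirs(out_dir, exist_ok=True)

    # Two 3-channel images at the problematic 3024x4032 resolution.
    img_cat = torch.randn([2, 3, 3024, 4032])
    save_image(
        img_cat,
        osp.join(out_dir, filename),
        nrow=1,
        padding=4)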

Environment

  1. Please run python mmgen/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

TorchVision: 0.10.1
OpenCV: 4.5.3
MMCV: 1.4.3
MMGen: 0.5.0+4de90c2
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 11.1

Error traceback
If applicable, paste the error traceback here.

2022-01-24 14:42:51,625 - mmgen - INFO - workflow: [('train', 1)], max: 80000 iters
2022-01-24 14:42:51,626 - mmgen - INFO - Checkpoints will be saved to ./work_dirs/experiments/ganISP_cityscapes_rgb2iphone_raw/ckpt/ganISP_cityscapes_rgb2iphone_raw by HardDiskBackend.
Fatal Python error: Segmentation fault

Thread 0x00007f2d2f7fe700 (most recent call first):

Thread 0x00007f2d2ffff700 (most recent call first):

Thread 0x00007f2d7c833700 (most recent call first):

Thread 0x00007f2d7d034700 (most recent call first):

Thread 0x00007f2dcbcdd700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 296 in wait
  File "/root/.anaconda3/envs/raw/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 870 in run
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f2dcc4de700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 296 in wait
  File "/root/.anaconda3/envs/raw/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 870 in run
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f2dcccdf700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 296 in wait
  File "/root/.anaconda3/envs/raw/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 870 in run
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f2dcd4e0700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 296 in wait
  File "/root/.anaconda3/envs/raw/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 870 in run
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 890 in _bootstrap

Current thread 0x00007f2f5b591700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/PIL/ImageFile.py", line 509 in _save
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/PIL/PngImagePlugin.py", line 1348 in _save
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/PIL/Image.py", line 2212 in save
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/torchvision/utils.py", line 136 in save_image
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28 in decorate_context
  File "/lzh/Project/mmgeneration/mmgen/core/hooks/visualization.py", line 105 in after_train_iter
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 94 in wrapper
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 309 in call_hook
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 67 in train
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 134 in run
  File "/lzh/Project/mmgeneration/mmgen/apis/train.py", line 199 in train_model
  File "tools/train.py", line 164 in main
  File "tools/train.py", line 168 in <module>

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

This may be related to python-pillow/Pillow#4225.
I worked around it by replacing torchvision's save_image with a cv2-based version:

    import cv2
    import numpy as np


    def save_image(tensor, path, nrow, padding):
        # Replacement for torchvision.utils.save_image, which segfaults here.
        # Images are stacked vertically, so nrow is currently ignored.
        img = tensor.detach().cpu().numpy()
        b, c, h, w = img.shape
        assert c == 3
        total_h = h * b + (b - 1) * padding
        _img = np.zeros([total_h, w, 3])
        start_h = 0
        for i in range(b):
            _img[start_h:start_h + h] = img[i].transpose([1, 2, 0])
            start_h += (h + padding)
        # Map [0, 1] floats to [0, 255] and convert RGB -> BGR for cv2.imwrite.
        _img = np.clip(_img, 0, 1) * 255
        _img = np.round(_img).astype(np.uint8)[..., ::-1]
        cv2.imwrite(path, _img)
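
A hypothetical standalone use of this workaround is sketched below; the input tensor is assumed to already hold values in [0, 1], since the function clips to that range before scaling to 8-bit:

    # Hypothetical usage of the cv2-based save_image defined above.
    import torch

    imgs = torch.rand([2, 3, 3024, 4032])   # values in [0, 1]
    save_image(imgs, 'vis.png', nrow=1, padding=4)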
@LeoXing1996
Collaborator

Hey @lizhihao6, thanks for your interest in our project.
Can you provide your Pillow version? I will try to reproduce and debug this later.
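
For reference, one way to report the installed Pillow version is a snippet like:

    # Prints the installed Pillow version (PIL.__version__ is a standard attribute).
    import PIL
    print(PIL.__version__)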

@lizhihao6
Author

lizhihao6 commented Jan 24, 2022 via email

@zengyh1900 zengyh1900 added kind/bug something isn't working awaiting response priority/P0 highest priority labels Oct 12, 2022
@zengyh1900 zengyh1900 added this to the 0.8.0 milestone Oct 12, 2022
@open-mmlab open-mmlab locked and limited conversation to collaborators Oct 13, 2022
@plyfager plyfager converted this issue into discussion #469 Oct 13, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
