PIL crashes with a segmentation fault when saving a large-scale image in multiple threads #235

Closed
lizhihao6 opened this issue Jan 24, 2022 · 2 comments
@lizhihao6

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug
Saving a large-scale image (4032x3024) with torchvision's save_image, which writes the PNG through PIL, inside MMGenVisualizationHook crashes training with a segmentation fault.

Reproduction

  1. What command or script did you run?
    I ran my own code; the crash happened when MMGenVisualizationHook saved a large-scale image (4032x3024) to a storage server.

It can be reproduced with a modified MMGenVisualizationHook (a standalone sketch is given after this list):

        img_cat = torch.randn([2, 3, 3024, 4032])
        save_image(
            img_cat,
            osp.join(self._out_dir, filename),
            nrow=self.nrow,
            padding=self.padding)
python tools/train.py config/myconfig
  2. Did you make any modifications to the code or config? Did you understand what you modified?
    Yes.
  3. What dataset did you use?
    My own dataset.
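
A minimal standalone sketch of the failing call, run outside the training loop, might look like the following; the output directory, file name, and the nrow/padding values are placeholders standing in for self._out_dir, filename, self.nrow and self.padding in the hook:

    # Hypothetical standalone reproduction of the failing save call.
    import os
    import os.path as osp

    import torch
    from torchvision.utils import save_image

    out_dir = './work_dirs/vis'   # placeholder output directory
    filename = 'repro.png'        # placeholder file name
    os.makedirs(out_dir, exist_ok=True)

    # Two 3-channel images at the problematic 3024x4032 resolution.
    img_cat = torch.randn([2, 3, 3024, 4032])
    save_image(
        img_cat,
        osp.join(out_dir, filename),
        nrow=1,
        padding=4)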

Environment

  1. Please run python mmgen/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

TorchVision: 0.10.1
OpenCV: 4.5.3
MMCV: 1.4.3
MMGen: 0.5.0+4de90c2
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 11.1

Error traceback
If applicable, paste the error traceback here.

2022-01-24 14:42:51,625 - mmgen - INFO - workflow: [('train', 1)], max: 80000 iters
2022-01-24 14:42:51,626 - mmgen - INFO - Checkpoints will be saved to ./work_dirs/experiments/ganISP_cityscapes_rgb2iphone_raw/ckpt/ganISP_cityscapes_rgb2iphone_raw by HardDiskBackend.
Fatal Python error: Segmentation fault

Thread 0x00007f2d2f7fe700 (most recent call first):

Thread 0x00007f2d2ffff700 (most recent call first):

Thread 0x00007f2d7c833700 (most recent call first):

Thread 0x00007f2d7d034700 (most recent call first):

Thread 0x00007f2dcbcdd700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 296 in wait
  File "/root/.anaconda3/envs/raw/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 870 in run
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f2dcc4de700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 296 in wait
  File "/root/.anaconda3/envs/raw/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 870 in run
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f2dcccdf700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 296 in wait
  File "/root/.anaconda3/envs/raw/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 870 in run
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f2dcd4e0700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 296 in wait
  File "/root/.anaconda3/envs/raw/lib/python3.7/multiprocessing/queues.py", line 224 in _feed
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 870 in run
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/root/.anaconda3/envs/raw/lib/python3.7/threading.py", line 890 in _bootstrap

Current thread 0x00007f2f5b591700 (most recent call first):
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/PIL/ImageFile.py", line 509 in _save
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/PIL/PngImagePlugin.py", line 1348 in _save
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/PIL/Image.py", line 2212 in save
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/torchvision/utils.py", line 136 in save_image
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28 in decorate_context
  File "/lzh/Project/mmgeneration/mmgen/core/hooks/visualization.py", line 105 in after_train_iter
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 94 in wrapper
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 309 in call_hook
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 67 in train
  File "/root/.anaconda3/envs/raw/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 134 in run
  File "/lzh/Project/mmgeneration/mmgen/apis/train.py", line 199 in train_model
  File "tools/train.py", line 164 in main
  File "tools/train.py", line 168 in <module>

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

This may be related to python-pillow/Pillow#4225.
I worked around it by replacing torchvision's save_image with a cv2-based version:

    import cv2
    import numpy as np


    def save_image(tensor, path, nrow, padding):
        # Replacement for torchvision.utils.save_image, which segfaults here.
        # Images are stacked vertically, so nrow is currently ignored.
        img = tensor.detach().cpu().numpy()
        b, c, h, w = img.shape
        assert c == 3
        total_h = h * b + (b - 1) * padding
        _img = np.zeros([total_h, w, 3])
        start_h = 0
        for i in range(b):
            _img[start_h:start_h + h] = img[i].transpose([1, 2, 0])
            start_h += (h + padding)
        # Map [0, 1] floats to [0, 255] and convert RGB -> BGR for cv2.imwrite.
        _img = np.clip(_img, 0, 1) * 255
        _img = np.round(_img).astype(np.uint8)[..., ::-1]
        cv2.imwrite(path, _img)
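
A hypothetical standalone use of this workaround is sketched below; the input tensor is assumed to already hold values in [0, 1], since the function clips to that range before scaling to 8-bit:

    # Hypothetical usage of the cv2-based save_image defined above.
    import torch

    imgs = torch.rand([2, 3, 3024, 4032])   # values in [0, 1]
    save_image(imgs, 'vis.png', nrow=1, padding=4)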
@LeoXing1996
Collaborator

Hey @lizhihao6, thanks for your interest in our project.
Can you provide your Pillow version? I will try to reproduce and debug this later.
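
For reference, one way to report the installed Pillow version is a snippet like:

    # Prints the installed Pillow version (PIL.__version__ is a standard attribute).
    import PIL
    print(PIL.__version__)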

@lizhihao6
Author

lizhihao6 commented Jan 24, 2022 via email

@zengyh1900 zengyh1900 added kind/bug something isn't working awaiting response priority/P0 highest priority labels Oct 12, 2022
@zengyh1900 zengyh1900 added this to the 0.8.0 milestone Oct 12, 2022
@open-mmlab open-mmlab locked and limited conversation to collaborators Oct 13, 2022
@plyfager plyfager converted this issue into discussion #469 Oct 13, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
