BUG: Segmentation fault when calling pytorch function after np.exp (numpy 1.21.2) #21714

kokamido · 2022-06-10T10:55:05Z

Describe the issue:

Hi! There is an issue involving numpy and pytorch. I can't reproduce it with numpy 1.21.3, but it occurs with 1.21.2. If I run the provided code example with SIZE=15, both print calls (they are exactly the same) print True. With SIZE=20, the first print displays True, but the second crashes with a segmentation fault. With SIZE=1000, it displays True and False. If I remove the np.exp call, the code prints True True for any positive int SIZE.
This behavior can be reproduced in the following docker container:

FROM ubuntu:focal-20220531

RUN apt update

RUN apt install -y wget

# Miniconda installation
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py39_4.11.0-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh

RUN /opt/conda/bin/conda install  -c pytorch -c nvidia pytorch==1.10.1

RUN /opt/conda/bin/conda install  -c pytorch -c nvidia numpy==1.21.2

ENTRYPOINT /bin/bash

Reproduce the code example:

import torch
import numpy as np

SIZE=19

print(torch.all(torch.isfinite(torch.fft.fft(torch.eye(SIZE), dim=1))))
np.exp([2])
print(torch.all(torch.isfinite(torch.fft.fft(torch.eye(SIZE), dim=1))))
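Since the failure mode depends on SIZE and one case kills the interpreter, it can help to run each case in a fresh child process so a crash is reported instead of ending the session. The following is only a sketch of that idea: `run_snippet` and `describe_exit` are hypothetical helper names, not part of numpy or pytorch, and the template simply wraps the reproducer above with SIZE as a parameter.

```python
import signal
import subprocess
import sys

# The reproducer from the report, with SIZE left as a placeholder.
REPRO_TEMPLATE = """
import torch
import numpy as np

SIZE = {size}

print(torch.all(torch.isfinite(torch.fft.fft(torch.eye(SIZE), dim=1))))
np.exp([2])
print(torch.all(torch.isfinite(torch.fft.fft(torch.eye(SIZE), dim=1))))
"""


def run_snippet(code, timeout=300):
    """Run `code` in a fresh interpreter and return (returncode, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # The three sizes discussed in the report.
    for size in (15, 20, 1000):
        code, out = run_snippet(REPRO_TEMPLATE.format(size=size))
        print(f"SIZE={size}: {describe_exit(code)}; output={out.split()}")
```

On Linux, a segmentation fault in the child should show up as a negative return code rather than taking down the parent process. What each size actually prints depends on the installed numpy build.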

Error message:

No response

NumPy/Python version information:

numpy==1.21.2
pytorch==1.10.1


seberg · 2022-06-10T13:54:00Z

I would suspect it to be related to gh-20405, that would cause pretty random stuff. That issue is fixed in 1.22.0 and later.

I am not quite sure how old that issue was. What complicates things is that a compiler bug was also involved. Will have to dig deeper, but it may be that the issue only "appeared" with a new GCC release, so at the time of the release all may have been fine, and now it is not because the nvidia channel uses a newer compiler...

seberg · 2022-06-10T14:04:54Z

From the discussion in gh-20356, I suspect that the bug would only occur with gcc 10. I wonder what the best thing to do is, which is also a bit related to gh-21713. EDIT: Not sure in which gcc versions it appears or when/whether it got fixed. Older ones probably did not optimize as aggressively and so did not show it.

Maybe we should backport some of these at least as source-only, since channels like the nvidia one can then still pick them up or at least find them.

EDIT: Nvm, the nvidia channel of course only has nvidia packages, this would be from the default anaconda channel.

seberg · 2022-06-10T15:45:14Z

@kokamido I am not quite sure how to best proceed. Maybe you can confirm that this is on a machine with a SkylakeX CPU? It might be nice to confirm that the specific patch works, but that will require compiling NumPy on an affected machine (I don't have a skylakex machine here).

If it is important to you to get a 1.21.x release that is guaranteed to be fixed, maybe we need to open an Anaconda issue?
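For anyone wanting to answer the CPU question on their own machine, here is a rough sketch that lists the AVX-512 feature flags the Linux kernel reports. It assumes that AVX-512 support is the relevant property of a "SkylakeX-like" CPU here, which the thread does not state explicitly. `parse_cpu_flags` and `avx512_flags` are hypothetical helper names, and the `/proc/cpuinfo` path only exists on Linux.

```python
import os


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_flags(flags):
    """Return the sorted subset of flags whose names start with 'avx512'."""
    return sorted(flag for flag in flags if flag.startswith("avx512"))


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux-only; other platforms need a different source
    if os.path.exists(path):
        with open(path) as fh:
            found = avx512_flags(parse_cpu_flags(fh.read()))
        print("AVX-512 flags:", found or "none reported")
    else:
        print(f"{path} is not available on this platform")
```

An empty result only means the kernel does not advertise AVX-512. It says nothing about which code paths a particular numpy build was compiled with.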

kokamido · 2022-06-10T16:22:16Z

In my tests the issue reproduces on both an Intel Xeon Gold 5320T (which is Ice Lake) and an Intel Core i7-11800H (which is Tiger Lake). It doesn't reproduce with 1.21.3 and 1.21.6 from Anaconda (I haven't tested 1.21.4 and 1.21.5).
I don't need a fixed 1.21.x release because I can use numpy>=1.22. I opened this issue because the problem looked bizarre and I couldn't google anything directly related to it.
If the problem was completely fixed in release 1.22, then this issue can be closed. Otherwise I can test something in my environment if you think it will be useful.
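A project that wants to rely on the numpy>=1.22 workaround could check the installed version at startup. The sketch below encodes only the statement from this thread that gh-20405 is fixed in 1.22.0 and later. It deliberately treats every 1.21.x build as unconfirmed, even though 1.21.3 and 1.21.6 from Anaconda did not reproduce the crash. `release_tuple` and `has_fix` are hypothetical helper names.

```python
from importlib import metadata

# Per the thread: gh-20405 is fixed in 1.22.0 and later.
MINIMUM_FIXED = (1, 22)


def release_tuple(version):
    """Parse leading numeric components, e.g. '1.21.2' -> (1, 21, 2)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def has_fix(version, minimum=MINIMUM_FIXED):
    """True if `version` is at least the first release known to carry the fix."""
    return release_tuple(version) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("numpy is not installed in this environment")
    else:
        status = "has the fix" if has_fix(installed) else "is not confirmed fixed"
        print(f"numpy {installed} {status}")
```

Reading the version through `importlib.metadata` avoids importing numpy itself, so the check can run before any numpy code is loaded.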

kokamido added the 00 - Bug label Jun 10, 2022

seberg added the 31 - Third-party binaries Install/import issues other than Anaconda-specific label Jun 10, 2022

seberg added the 57 - Close? Issues which may be closable unless discussion continued label Jun 10, 2022

kokamido mentioned this issue Jun 13, 2022

BUG: Segmentation fault when calling pytorch function after np.exp (numpy 1.21.2) pytorch/pytorch#79410

Closed
