Problem encountered when switching from CPU to GPU #331

Zhxin99 · 2023-02-15T10:48:15Z

Hello, on Linux I can run spikingjelly/activation_based/examples/lif_fc_mnist.py on the CPU without any problem, but after changing device to cuda:0 I get the error below. How can I fix this?
```
Traceback (most recent call last):

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_neg_add_mul_mul_add(float* tspike_1, double vv_reset_2, float* tv_1, float* aten_add_1, float* aten_add) {
{
  if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)<160ll ? 1 : 0) {
    float tspike_1_1 = __ldg(tspike_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
    aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (0.f - tspike_1_1) + 1.f;
    float v = __ldg(tv_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
    aten_add_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = ((0.f - tspike_1_1) + 1.f) * v + tspike_1_1 * (float)(vv_reset_2);
  }}
}
```
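For reference, the fused kernel that NVRTC refused to compile is only an element-wise hard reset of the membrane potential. Reading the generated source, each thread with index below 160 computes `aten_add = 1 - spike` and `aten_add_1 = (1 - spike) * v + spike * v_reset`. The plain-Python sketch below reproduces that arithmetic (`fused_hard_reset_reference` is a hypothetical helper written for illustration, not part of SpikingJelly or PyTorch). It suggests the math itself is trivial and the failure lies in compiling the kernel for this particular GPU.

```python
def fused_hard_reset_reference(spike, v, v_reset):
    """Element-wise equivalent of fused_neg_add_mul_mul_add (hypothetical helper).

    Returns (one_minus_spike, new_v), mirroring the kernel's
    aten_add and aten_add_1 outputs respectively.
    """
    if len(spike) != len(v):
        raise ValueError("spike and v must have the same length")
    # aten_add[i] = (0 - spike[i]) + 1
    one_minus_spike = [(0.0 - s) + 1.0 for s in spike]
    # aten_add_1[i] = (1 - spike[i]) * v[i] + spike[i] * v_reset
    new_v = [(1.0 - s) * x + s * v_reset for s, x in zip(spike, v)]
    return one_minus_spike, new_v


if __name__ == "__main__":
    out, new_v = fused_hard_reset_reference([0.0, 1.0, 0.0], [0.3, 1.2, 0.7], 0.0)
    print(out)    # [1.0, 0.0, 1.0]
    print(new_v)  # [0.3, 0.0, 0.7]: the neuron that spiked is reset to v_reset
```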

fangwei123456 · 2023-02-15T10:49:27Z

Check whether your PyTorch installation supports the GPU.

Zhxin99 · 2023-02-15T11:04:09Z

It does.

fangwei123456 · 2023-02-15T11:05:19Z

Could you provide the full error message?

Zhxin99 · 2023-02-15T11:06:17Z

> Could you provide the full error message?

```
Traceback (most recent call last):

  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 303, in <module>
    main()

  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 198, in main
    out_fr += net(encoded_img) # predict value

  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)

  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 33, in forward
    return self.layer(x)

  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)

  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)

  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)

  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/base.py", line 266, in forward
    return self.single_step_forward(*args, **kwargs)

  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 907, in single_step_forward
    return super().single_step_forward(x)

  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 241, in single_step_forward
    self.neuronal_reset(spike)

  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 205, in neuronal_reset
    self.v = self.jit_hard_reset(self.v, spike_d, self.v_reset)

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_neg_add_mul_mul_add(float* tspike_1, double vv_reset_2, float* tv_1, float* aten_add_1, float* aten_add) {
{
  if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)<160ll ? 1 : 0) {
    float tspike_1_1 = __ldg(tspike_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
    aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (0.f - tspike_1_1) + 1.f;
    float v = __ldg(tv_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
    aten_add_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = ((0.f - tspike_1_1) + 1.f) * v + tspike_1_1 * (float)(vv_reset_2);
  }}
}
```

fangwei123456 · 2023-02-15T11:12:59Z

Run the following code and see whether it raises an error:

```python
import torch

@torch.jit.script
def jit_hard_reset(v: torch.Tensor, spike: torch.Tensor, v_reset: float):
    # Hard reset: keep v where spike is 0, set it to v_reset where spike is 1
    v = (1. - spike) * v + spike * v_reset
    return v

device = 'cuda:0'

v = torch.rand([8], device=device)
spike = torch.rand_like(v)
v_reset = 0.

z = jit_hard_reset(v, spike, v_reset)
```
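One possible reason a snippet like this runs cleanly: with TorchScript's default profiling executor, the first call(s) of a scripted function usually run unoptimized while input shapes and types are recorded, and the fused CUDA kernel (and therefore the NVRTC compile) is typically only produced on a later call. During training `jit_hard_reset` is called many times, so the fuser does get involved there. The hedged variant below calls the function repeatedly before concluding that it works. The number of calls (5) is an arbitrary assumption, and the CPU fallback exists only so the sketch runs on any machine; on the CPU it cannot reproduce the NVRTC error.

```python
import torch


@torch.jit.script
def jit_hard_reset(v: torch.Tensor, spike: torch.Tensor, v_reset: float):
    # Same hard reset as in the snippet above
    v = (1. - spike) * v + spike * v_reset
    return v


# Use the GPU if present; the CPU fallback only lets the sketch run anywhere.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

v = torch.rand([8], device=device)
spike = (torch.rand_like(v) > 0.5).float()  # binary spikes, as a neuron would emit
v_reset = 0.

# Call several times: the profiling run(s) come first, and on the GPU the fused
# kernel is usually compiled with NVRTC afterwards, which is where the error was raised.
for _ in range(5):
    z = jit_hard_reset(v, spike, v_reset)

# Compare with the same formula evaluated eagerly (no TorchScript, no fusion).
expected = (1. - spike) * v + spike * v_reset
print(torch.allclose(z, expected))
```

If this loop raises the same `nvrtc: error: invalid value for --gpu-architecture` on the GPU, the problem is in how the installed PyTorch compiles fused kernels for this device rather than in SpikingJelly.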

Zhxin99 · 2023-02-15T11:14:05Z

No error.

fangwei123456 · 2023-02-15T11:16:30Z

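That's strange. According to the error message above, the failure happens while JIT-compiling jit_hard_reset, yet running the very same code on its own works fine.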

Zhxin99 · 2023-02-15T11:18:06Z

I'm puzzled too. Looks like I'll have to stick with the CPU.

fangwei123456 · 2023-02-15T11:19:00Z

Is it one of the new 40-series GPUs? PyTorch support for these new GPUs may still have problems.

pytorch/pytorch#87595

Zhxin99 · 2023-02-15T11:21:51Z

是40系新GPU吗？新GPU对pytorch的支持可能有问题

pytorch/pytorch#87595

是RTX4090
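The error message is consistent with this diagnosis. The RTX 4090 is an Ada Lovelace GPU with compute capability 8.9 (sm_89), and NVRTC only accepts sm_89 as a `--gpu-architecture` value from CUDA 11.8 onward. A PyTorch build that bundles an older CUDA runtime would therefore be expected to fail exactly as above when the fuser compiles a kernel for this GPU. The sketch below checks the installed build against the GPU. The lookup table is partial and assumed from NVIDIA's CUDA release notes, and `MIN_CUDA_FOR_ARCH` / `nvrtc_supports` are hypothetical helpers, not part of PyTorch or SpikingJelly.

```python
# Minimum CUDA toolkit (and thus NVRTC) release that accepts each compute
# capability as --gpu-architecture. Partial table, assumed from NVIDIA's
# CUDA release notes; verify before relying on it.
MIN_CUDA_FOR_ARCH = {
    (7, 0): (9, 0),    # Volta
    (7, 5): (10, 0),   # Turing
    (8, 0): (11, 0),   # Ampere, e.g. A100
    (8, 6): (11, 1),   # Ampere, e.g. RTX 30 series
    (8, 9): (11, 8),   # Ada Lovelace, e.g. RTX 4090
    (9, 0): (11, 8),   # Hopper
}


def parse_cuda_version(version: str) -> tuple:
    """Turn a string such as '11.3' into (11, 3)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def nvrtc_supports(capability, cuda_version: str):
    """True/False if the table covers the capability, else None (unknown)."""
    needed = MIN_CUDA_FOR_ARCH.get(tuple(capability))
    if needed is None:
        return None
    return parse_cuda_version(cuda_version) >= needed


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available() and torch.version.cuda:
        cap = torch.cuda.get_device_capability(0)
        print("GPU:", torch.cuda.get_device_name(0), "-> sm_%d%d" % cap)
        print("PyTorch", torch.__version__, "built with CUDA", torch.version.cuda)
        print("NVRTC should accept this GPU:", nvrtc_supports(cap, torch.version.cuda))
    else:
        print("No CUDA-enabled PyTorch found.")
```

For an RTX 4090 with a build reporting CUDA 11.3, for example, `nvrtc_supports((8, 9), "11.3")` returns `False`, which matches the `invalid value for --gpu-architecture` error.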

fangwei123456 · 2023-02-15T11:23:33Z

Then it's probably a problem with the new GPU. Try installing a nightly build of PyTorch together with the latest CUDA and see whether that fixes it:

```shell
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia
```

Zhxin99 · 2023-02-15T11:24:26Z

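> Then it's probably a problem with the new GPU. Try installing a nightly build of PyTorch together with the latest CUDA and see whether that fixes it:
>
> conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia

OK, I'll give it a try. Thanks!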

fangwei123456 added the "question" label (Further information is requested) on Feb 15, 2023

Zhxin99 closed this as completed Feb 22, 2023
