Problem encountered when switching from CPU to GPU #331

Closed
Zhxin99 opened this issue Feb 15, 2023 · 12 comments
Labels
question Further information is requested

Comments

@Zhxin99

Zhxin99 commented Feb 15, 2023

Hello. On Linux I can run spikingjelly/activation_based/examples/lif_fc_mnist.py on the CPU without any problem, but after changing device to cuda:0 the following error occurs. How can I solve this?
Traceback (most recent call last):

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_neg_add_mul_mul_add(float* tspike_1, double vv_reset_2, float* tv_1, float* aten_add_1, float* aten_add) {
{
if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)<160ll ? 1 : 0) {
float tspike_1_1 = __ldg(tspike_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (0.f - tspike_1_1) + 1.f;
float v = __ldg(tv_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
aten_add_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = ((0.f - tspike_1_1) + 1.f) * v + tspike_1_1 * (float)(vv_reset_2);
}}
}

@fangwei123456 fangwei123456 added the question Further information is requested label Feb 15, 2023
@fangwei123456
Owner

Check whether your PyTorch installation supports the GPU.
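
A minimal sketch of such a check, assuming a standard PyTorch install. The helper names `arch_name` and `report_gpu_support` are hypothetical; the `torch.cuda` calls are existing PyTorch functions. It reports whether CUDA is usable and whether the GPU's architecture appears in the list the build was compiled for. A `False` result does not necessarily mean nothing runs, since a newer GPU can often execute kernels built for an older architecture, but it is a hint that runtime-compiled kernels may fail.

```python
# Report whether this PyTorch build can use the GPU, and whether the GPU's
# architecture is among those the build was compiled for.
try:
    import torch
except ImportError:  # keeps the helper usable where PyTorch is absent
    torch = None


def arch_name(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' string."""
    major, minor = capability
    return f"sm_{major}{minor}"


def report_gpu_support(device_index=0):
    """Print build/device details; return True/False, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return None
    capability = torch.cuda.get_device_capability(device_index)
    arch = arch_name(capability)
    compiled = torch.cuda.get_arch_list()  # e.g. ['sm_37', ..., 'sm_86']
    print("PyTorch:", torch.__version__, "| bundled CUDA:", torch.version.cuda)
    print("Device:", torch.cuda.get_device_name(device_index), "|", arch)
    print("Architectures compiled into this build:", compiled)
    return arch in compiled


if __name__ == "__main__":
    print("Device architecture listed in build:", report_gpu_support())
```

Note that `torch.cuda.is_available()` returning `True` only shows that a device and driver are visible; it says nothing about whether kernels compiled at run time will build for that device.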

@Zhxin99
Author

Zhxin99 commented Feb 15, 2023

Yes, it does.

@fangwei123456
Owner

Could you provide the full error message?

@Zhxin99
Author

Zhxin99 commented Feb 15, 2023

> Could you provide the full error message?

Traceback (most recent call last):

File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)

File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 303, in <module>
main()

File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 198, in main
out_fr += net(encoded_img) # predict value

File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)

File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 33, in forward
return self.layer(x)

File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)

File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)

File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)

File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/base.py", line 266, in forward
return self.single_step_forward(*args, **kwargs)

File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 907, in single_step_forward
return super().single_step_forward(x)

File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 241, in single_step_forward
self.neuronal_reset(spike)

File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 205, in neuronal_reset
self.v = self.jit_hard_reset(self.v, spike_d, self.v_reset)

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_neg_add_mul_mul_add(float* tspike_1, double vv_reset_2, float* tv_1, float* aten_add_1, float* aten_add) {
{
if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)<160ll ? 1 : 0) {
float tspike_1_1 = __ldg(tspike_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (0.f - tspike_1_1) + 1.f;
float v = __ldg(tv_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
aten_add_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = ((0.f - tspike_1_1) + 1.f) * v + tspike_1_1 * (float)(vv_reset_2);
}}
}

@fangwei123456
Owner

Run the following code and see whether it raises an error:

import torch

@torch.jit.script
def jit_hard_reset(v: torch.Tensor, spike: torch.Tensor, v_reset: float):
    v = (1. - spike) * v + spike * v_reset
    return v

device = 'cuda:0'

v = torch.rand([8], device=device)

spike = torch.rand_like(v)

v_reset = 0.

z = jit_hard_reset(v, spike, v_reset)

@Zhxin99
Author

Zhxin99 commented Feb 15, 2023

It runs without error.

@fangwei123456
Owner

That's strange. The error above is a JIT compilation failure in jit_hard_reset, yet running the same code in the snippet above works fine.
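
One possible explanation, which is an assumption and was not confirmed in this thread: TorchScript typically fuses operations and compiles a GPU kernel only after a scripted function has been called more than once, so a single call may never reach the NVRTC compile step, whereas a training loop calls the function many times. The sketch below repeats the call a few times to test that. `call_repeatedly` is a hypothetical helper, and the count of 5 is arbitrary.

```python
try:
    import torch
except ImportError:  # keeps the helper usable where PyTorch is absent
    torch = None


def call_repeatedly(fn, n_calls, *args):
    """Call fn(*args) up to n_calls times.

    Returns (calls_completed, error): error is None if every call succeeded,
    otherwise the RuntimeError raised by the first failing call.
    """
    for i in range(n_calls):
        try:
            fn(*args)
        except RuntimeError as exc:
            return i, exc
    return n_calls, None


if torch is not None and torch.cuda.is_available():

    @torch.jit.script
    def jit_hard_reset(v: torch.Tensor, spike: torch.Tensor, v_reset: float):
        v = (1. - spike) * v + spike * v_reset
        return v

    v = torch.rand([8], device="cuda:0")
    spike = torch.rand_like(v)
    done, err = call_repeatedly(jit_hard_reset, 5, v, spike, 0.)
    if err is None:
        print(f"all {done} calls succeeded")
    else:
        print(f"call {done + 1} failed after {done} successful call(s):\n{err}")
```

If a later call fails with the same nvrtc message, the standalone snippet and the example script are hitting the same problem, and the single-call test simply stopped too early to see it.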

@Zhxin99
Author

Zhxin99 commented Feb 15, 2023

I'm puzzled too. It looks like I can only run on the CPU, then.
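
Falling back to the CPU may not be the only option. As a hedged workaround sketch, not something proposed in this thread: PyTorch documents the environment variable `PYTORCH_JIT=0`, which disables script and trace annotations so that decorated functions run as ordinary eager Python. That should avoid the runtime kernel compilation that fails here, at the cost of losing any speed-up from fusion. Whether it is sufficient for the SpikingJelly example was not tested in the thread.

```python
import os

# Documented PyTorch switch: with PYTORCH_JIT=0, script/trace annotations are
# disabled. It must be set before torch is first imported in the process.
os.environ["PYTORCH_JIT"] = "0"

try:
    import torch
except ImportError:  # keeps the sketch importable where PyTorch is absent
    torch = None

if torch is not None and torch.cuda.is_available():

    @torch.jit.script
    def jit_hard_reset(v: torch.Tensor, spike: torch.Tensor, v_reset: float):
        v = (1. - spike) * v + spike * v_reset
        return v

    v = torch.rand([8], device="cuda:0")
    spike = torch.rand_like(v)
    for _ in range(5):  # repeated calls, as in a training loop
        v = jit_hard_reset(v, spike, 0.)
    # With the JIT disabled, the decorator leaves a plain Python function.
    print("ran eagerly:", not isinstance(jit_hard_reset, torch.jit.ScriptFunction))
```

The same effect can be had without editing the script by setting the variable in the shell that launches it.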

@fangwei123456
Owner

Is it one of the new 40-series GPUs? PyTorch support for new GPUs may be problematic.

pytorch/pytorch#87595

@Zhxin99
Author

Zhxin99 commented Feb 15, 2023

> Is it one of the new 40-series GPUs? PyTorch support for new GPUs may be problematic.
>
> pytorch/pytorch#87595

It's an RTX 4090.

@fangwei123456
Owner

Then it is probably a new-GPU issue. Try installing a nightly build of PyTorch together with the latest CUDA and see whether that fixes it:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia
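
After reinstalling, it may be worth confirming that the CUDA version bundled with PyTorch is new enough for the card. The RTX 4090 has compute capability 8.9 (sm_89), which the CUDA toolkit supports from release 11.8 onward; an older bundled NVRTC would reject that architecture value. The thread does not say which CUDA version the original environment used, so the sketch below is only a way to check. `MIN_CUDA_FOR_ARCH`, `parse_cuda_version`, and `nvrtc_should_know` are hypothetical names.

```python
try:
    import torch
except ImportError:  # keeps the helpers usable where PyTorch is absent
    torch = None

# First CUDA toolkit release that supports each compute capability.
MIN_CUDA_FOR_ARCH = {
    (8, 0): (11, 0),
    (8, 6): (11, 1),
    (8, 9): (11, 8),  # Ada, e.g. RTX 4090
    (9, 0): (11, 8),
}


def parse_cuda_version(version):
    """'11.8' -> (11, 8); returns None for CPU-only builds (version is None)."""
    if version is None:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def nvrtc_should_know(capability, cuda_version):
    """True if the bundled CUDA is new enough for this compute capability.

    Returns None when the capability is not in the table or the build has no CUDA.
    """
    needed = MIN_CUDA_FOR_ARCH.get(tuple(capability))
    parsed = parse_cuda_version(cuda_version)
    if needed is None or parsed is None:
        return None
    return parsed >= needed


if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    print("capability:", cap, "| bundled CUDA:", torch.version.cuda)
    print("bundled CUDA new enough:", nvrtc_should_know(cap, torch.version.cuda))
```

For example, `nvrtc_should_know((8, 9), "11.3")` returns `False`, while the same call with `"11.8"` returns `True`.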

@Zhxin99
Author

Zhxin99 commented Feb 15, 2023

> Then it is probably a new-GPU issue. Try installing a nightly build of PyTorch together with the latest CUDA and see whether that fixes it:
>
> conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia

OK, I'll give it a try. Thanks!

@Zhxin99 Zhxin99 closed this as completed Feb 22, 2023