Crash on amdgpu read timer expiration #4

Open
mibli opened this issue Jun 9, 2021 · 0 comments

mibli commented Jun 9, 2021

Jun 09 17:19:35 - amdgpu-fan[851]: Traceback (most recent call last):
Jun 09 17:19:35 - amdgpu-fan[851]:   File "/usr/bin/amdgpu-fan", line 33, in <module>
Jun 09 17:19:35 - kernel: amdgpu: [powerplay] failed send message: TransferTableSmu2Dram (18)         param: 0x00000006 response 0xffffffc2
Jun 09 17:19:35 - kernel: amdgpu: [powerplay] Failed to export SMU metrics table!
Jun 09 17:19:35 - amdgpu-fan[851]:     sys.exit(load_entry_point('amdgpu-fan==0.1.0', 'console_scripts', 'amdgpu-fan')())
Jun 09 17:19:35 - amdgpu-fan[851]:   File "/usr/lib/python3.9/site-packages/amdgpu_fan/controller.py", line 95, in main
Jun 09 17:19:35 - amdgpu-fan[851]:     FanController(config).main()
Jun 09 17:19:35 - amdgpu-fan[851]:   File "/usr/lib/python3.9/site-packages/amdgpu_fan/controller.py", line 36, in main
Jun 09 17:19:35 - amdgpu-fan[851]:     if current_speed is not None and abs(current_speed - card.fan_speed) > 10:
Jun 09 17:19:35 - amdgpu-fan[851]:   File "/usr/lib/python3.9/site-packages/amdgpu_fan/lib/amdgpu.py", line 50, in fan_speed
Jun 09 17:19:35 - amdgpu-fan[851]:     return int(int(self.read_endpoint('pwm1')) * 100 / self.fan_max)
Jun 09 17:19:35 - amdgpu-fan[851]:   File "/usr/lib/python3.9/site-packages/amdgpu_fan/lib/amdgpu.py", line 37, in read_endpoint
Jun 09 17:19:35 - amdgpu-fan[851]:     return e.read()
Jun 09 17:19:35 - amdgpu-fan[851]: OSError: [Errno 62] Timer expired
Jun 09 17:19:35 - systemd[1]: amdgpu-fan.service: Main process exited, code=exited, status=1/FAILURE
Jun 09 17:19:35 - audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=amdgpu-fan comm="systemd" exe="/usr/lib/systemd/systemd" h>
Jun 09 17:19:35 - kernel: audit: type=1131 audit(1623251975.286:69): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=amdgpu-fan comm="systemd" exe=>
Jun 09 17:19:35 - systemd[1]: amdgpu-fan.service: Failed with result 'exit-code'.
Jun 09 17:19:35 - systemd[1]: amdgpu-fan.service: Scheduled restart job, restart counter is at 3

Sometimes my graphics card hangs, and when it does the fan service crashes, which is not very safe. If possible, the error should be caught and a precautionary measure taken, such as spinning the fans up. It's not ideal, but it's safer than leaving the fans idling.
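
A rough sketch of what I have in mind, not a patch against the actual code: `read_or_failsafe` is a hypothetical helper, and the two callables stand in for whatever reads `pwm1` and whatever forces the fans to full speed. I'm assuming any `OSError` from the read (including the `Errno 62` above) should trigger the failsafe, and that the failsafe write can itself fail while the card is hung.

```python
from typing import Callable, Optional


def read_or_failsafe(read: Callable[[], str],
                     spin_up: Callable[[], None]) -> Optional[int]:
    """Return the value read as an int, or None if the read failed.

    Hypothetical helper: `read` stands in for the sysfs read and
    `spin_up` for whatever sets the fans to full speed.
    """
    try:
        return int(read())
    except OSError:
        # Assumption: every OSError (e.g. "Timer expired") is treated
        # as a hung card, so try to fail safe instead of crashing.
        try:
            spin_up()
        except OSError:
            # The write may fail too while the card is hung; give up
            # quietly and let the caller retry on the next loop pass.
            pass
        return None
```

The caller would then skip the speed comparison for that iteration when it gets `None` back, instead of exiting and relying on systemd to restart the service.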
