
Cannot shut down laptop with linux6.6-6.6.31_1 #50417

Open
Sqvid opened this issue May 18, 2024 · 23 comments
Labels
bug Something isn't working needs-testing Testing a PR or reproducing an issue needed

Comments

@Sqvid
Contributor

Sqvid commented May 18, 2024

Is this a new report?

Yes

System Info

Void 6.6.31_1 x86_64 AuthenticAMD uptodate rrFFFF

Package(s) Affected

linux6.6-6.6.31_1

Does a report exist for this bug with the project's home (upstream) and/or another distro?

No response

Expected behaviour

After runit winds down services the laptop should switch off.

Actual behaviour

Upgrading to kernel 6.6.31_1 prevents the computer from shutting down. Runit seems to stop all services correctly and the screen goes black as expected; however, the keyboard backlight and power light stay on and the fans still run.

System info:
CPU: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
GPU: Nvidia GeForce RTX 4060 (disabled and powered off)
WM: Sway

Downgrading to 6.6.30_1 or below fixes this issue and the laptop shuts down immediately as expected.

I'm not sure how to extract useful debug info, but I would appreciate any tips.

Steps to reproduce

  1. Use the affected kernel (though it may be hardware-dependent).
  2. sudo poweroff now.
  3. Read runit messages of services being stopped.
  4. Screen goes black.
  5. Laptop stays on.
  6. Cry.
@Sqvid Sqvid added bug Something isn't working needs-testing Testing a PR or reproducing an issue needed labels May 18, 2024
@blacklightpy

I have kernel 6.6.29_1, and my system did not power off when I tried shutting it down with my AGS widgets. I didn't check whether it was the widgets or the system, though, and I'm not in the mood to shut down now.

@1is7ac3
Contributor

1is7ac3 commented May 19, 2024

I had this problem on my two Nvidia Optimus laptops (one Intel + Nvidia, one AMD + Nvidia) while I was setting Void Linux up again, so I only saw the error while using Nouveau. Once I finished configuring the system with the Nvidia drivers, it no longer appeared.

@funk443
Contributor

funk443 commented May 19, 2024

I encountered this problem after I updated the kernel this afternoon. Exactly the same problem as @Sqvid.

My system spec:

  • uname -a output: Linux void-btw 6.6.31_1 #1 SMP PREEMPT_DYNAMIC Sat May 18 01:07:55 UTC 2024 x86_64 GNU/Linux
  • CPU: AMD Ryzen 7 Pro 7840U w/ Radeon 780M Graphics
  • GPU: Integrated Radeon 780M Graphics

I usually clear caches after an update, so I cannot downgrade the kernel. But I tried installing linux6.9 and it works just fine.

@tomalexander

I'm on Arch Linux, but I have the same behavior. I went into my package cache and reverted to test:

linux-lts-6.6.30-2-x86_64.pkg.tar.zst : shuts down fine
linux-lts-6.6.31-1-x86_64.pkg.tar.zst : does not shut down

@artluix

artluix commented May 23, 2024

Facing the same issue on a laptop with an 8845HS during installation (no UI is installed).

@thenbe

thenbe commented May 24, 2024

I found this thread through a search engine (I'm on NixOS, not Void Linux), but I have the same issue on NixOS on a Framework AMD laptop (AMD Ryzen 7 7840U). Shutdown appears to work at first: the systemd logs zoom right past like always. Then the screen goes black, but the power stays on (the power LED stays lit) indefinitely. I need to hold the power button for several seconds to force the shutdown to go through. Rolling back the kernel from 6.6.31 to 6.6.30 fixes the issue.

@superm1

superm1 commented May 24, 2024

Anyone affected by this can you check two things:

  1. Is 6.9.1 affected?
  2. If 6.9.1 isn't affected it's probably an incomplete backport. It's best to bisect 6.6.30 to 6.6.31 to find which commit caused it.
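For anyone who hasn't bisected a stable kernel before, the suggestion above roughly amounts to the following. This is a minimal sketch, assuming a local clone of the linux-stable tree that has the `v6.6.30` and `v6.6.31` tags; the helper name `start_stable_bisect` is hypothetical, and building, installing, and booting each candidate kernel is left out because that part depends on the distro.

```shell
#!/bin/sh
# Hypothetical helper: begin a bisect between a known-good and a known-bad
# stable tag. Assumes the current directory is a linux-stable clone.
start_stable_bisect() {
    good_tag=${1:-v6.6.30}   # last kernel that powers off correctly
    bad_tag=${2:-v6.6.31}    # first kernel that hangs on poweroff
    # "git bisect start <bad> <good>" checks out a commit roughly halfway
    # between the two.
    git bisect start "$bad_tag" "$good_tag"
}

# After each checkout: build, install and boot that kernel, then try to
# power off. Mark the result with "git bisect good" or "git bisect bad"
# and repeat until git prints "<sha> is the first bad commit".
# Finish with "git bisect reset" to return to the original branch.
```

Because the check here is a physical shutdown, each step has to be marked by hand; `git bisect run` only helps when the good/bad test can be scripted. Each step roughly halves the remaining range, so a few hundred commits take only a handful of reboots.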

@superm1

superm1 commented May 24, 2024

CC @knurd

@knurd

knurd commented May 24, 2024

Warning, the following is a totally wild guess. But there was one report about a shutdown problem caused by a commit that went into 6.6.31 as well; wonder if that might be related. https://lore.kernel.org/all/CAE4VaREzY%2Ba2PvQJYJbfh8DwB4OP7kucZG-e28H22xyWob1w_A@mail.gmail.com/

@lectrode

lectrode commented May 25, 2024

Bisected between 6.6.30 and 6.6.31 to find the problem commit:

$ git bisect start --no-checkout
status: waiting for both good and bad commits
$ git bisect good 5697d159afef8c475f13a0b7b85f09bd4578106c
status: waiting for bad commit, 1 good commit known
$ git bisect bad e3d332aaf898ed755b29c8cdf59be2cfba1cac4b
Bisecting: 154 revisions left to test after this (roughly 7 steps)
[6466a0f6d235c8a18c602cb587160d7e49876db9] uio_hv_generic: Don't free decrypted memory
$ git bisect good
Bisecting: 77 revisions left to test after this (roughly 6 steps)
[7a54e5052bde582fd0e7677334fe7a5be92e242c] usb: gadget: uvc: use correct buffer size when parsing configfs lists
$ git bisect bad
Bisecting: 38 revisions left to test after this (roughly 5 steps)
[2ee2fc6786bc5ff3c24798624ea3806c9662c26f] selftests/net: convert test_bridge_neigh_suppress.sh to run it in unique namespace
$ git bisect good
Bisecting: 19 revisions left to test after this (roughly 4 steps)
[7019a64165186fd3fb5ba928a6347558dc560093] net: dsa: mv88e6xxx: add phylink_get_caps for the mv88e6320/21 family
$ git bisect good
Bisecting: 9 revisions left to test after this (roughly 3 steps)
[0c7ed3ed35eec9138b88d42217b5a6b9a62bda4d] drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2
$ git bisect bad
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[95ca7c90eaf5ea8a8460536535101e3e81160e2a] gpiolib: cdev: Fix use after free in lineinfo_changed_notify
$ git bisect bad
Bisecting: 2 revisions left to test after this (roughly 1 step)
[fa2d2e2d8eae03acf49229793f2a6fddede92c4d] drm/meson: dw-hdmi: add bandgap setting for g12
$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[6c83a8f236ece78d5c2e60ae3dcfd1a64509410e] dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users
$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[1b9e8de483bcc230f6e922bdfa9d1c186c27dd3b] drm/connector: Add \n to message about demoting connector force-probes
$ git bisect good
6c83a8f236ece78d5c2e60ae3dcfd1a64509410e is the first bad commit
commit 6c83a8f236ece78d5c2e60ae3dcfd1a64509410e
Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Thu May 2 13:32:17 2024 -0500

    dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users
    
    [ Upstream commit cd94d1b182d2986378550c9087571991bfee01d4 ]
    
    Limit the workaround introduced by commit 31729e8c21ec ("drm/amd/pm: fixes
    a random hang in S4 for SMU v13.0.4/11") to only run in the s4 path.
    
    Cc: Tim Huang <Tim.Huang@amd.com>
    Fixes: 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11")
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

@superm1

superm1 commented May 25, 2024

Ok... Does it also fail on 6.9?

@tomalexander

@superm1 wrote: "It's best to bisect 6.6.30 to 6.6.31 to find which commit caused it."

I have bisected it to this commit in https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux.git :

commit 6c83a8f236ece78d5c2e60ae3dcfd1a64509410e (HEAD)
Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Thu May 2 14:32:17 2024

    dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users

    [ Upstream commit cd94d1b182d2986378550c9087571991bfee01d4 ]

    Limit the workaround introduced by commit 31729e8c21ec ("drm/amd/pm: fixes
    a random hang in S4 for SMU v13.0.4/11") to only run in the s4 path.

    Cc: Tim Huang <Tim.Huang@amd.com>
    Fixes: 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11")
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

@lectrode

lectrode commented May 25, 2024

The issue is not present on 6.9.1. I would agree it looks like an incomplete/failed backport.
@tomalexander good to have confirmation

@superm1

superm1 commented May 25, 2024

Interesting. It's supposed to be quite self-contained.

Can you guys please confirm your firmware versions from debugfs?

/sys/kernel/debug/dri/0/amdgpu_firmware_info
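A sketch of collecting that information, assuming root access. debugfs may need to be mounted first (as lectrode does later in the thread), and the DRM card index is not always 0 (lectrode's GPU is under `dri/1`). Both function names are hypothetical. `smc_fw_version` just pulls the SMC (SMU) firmware field out of the text, which seems the most relevant line for an SMU 13.0.4 regression; the full dump is still what was asked for.

```shell
#!/bin/sh
# Hypothetical helpers for gathering amdgpu firmware versions.

# Print amdgpu_firmware_info for a DRM card index (default 0). Needs root.
read_fw_info() {
    card=${1:-0}
    # Mount debugfs only if it is not mounted already.
    if ! mountpoint -q /sys/kernel/debug; then
        mount -t debugfs none /sys/kernel/debug || return 1
    fi
    cat "/sys/kernel/debug/dri/$card/amdgpu_firmware_info"
}

# Read amdgpu_firmware_info text on stdin and print the SMC firmware
# version field, e.g. "0x004c5200 (76.82.0)". Prints nothing if absent.
smc_fw_version() {
    awk -F'firmware version: ' '/^SMC feature version:/ { print $2; exit }'
}
```

For example, `read_fw_info 1 | smc_fw_version` would print the SMC line's version for card 1.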

@tomalexander

It's going to take me a while to confirm 6.9.1 because the release version of ZFS only supports up to 6.8. I'll have to try upstream ZFS from git, and if that doesn't work, swap out my SSD and make a new non-ZFS install.

My firmware on Arch Linux's build of 6.6.31-1-lts:

$ doas cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 35, firmware version: 0x00000027
PFP feature version: 35, firmware version: 0x00000030
CE feature version: 0, firmware version: 0x00000000
RLC feature version: 1, firmware version: 0x00000080
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
RLCP feature version: 1, firmware version: 0x0000000f
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 35, firmware version: 0x00000027
IMU feature version: 0, firmware version: 0x0b012d00
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 553648339, firmware version: 0x210000d3
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700003e
TA DTM feature version: 0x00000000, firmware version: 0x12000016
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 0, firmware version: 0x004c5200 (76.82.0)
SDMA0 feature version: 60, firmware version: 0x00000011
VCN feature version: 0, firmware version: 0x0711300d
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x08003a00
TOC feature version: 0, firmware version: 0x0000000b
MES_KIQ feature version: 6, firmware version: 0x00000073
MES feature version: 1, firmware version: 0x00000059
VBIOS version: 113-PHXGENERIC-001

@tomalexander

Never mind, I lucked out: upstream ZFS from git works with 6.9.1. I have confirmed that 6.9.1 shuts down properly.

@lectrode

lectrode commented May 25, 2024

Firmware as reported on Manjaro running 6.6.30:
(dri/128 also exists, but reports identical info.)

# mount -t debugfs none /sys/kernel/debug
# cat /sys/kernel/debug/dri/1/amdgpu_firmware_info 
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 35, firmware version: 0x00000027
PFP feature version: 35, firmware version: 0x00000030
CE feature version: 0, firmware version: 0x00000000
RLC feature version: 1, firmware version: 0x00000080
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
RLCP feature version: 1, firmware version: 0x0000000f
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 35, firmware version: 0x00000027
IMU feature version: 0, firmware version: 0x0b012d00
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 553648339, firmware version: 0x210000d3
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700003e
TA DTM feature version: 0x00000000, firmware version: 0x12000016
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 0, firmware version: 0x004c5200 (76.82.0)
SDMA0 feature version: 60, firmware version: 0x00000011
VCN feature version: 0, firmware version: 0x0711300d
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x08003a00
TOC feature version: 0, firmware version: 0x0000000b
MES_KIQ feature version: 6, firmware version: 0x00000073
MES feature version: 1, firmware version: 0x00000059
VPE feature version: 0, firmware version: 0x00000000
VBIOS version: 113-PHXGENERIC-001

@superm1

superm1 commented May 26, 2024

Can you guys try this? Ideally please try it both on 6.9.y and 6.6.y:

0001-Add-hopefully-a-solution-for-shutdown-regression.PATCH

@lectrode

lectrode commented May 26, 2024

That patch works on 6.6.31

Will update once I've tested on 6.9.1 and 6.1.91 (it is also an issue on 6.1.91 - have not tested older LTS kernels).

@lectrode

lectrode commented May 26, 2024

Update: 6.9.1 continues to work with that patch, and 6.1.91 is fixed with it.

Also tested 5.10.217 and 5.15.159, but neither of those have the issue in question.

@superm1

superm1 commented May 26, 2024

Thanks! I've posted it to the mailing list for review:

https://lore.kernel.org/amd-gfx/20240526125908.2742-1-mario.limonciello@amd.com/T/#u

If anyone else wants to add a Reported-by or Tested-by tag, a link, or anything else, please respond to that thread.

@MIvanchev
Contributor

I just wanted to report that this kernel also messed up standby for me. I rarely shut down the PC but put it on standby regularly, and now standby works only twice and then reports "Another instance of zzz is already running".

@superm1

superm1 commented May 29, 2024

@MIvanchev wrote: "I just wanted to report that this kernel also messed up standby for me. I rarely shut down the PC but put it on standby regularly, and now standby works only twice and then reports 'Another instance of zzz is already running'."

Does it bisect to the same result? If so, please try the fix linked above. If it bisects to a different commit, it's probably a different issue.

FireBurn pushed a commit to FireBurn/linux that referenced this issue May 30, 2024
commit cd94d1b ("dm/amd/pm: Fix problems with reboot/shutdown for
some SMU 13.0.4/13.0.11 users") attempted to fix shutdown issues
that were reported since commit 31729e8 ("drm/amd/pm: fixes a
random hang in S4 for SMU v13.0.4/11") but caused issues for some
people.

Adjust the workaround flow to properly only apply in the S4 case:
-> For shutdown go through SMU_MSG_PrepareMp1ForUnload
-> For S4 go through SMU_MSG_GfxDeviceDriverReset and
   SMU_MSG_PrepareMp1ForUnload

Reported-and-tested-by: lectrode <electrodexsnet@gmail.com>
Closes: void-linux/void-packages#50417
Cc: stable@vger.kernel.org
Fixes: cd94d1b ("dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users")
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
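The workaround flow in that commit message can be restated as a tiny lookup. This is a toy model, not the driver code: the function name and the `shutdown`/`s4` labels are made up for illustration, only the two SMU message names come from the commit text, and the messages are printed in the order the commit message lists them.

```shell
#!/bin/sh
# Toy model (not kernel code) of the SMU messages the fix sends per path.
# Prints one message name per line for the given power-down path.
mp1_unload_sequence() {
    case "$1" in
        shutdown)
            # Plain shutdown: only the MP1 unload message.
            echo SMU_MSG_PrepareMp1ForUnload ;;
        s4)
            # S4 (hibernate): the workaround message, then MP1 unload.
            echo SMU_MSG_GfxDeviceDriverReset
            echo SMU_MSG_PrepareMp1ForUnload ;;
        *)
            echo "unknown path: $1" >&2
            return 1 ;;
    esac
}
```

The point of the fix, as described, is that the extra message is confined to the S4 path, so shutdown no longer goes through it.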