Caution: Old peerswap CLN plugin processes don't always die #186

Open
wtogami opened this issue May 26, 2023 · 4 comments

@wtogami
Contributor

wtogami commented May 26, 2023

$ ps aux |grep peerswap
cln       729129  0.0  0.0 1173572  868 ?        Sl   Mar17   0:03 /home/cln/src/peerswap/peerswap
cln       729164  0.0  0.0 1173572 1568 ?        Sl   Mar17   0:03 /home/cln/src/peerswap/peerswap
cln      1261763  0.0  0.0 1173888  308 ?        Sl   Apr10   0:02 /home/cln/src/peerswap/peerswap
cln      1261796  0.0  0.0 1247108  824 ?        Sl   Apr10   0:02 /home/cln/src/peerswap/peerswap
cln      2233765  0.4  0.1 1401056 27684 ?       Sl   May23  20:01 /home/cln/src/peerswap/peerswap
cln      2723044  0.0  0.0 1247112  268 ?        Sl    2022   0:07 /home/cln/src/peerswap/peerswap-plugin

The example above is from a server that hadn't been rebooted in a year. CLN was upgraded a few times during that period. lightningd, the parent process, launches peerswap (or peerswap-plugin, its older name). The child process is supposed to die when the parent dies, but in many cases it didn't.

Maybe this isn't a big deal, but the older peerswap processes might still hold open file handles to the database, and unexpected behavior could ensue. Or maybe not: since the stdio link to their long-dead parent process is gone, they may simply be deadlocked. Since those older peerswap binaries are gone, I'm unable to get a gdb backtrace to see where they are stuck. In any case, zombie plugins are unable to log via stdio.

Is there a Go signal handler that can act when the parent process dies? Would it be safer to use one to ensure the plugin actually does die?

@nepet
Contributor

nepet commented May 30, 2023

It seems that a kill signal sent to the main process might cause some problems.
Let me check what I can do, but I have an idea that should help:
When stdin closes (stdin is how we communicate with core-lightning), we quit peerswap. This way we notice that CLN died regardless of whether the kill signal gets through.

@grubles
Collaborator

grubles commented Jul 19, 2023

Not sure how helpful this is but I took the opportunity to run strace on a zombie plugin instance.

$ strace -p 37303
strace: Process 37303 attached
futex(0x15a3b08, FUTEX_WAIT_PRIVATE, 0, NULL

wtogami added this to the v1.0 milestone Jul 19, 2023
@grubles
Collaborator

grubles commented Jul 25, 2023

@nepet found a way to reproduce the zombie processes: start lightningd with --daemon and without a --log path. I also found that if you start lightningd without bitcoind running, zombie peerswap processes are left behind when lightningd crashes.

For some reason, gdb can't attach to the peerswap process quickly enough when this happens if you use something like lightningd && gdb /path/to/peerswap/binary -p $(pgrep peerswap), but I found a short script that works:

#!/bin/sh
# Busy-wait until a process matching $1 exists, then attach gdb to the oldest match.
progstr=$1
progpid=$(pgrep -o "$progstr")
while [ -z "$progpid" ]; do
  progpid=$(pgrep -o "$progstr")
done
gdb -ex continue -p "$progpid"

Running this script before starting lightningd, I was able to attach to a zombie peerswap process and get a backtrace, although I don't know how useful it is.

runtime.futex () at /usr/lib/go-1.19/src/runtime/sys_linux_arm64.s:666
666             SVC
(gdb) backtrace
#0  runtime.futex () at /usr/lib/go-1.19/src/runtime/sys_linux_arm64.s:666
#1  0x00000000004352ac in runtime.futexsleep (addr=<optimized out>, val=128, ns=0) at /usr/lib/go-1.19/src/runtime/os_linux.go:69
#2  0x000000000040ebf0 in runtime.notesleep (n=0x15a3ac8 <runtime.m0+328>) at /usr/lib/go-1.19/src/runtime/lock_futex.go:160
#3  0x000000000043fa44 in runtime.mPark () at /usr/lib/go-1.19/src/runtime/proc.go:1457
#4  runtime.stopm () at /usr/lib/go-1.19/src/runtime/proc.go:2247
#5  0x00000000004412bc in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>)
    at /usr/lib/go-1.19/src/runtime/proc.go:2874
#6  0x00000000004422d8 in runtime.schedule () at /usr/lib/go-1.19/src/runtime/proc.go:3214
#7  0x000000000044329c in runtime.goexit0 (gp=0x4000227a00) at /usr/lib/go-1.19/src/runtime/proc.go:3540
#8  0x0000000000466534 in runtime.mcall () at /usr/lib/go-1.19/src/runtime/asm_arm64.s:192
#9  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@wtogami
Contributor Author

wtogami commented Aug 11, 2023

$ ps aux |grep peer
btc       148296  0.0  0.0 1173016    0 pts/1    Sl   Aug03   0:00 /home/btc/src/peerswap/peerswap
btc      3776066  0.0  0.0 1246744 1148 ?        Sl   Jul16   0:01 /home/btc/src/peerswap/peerswap
btc      3776123  0.0  0.0 1099536 1256 ?        Sl   Jul16   0:01 /home/btc/src/peerswap/peerswap
btc      4092778  0.0  0.0 1246744 1212 ?        Sl   Jul27   0:00 /home/btc/src/peerswap/peerswap
btc      4092822  0.0  0.0 1172500    4 ?        Sl   Jul27   0:00 /home/btc/src/peerswap/peerswap
btc      4092868  0.0  0.0 1173012    0 pts/1    Sl   Jul27   0:00 /home/btc/src/peerswap/peerswap

I don't know how this is happening.

nepet self-assigned this Aug 27, 2023