websocket server process deadlock(multithreading)? #3062

Open
hyc242828 opened this issue Feb 6, 2024 · 2 comments

@hyc242828

We created a libwebsockets (version 4.3.3) server with multithreading, using 4 service threads (OS: Red Hat 7.9).

Test case:
1. Run "kill -STOP [server pid]" to suspend the server process.
2. After a few minutes, run "kill -CONT [server pid]" to resume the server process.
3. After being resumed, the server should work normally.
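
For anyone who wants to repeat the suspend/resume cycle many times to try to reproduce this, the steps above can be sketched in C roughly as follows. `pause_and_resume` is a hypothetical helper name, not part of libwebsockets, and the sketch assumes the caller has permission to signal the server pid.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libwebsockets): suspend a process,
 * keep it stopped for pause_seconds, then resume it.
 * Returns 0 if both signals were sent and the process still exists
 * afterwards, -1 otherwise (errno is set by kill()).
 */
int pause_and_resume(pid_t pid, unsigned int pause_seconds)
{
    if (kill(pid, SIGSTOP) != 0)      /* same as: kill -STOP <pid> */
        return -1;

    /* sleep() returns the unslept time if interrupted; loop until done */
    unsigned int left = pause_seconds;
    while (left > 0)
        left = sleep(left);

    if (kill(pid, SIGCONT) != 0)      /* same as: kill -CONT <pid> */
        return -1;

    /* signal 0 sends nothing; it only checks that the pid is still valid */
    return kill(pid, 0) == 0 ? 0 : -1;
}
```

Calling `pause_and_resume(server_pid, 180)` in a loop, with a client check between iterations, would approximate the manual test; the 180-second pause is only an example value standing in for "a few minutes".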

But on one occasion, a failure occurred:

  1. Connections from clients that are not on the LAN or on lo were lost, and those clients could not reconnect to the server.
  2. Connections from clients on the LAN or on lo stayed in the "established" state and could send messages successfully, but got no response. tcpdump captured packets going to the server, but no packets going to the client.
  3. Dumping the server process stacks with pstack shows what looks like a deadlock.

Is this related only to the "kill -STOP" command? Does it not happen in normal operation?

[pstack result]
Thread 6 (Thread 0x7fbf9303d700 (LWP 31316)):
#0 0x00007fbf9447054d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fbf9446be9b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007fbf9446bd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fbf960243df in lws_mutex_refcount_lock (mr=0x1545840, reason=0x7fbf9604cbe8 "periodic checks") at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core/libwebsockets.c:1314
#4 0x00007fbf960199ff in lws_sul_plat_unix (sul=0x15447c8) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/plat/unix/unix-init.c:64
#5 0x00007fbf96033785 in __lws_sul_service_ripe (own=0x1544738, own_len=2, usnow=1707202986223493) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/sorted-usec-list.c:161
#6 0x00007fbf9601a235 in _lws_plat_service_tsi (context=0x1544400, timeout_ms=, tsi=0) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:125
#7 0x00007fbf960327de in lws_service_tsi (context=0x1544400, timeout_ms=100, tsi=0) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/service.c:870
#8 0x000000000040c3da in CWSThread::CB_Thread (arg=0x1541f08) at src/queue/WSThread.cpp:83
#9 0x00007fbf94469ea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fbf9477cb0d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fbf9283c700 (LWP 31317)):
#0 0x00007fbf9447054d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fbf9446be9b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007fbf9446bd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fbf960243df in lws_mutex_refcount_lock (mr=0x15446a8, reason=0x7fbf96050180 <func.36882> "__lws_adopt_descriptor_vhost1") at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core/libwebsockets.c:1314
#4 0x00007fbf960364f2 in __lws_adopt_descriptor_vhost1 (vh=, type=7, vh_prot_name=0x0, parent=0x0, opaque=, fi_wsi_name=) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/adopt.c:162
#5 0x00007fbf96036739 in lws_adopt_descriptor_vhost_via_info (info=0x7fbf9283bac0) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/adopt.c:526
#6 0x00007fbf960367e5 in lws_adopt_descriptor_vhost (vh=, type=, fd=..., vh_prot_name=, parent=) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/adopt.c:494
#7 0x00007fbf96049a3d in rops_handle_POLLIN_listen (pt=0x1544890, wsi=0x1583320, pollfd=0x154cc18) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/roles/listen/ops-listen.c:148
#8 0x00007fbf9603298e in lws_service_fd_tsi (context=0x1544400, pollfd=0x154cc18, tsi=1) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/service.c:766
#9 0x00007fbf9601a05e in _lws_plat_service_forced_tsi (context=0x1544400, tsi=1) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:51
#10 0x00007fbf9601a3f8 in _lws_plat_service_tsi (context=0x1544400, timeout_ms=, tsi=1) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:216
#11 0x00007fbf960327de in lws_service_tsi (context=0x1544400, timeout_ms=100, tsi=1) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/service.c:870
#12 0x000000000040c3da in CWSThread::CB_Thread (arg=0x1541f20) at src/queue/WSThread.cpp:83
#13 0x00007fbf94469ea5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fbf9477cb0d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fbf9203b700 (LWP 31318)):
#0 0x00007fbf9447054d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fbf9446be9b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007fbf9446bd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fbf960243df in lws_mutex_refcount_lock (mr=0x1545840, reason=0x7fbf960501e0 <func.37029> "lws_adopt_descriptor_vhost_via_info") at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core/libwebsockets.c:1314
#4 0x00007fbf9603671e in lws_adopt_descriptor_vhost_via_info (info=0x7fbf9203aac0) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/adopt.c:524
#5 0x00007fbf960367e5 in lws_adopt_descriptor_vhost (vh=, type=, fd=..., vh_prot_name=, parent=) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/adopt.c:494
#6 0x00007fbf96049a3d in rops_handle_POLLIN_listen (pt=0x1544aa0, wsi=0x1584a00, pollfd=0x154ec18) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/roles/listen/ops-listen.c:148
#7 0x00007fbf9603298e in lws_service_fd_tsi (context=0x1544400, pollfd=0x154ec18, tsi=2) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/service.c:766
#8 0x00007fbf9601a05e in _lws_plat_service_forced_tsi (context=0x1544400, tsi=2) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:51
#9 0x00007fbf9601a3f8 in _lws_plat_service_tsi (context=0x1544400, timeout_ms=, tsi=2) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:216
#10 0x00007fbf960327de in lws_service_tsi (context=0x1544400, timeout_ms=100, tsi=2) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/service.c:870
#11 0x000000000040c3da in CWSThread::CB_Thread (arg=0x1541f38) at src/queue/WSThread.cpp:83
#12 0x00007fbf94469ea5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fbf9477cb0d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fbf9183a700 (LWP 31319)):
#0 0x00007fbf9447054d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fbf9446be9b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007fbf9446bd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fbf960243df in lws_mutex_refcount_lock (mr=0x1545840, reason=0x7fbf9604f2e0 <func.37321> "lws_close_free_wsi") at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core/libwebsockets.c:1314
#4 0x00007fbf9602df8d in lws_close_free_wsi (wsi=0x7fbf7c005700, reason=LWS_CLOSE_STATUS_NOSTATUS, caller=0x7fbf9604fb13 "close_and_handled") at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/close.c:1013
#5 0x00007fbf96032a2a in lws_service_fd_tsi (context=0x1544400, pollfd=0x1550c20, tsi=) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/service.c:778
#6 0x00007fbf9601a05e in _lws_plat_service_forced_tsi (context=0x1544400, tsi=3) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:51
#7 0x00007fbf9601a3f8 in _lws_plat_service_tsi (context=0x1544400, timeout_ms=, tsi=3) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/plat/unix/unix-service.c:216
#8 0x00007fbf960327de in lws_service_tsi (context=0x1544400, timeout_ms=100, tsi=3) at /usr/local/3rdParty/libwebsockets-4.3.3/lib/core-net/service.c:870
#9 0x000000000040c3da in CWSThread::CB_Thread (arg=0x1541f50) at src/queue/WSThread.cpp:83
#10 0x00007fbf94469ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fbf9477cb0d in clone () from /lib64/libc.so.6

@hyc242828
Author

The lws_service_tsi timeout is 100 ms, and each worker thread prints a log line every 60 seconds. While the problem was occurring, the worker threads did not print that log.
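
The missing 60-second log is effectively a heartbeat, and the same check can be made explicit. Below is a minimal sketch, assuming each service thread records a timestamp after every return from lws_service_tsi and a separate monitor thread compares it to the current time. `heartbeat_t` and its functions are hypothetical names, not libwebsockets APIs, and the caller supplies the time in milliseconds.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-thread heartbeat; none of these names come from libwebsockets. */
typedef struct {
    _Atomic int64_t last_beat_ms;   /* time of the last completed service pass */
} heartbeat_t;

/* Set the initial timestamp before the service thread starts. */
void heartbeat_init(heartbeat_t *hb, int64_t now_ms)
{
    atomic_store(&hb->last_beat_ms, now_ms);
}

/* Called by the service thread after each service call returns. */
void heartbeat_touch(heartbeat_t *hb, int64_t now_ms)
{
    atomic_store(&hb->last_beat_ms, now_ms);
}

/* Called by a monitor thread: returns 1 if the thread has not
 * reported progress for more than limit_ms, otherwise 0. */
int heartbeat_is_stalled(heartbeat_t *hb, int64_t now_ms, int64_t limit_ms)
{
    return (now_ms - atomic_load(&hb->last_beat_ms)) > limit_ms;
}
```

With a 100 ms service timeout, a limit of a few seconds would flag a blocked thread much sooner than the 60-second log. The time a process spends stopped by SIGSTOP still counts as elapsed time, so the first check after a resume may report a stall once and should be treated as a hint to re-check, not as proof of a deadlock.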

@lws-team
Member

lws-team commented Feb 6, 2024

I would try this with main branch lws. If still broken, valgrind might help pinpoint the deadlock.
