Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: update some logs on extension host process service #2173

Merged
merged 4 commits into from
Jan 9, 2023

Conversation

erha19
Copy link
Member

@erha19 erha19 commented Jan 6, 2023

Types

  • 🧹 Chores

Background or solution

close #2166.

优化插件进程内部分日志展示,便于后续问题排查

Changelog

update some logs on extension host process service

@erha19 erha19 requested review from hacke2 and bytemain January 6, 2023 07:28
@github-actions github-actions bot added the 🐞 bug Something isn't working label Jan 6, 2023
@erha19
Copy link
Member Author

erha19 commented Jan 6, 2023

这个配置名没想好,大概列一下,感觉可以投个票:

  1. killExtensionHostProcessWhenDisconnected
  2. disposeExtensionHostProcessWhenDisconnected

或者短一点的

  1. killExtHostProcessWhenDisconnected
  2. disposeExtHostProcessWhenDisconnected

@hacke2
Copy link
Member

hacke2 commented Jan 6, 2023

默认关闭?通信恢复后如果插件进程没有被杀死,应该能直接重连

@erha19
Copy link
Member Author

erha19 commented Jan 6, 2023

默认关闭?通信恢复后如果插件进程没有被杀死,应该能直接重连

@hacke2 默认关闭的话常规用户使用就会出现大量僵尸进程,集成侧自行配置吧,正常情况应该是链接数量与插件进程相对应的才对。

@bytemain
Copy link
Member

bytemain commented Jan 6, 2023

这个直接断开有点粗暴,用户刷新下页面啥都没了。

要不然:过了五分钟还没有人连接,就清理进程。

@hacke2
Copy link
Member

hacke2 commented Jan 6, 2023

默认关闭?通信恢复后如果插件进程没有被杀死,应该能直接重连

@hacke2 默认关闭的话常规用户使用就会出现大量僵尸进程,集成侧自行配置吧,正常情况应该是链接数量与插件进程相对应的才对。

你们出现过僵尸进程的情况吗?我们好像没有出现过

@hacke2
Copy link
Member

hacke2 commented Jan 6, 2023

这个直接断开有点粗暴,用户刷新下页面啥都没了。

要不然:过了五分钟还没有人连接,就清理进程。

现在就是这个逻辑,这个时间集成方可以自动定义

@erha19
Copy link
Member Author

erha19 commented Jan 6, 2023

你们出现过僵尸进程的情况吗?我们好像没有出现过

Docker 环境下很容易复现这个问题,参考项目 https://github.com/opensumi/ide-startup

@codecov
Copy link

codecov bot commented Jan 6, 2023

Codecov Report

Base: 57.79% // Head: 57.77% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (1fa70dc) compared to base (7593498).
Patch coverage: 71.63% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2173      +/-   ##
==========================================
- Coverage   57.79%   57.77%   -0.03%     
==========================================
  Files        1313     1315       +2     
  Lines       82995    83012      +17     
  Branches    17254    17266      +12     
==========================================
- Hits        47970    47963       -7     
- Misses      31835    31854      +19     
- Partials     3190     3195       +5     
Flag Coverage Δ
jsdom 52.73% <61.53%> (-0.01%) ⬇️
node 16.50% <39.42%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
packages/core-node/src/connection.ts 66.15% <0.00%> (+0.93%) ⬆️
...es/extension/src/browser/extension-node.service.ts 24.17% <0.00%> (ø)
packages/extension/src/node/index.ts 0.00% <0.00%> (ø)
...terminal-next/src/node/terminal.profile.service.ts 60.36% <0.00%> (ø)
...ages/connection/src/node/common-channel-handler.ts 59.55% <20.00%> (ø)
packages/extension/src/node/extension.service.ts 46.05% <35.29%> (-1.35%) ⬇️
...ackages/terminal-next/src/node/terminal.service.ts 71.52% <42.85%> (ø)
packages/extension/src/hosted/ext.host.ts 78.59% <54.54%> (-0.33%) ⬇️
...kages/extension/src/node/extension.host.manager.ts 87.20% <68.75%> (-6.77%) ⬇️
packages/extension/src/common/extension.ts 83.80% <83.80%> (ø)
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@erha19
Copy link
Member Author

erha19 commented Jan 6, 2023

试了一下,这个修改并不能修复僵尸进程问题,就算没加这个逻辑,目前的插件进程也是会在关闭页面后被推出

@hacke2
Copy link
Member

hacke2 commented Jan 6, 2023

试了一下,这个修改并不能修复僵尸进程问题,就算没加这个逻辑,目前的插件进程也是会在关闭页面后被推出

是的,杀死插件有两个时机:

  1. 页面关闭时会向后台发送请求杀死插件进程
  2. 通信失去连接时会 5分钟(集成方可以设置)后杀死插件进程

当通信恢复时,可以不刷新页面,重新启动一个插件进程,这个插件进程在启动前会销毁前一个插件进程的副作用

@erha19
Copy link
Member Author

erha19 commented Jan 6, 2023

定位到问题还是出在 tree-kill 这个库上,在 Docker 上不能正常将子进程杀掉,相关 Issue:

pkrumins/node-tree-kill#38

@bytemain
Copy link
Member

bytemain commented Jan 7, 2023

看起来在 apline 下确实有问题,评论区给了 workaround.

写一个 @opensumi/tree-kill 吧,tree-kill 这也是集成方要用的东西。

@erha19
Copy link
Member Author

erha19 commented Jan 9, 2023

看起来在 apline 下确实有问题,评论区给了 workaround.

这里提供的代码并不能解决问题,在稳定的 Docker 版本下,通过 PID 查找子进程的逻辑是正常被处理的,但问题发生在 https://github.com/pkrumins/node-tree-kill/blob/deee138a8cbc918463d8af5ce8c2bec33c3fd164/index.js#L53 这里,不能正常将进程 Kill 掉

image

image

image

@bytemain
Copy link
Member

bytemain commented Jan 9, 2023

https://askubuntu.com/questions/201303/what-is-a-defunct-process-and-why-doesnt-it-get-killed

From your output we see a "defunct", which means the process has either completed its task or has been corrupted or killed, but its child processes are still running or these parent process is monitoring its child process. To kill this kind of process, kill -9 PID doesn't work. You can try to kill them with this command but it will show this again and again.

@bytemain
Copy link
Member

bytemain commented Jan 9, 2023

CleanShot 2023-01-09 at 12 17 24@2x

kill -9 是杀不掉这些子进程的,因为它们还在被父进程 monitoring。
这个父进程就是我们的 node ./dist-node/server/index.js

@bytemain
Copy link
Member

bytemain commented Jan 9, 2023

root       491     1  6 04:33 ?        00:00:01 /usr/local/bin/node /release/hosted/ext.process.js --kt-process-sock-option={"path":"/tmp/sumi-ipc/sumi-ipc-ext_processc5aaADw_59dnt1kiqmHCD.soc
root       503   491  6 04:33 ?        00:00:01 /usr/local/bin/node --max-old-space-size=3072 /root/.sumi/extensions/vscode.typescript-language-features/deps/typescript/lib/tsserver.js --serve
root       504   491  7 04:33 ?        00:00:01 /usr/local/bin/node --max-old-space-size=3072 /root/.sumi/extensions/vscode.typescript-language-features/deps/typescript/lib/tsserver.js --useIn
root       521   504  1 04:33 ?        00:00:00 /usr/local/bin/node /root/.sumi/extensions/vscode.typescript-language-features/deps/typescript/lib/typingsInstaller.js --globalTypingsCacheLocat

就是这些插件进程:

root       503     1  3 04:33 ?        00:00:01 [node] <defunct>
root       504     1  3 04:33 ?        00:00:01 [node] <defunct>
root       521     1  1 04:33 ?        00:00:00 [node] <defunct>

这些插件进程是被 /usr/local/bin/node /release/hosted/ext.process.js 创建的,当 ext host 被 kill 的时候,ext host 的所有子进程被移到了主进程(也就是 1)上。

在 docker 上,这个 node 是 init 进程,当 ext host 没有 kill 完它的孩子后,它的孩子成为了孤儿就交给了 init 进程,init 如果是 systemd 这些,就会处理掉这个孤儿。但是我们的 node 没有实现这一步。

@bytemain
Copy link
Member

bytemain commented Jan 9, 2023

The only way you could remove the zombie/defunct process, would be to kill the parent.

@erha19 erha19 changed the title fix: auto kill extension host process when disconnected chore: update some logs on extension host process service Jan 9, 2023
@erha19
Copy link
Member Author

erha19 commented Jan 9, 2023

如这篇文章提到的 https://maximorlov.com/process-signals-inside-docker-containers/ , 直接通过 CMD [ "node", "./dist-node/server/index.js" ] 方式启动 Node 进程就会产生僵尸进程问题,在 Docker 1.13 版本后内置了 tini 能力,只需要在运行时追加 --init 参数即可。

通过携带该参数的运行方式即可避免这个问题,如 docker run --init --rm -d -p 8080:8000/tcp ghcr.io/opensumi/opensumi-web:latest

该 PR 仅修改部分 Log 信息,便于排查后续问题,无其他增量修改。

@erha19 erha19 removed the 🐞 bug Something isn't working label Jan 9, 2023
@erha19 erha19 merged commit b077d94 into main Jan 9, 2023
@erha19 erha19 deleted the fix/auto-kill-extension-host-process branch January 9, 2023 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opensumi docker容器产生很多僵尸进程[BUG]
3 participants