
perf: revert NewChunkedBodyWriter #993

Draft · wants to merge 2 commits into base: develop

Conversation

Skyenought
Contributor

What type of PR is this?

Check the PR title.

  • The PR title matches the format: <type>(optional scope): <description>
  • The description in the PR title is user-oriented and clear enough for others to understand.
  • Attach the PR that updates the user documentation if this PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en:
zh (optional): 取消池化 NewChunkedBodyWriter (un-pool NewChunkedBodyWriter)

Large difference in CPU time

[pprof CPU profile screenshots]

Comparing hertz 0.7.1 against 0.6.1, CPU time differs by roughly 8x.

Performance regression

=======================================================
 hertz 0.6.1
=======================================================
Running 1m test @ http://localhost:3000/sse
  8 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    23.68ms   17.15ms 307.82ms   83.41%
    Req/Sec     1.86k     1.01k    4.68k    60.13%
  Latency Distribution
     50%   19.53ms
     75%   28.76ms
     90%   42.53ms
     99%   88.00ms
  776720 requests in 1.00m, 705.20MB read
  Socket errors: connect 212, read 0, write 0, timeout 0
Requests/sec:  12923.79
Transfer/sec:     11.73MB

=======================================================
 hertz 0.7.1
=======================================================

Running 1m test @ http://localhost:3000/sse
  8 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    29.03ms   21.50ms 312.42ms   79.90%
    Req/Sec     1.76k   816.53     4.57k    65.35%
  Latency Distribution
     50%   23.47ms
     75%   37.21ms
     90%   55.44ms
     99%  109.00ms
  630186 requests in 1.00m, 572.16MB read
  Socket errors: connect 212, read 0, write 0, timeout 0
Requests/sec:  10488.32
Transfer/sec:      9.52MB

The wrk results above show that every metric regresses in hertz 0.7.1.

runtime.SetFinalizer is used inappropriately

runtime.SetFinalizer was originally added as a last-resort reclamation fallback, but in practice it causes a performance regression.

https://lemire.me/blog/2023/05/19/the-absurd-cost-of-finalizers-in-go/

The blog post above reports that attaching a finalizer dramatically increases object creation and destruction time, roughly 430x slower than without one, and recommends avoiding finalizers unless there is a very strong reason.
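As a rough illustration of that cost, the effect is easy to reproduce with a minimal sketch (not the hertz code; absolute numbers vary by machine and Go version):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

type obj struct{ buf [64]byte }

var sink *obj // prevents the compiler from optimizing the allocation away

// allocate creates n heap objects, optionally attaching a no-op finalizer,
// then forces GC so queued finalizers actually run.
func allocate(n int, withFinalizer bool) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		o := &obj{}
		if withFinalizer {
			runtime.SetFinalizer(o, func(*obj) {})
		}
		sink = o
	}
	runtime.GC() // queue finalizers
	runtime.GC() // finalized objects need a second cycle to be reclaimed
	return time.Since(start)
}

func main() {
	const n = 200000
	fmt.Println("no finalizer:  ", allocate(n, false))
	fmt.Println("with finalizer:", allocate(n, true))
}
```

On most machines the finalizer run is markedly slower, because each finalized object must survive an extra GC cycle and be processed by the runtime's finalizer goroutine.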

(Optional) Which issue(s) this PR fixes:

(Optional) The PR that updates user documentation:

@Skyenought Skyenought requested review from a team as code owners November 14, 2023 05:57

codecov bot commented Nov 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9c3f0b7) 81.81% compared to head (c0b8ca4) 81.78%.

❗ Current head c0b8ca4 differs from pull request most recent head 240dfd5. Consider uploading reports for the commit 240dfd5 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #993      +/-   ##
===========================================
- Coverage    81.81%   81.78%   -0.03%     
===========================================
  Files           98       98              
  Lines         9962     9949      -13     
===========================================
- Hits          8150     8137      -13     
  Misses        1319     1319              
  Partials       493      493              


@welkeyever
Member

Please add perf flame graphs plus benchmark data for before and after the revert.

@Skyenought
Contributor Author

Skyenought commented Nov 14, 2023

Flame graph

before:

[flame graph]

after:

[flame graph]

wrk benchmark

==========================================
 hertz before
==========================================
Running 1m test @ http://localhost:3000/sse
  8 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    31.14ms   42.08ms 788.81ms   97.67%
    Req/Sec     2.40k   375.39     6.64k    84.98%
  Latency Distribution
     50%   25.94ms
     75%   27.11ms
     90%   28.86ms
     99%  211.20ms
  1151229 requests in 1.00m, 1.34GB read
  Socket errors: connect 0, read 724, write 0, timeout 512
Requests/sec:  19217.84
Transfer/sec:     22.93MB


==========================================
 hertz after 
==========================================
Running 1m test @ http://localhost:3000/sse
  8 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    25.49ms    1.89ms 105.93ms   79.05%
    Req/Sec     2.52k   160.92     3.40k    79.71%
  Latency Distribution
     50%   25.57ms
     75%   26.40ms
     90%   27.33ms
     99%   29.93ms
  1203313 requests in 1.00m, 1.37GB read
  Socket errors: connect 0, read 693, write 0, timeout 0
Requests/sec:  20037.59
Transfer/sec:     23.30MB

@li-jin-gou
Member

> (quoted the flame graphs and wrk results from the comment above)

@welkeyever The improvement is clear.

benchmark @Skyenought

@Skyenought
Contributor Author

=============================================
after
=============================================
goos: linux
goarch: amd64
pkg: github.com/cloudwego/hertz/pkg/protocol/http1/resp
cpu: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz
BenchmarkSth
BenchmarkSth-8            131325              9104 ns/op
PASS
==============================================
before
==============================================
goos: linux
goarch: amd64
pkg: github.com/cloudwego/hertz/pkg/protocol/http1/resp
cpu: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz
BenchmarkSth
BenchmarkSth-8            130406             12306 ns/op
PASS

@welkeyever
Member

Thanks, I'll take a look later. The impact should be limited to scenarios that use the built-in NewChunkedBodyWriter to hijack the response body writer for on-demand flushing.
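For context, the pattern being reverted looks roughly like the sketch below: a sync.Pool of writers with runtime.SetFinalizer attached as a leak backstop. The names here are illustrative, not hertz's actual types; the SetFinalizer call on the acquire path is exactly the per-request cost the benchmarks above measure.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// chunkWriter stands in for a pooled chunked-body writer (illustrative name).
type chunkWriter struct {
	buf []byte
}

var pool = sync.Pool{New: func() any { return &chunkWriter{} }}

// acquire takes a writer from the pool and attaches a finalizer as a leak
// backstop: if the caller never releases the writer, the GC eventually
// returns it to the pool instead of letting it leak.
func acquire() *chunkWriter {
	w := pool.Get().(*chunkWriter)
	runtime.SetFinalizer(w, func(w *chunkWriter) {
		pool.Put(w) // resurrect the forgotten writer into the pool
	})
	return w
}

// release returns a writer on the normal path, clearing the finalizer first
// so the backstop never double-returns it.
func release(w *chunkWriter) {
	runtime.SetFinalizer(w, nil)
	w.buf = w.buf[:0]
	pool.Put(w)
}

func main() {
	w := acquire()
	w.buf = append(w.buf, "chunk"...)
	fmt.Println(string(w.buf))
	release(w)
}
```

Reverting to plain allocation removes both the finalizer bookkeeping and the pool, trading a small amount of garbage per request for a cheaper hot path.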

@welkeyever welkeyever added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 15, 2023
@welkeyever welkeyever self-assigned this Nov 15, 2023
@welkeyever
Member

The diff is hard to read as-is; could you rebase and squash it into a single commit?

@welkeyever
Member

I benchmarked the SSE quickstart example locally with wrk, running two rounds each for with_pool vs no_pool. Pooling does not seem to have a noticeable impact on latency:

[benchmark screenshot]

env:
go version go1.18.6 darwin/arm64
CPU load during the test: 80-90%

I'll follow up with CPU sampling and a more careful CPU-load comparison.

@welkeyever
Member

Same conclusion under overload:

[benchmark screenshot]

@welkeyever
Member

@Skyenought Could you share your benchmark program above? I'll try it again locally.

@Skyenought
Contributor Author

sse.zip
The script is in example/server/perf. If your results don't diverge from these, the difference may be down to my machine.

@li-jin-gou
Member

li-jin-gou commented Nov 23, 2023

> Same conclusion under overload: [screenshot]

So pooling looks like a slight regression?

@welkeyever
Member

Running 15s test @ http://localhost:8888/sse
  8 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.33ms    2.89ms  80.33ms   84.91%
    Req/Sec     4.09k   729.42    13.61k    73.92%
  Latency Distribution
     50%    7.52ms
     75%    8.36ms
     90%    9.09ms
     99%   17.72ms
  490053 requests in 15.08s, 444.93MB read
  Socket errors: connect 270, read 109, write 0, timeout 0
Requests/sec:  32499.88
Transfer/sec:     29.51MB
 
 ===================================== 
    hertz version: with_pool
 ===================================== 
Running 30s test @ http://localhost:8888/sse
  8 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.20ms    2.81ms 125.41ms   85.01%
    Req/Sec     4.13k     2.67k   12.25k    63.47%
  Latency Distribution
     50%    7.21ms
     75%    8.04ms
     90%    9.00ms
     99%   17.83ms
  988571 requests in 30.09s, 0.88GB read
  Socket errors: connect 270, read 0, write 0, timeout 0
Requests/sec:  32855.03
Transfer/sec:     29.83MB
Running 15s test @ http://localhost:8888/sse
  8 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.27ms    2.86ms  61.19ms   83.61%
    Req/Sec     4.14k   664.75    10.79k    71.17%
  Latency Distribution
     50%    7.44ms
     75%    8.33ms
     90%    9.13ms
     99%   17.45ms
  495576 requests in 15.07s, 449.94MB read
  Socket errors: connect 269, read 110, write 0, timeout 0
Requests/sec:  32888.66
Transfer/sec:     29.86MB
 
 ===================================== 
    hertz version: no_pool
 ===================================== 
Running 30s test @ http://localhost:8888/sse
  8 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.34ms    3.52ms  80.89ms   86.15%
    Req/Sec     4.09k     1.90k   16.99k    73.48%
  Latency Distribution
     50%    7.30ms
     75%    8.28ms
     90%    9.41ms
     99%   21.92ms
  980559 requests in 30.10s, 0.87GB read
  Socket errors: connect 269, read 34, write 0, timeout 0
Requests/sec:  32574.44
Transfer/sec:     29.57MB

Looks like there's really no difference.

P.S. @Skyenought your script should kill the server pid once the run finishes; if you forget to kill it manually, the next run actually benchmarks the previous server 🤓
FYI:

#!/bin/bash

rm -f *.out
rm -f perf_server

go build -o perf_server main.go
echo "perf_server started"
nohup ./perf_server &
s_pid=$!
echo "pid of perf_server is ${s_pid}"
sleep 1

echo "warmup"
wrk -H "Connection: keep-alive" --latency -c 512 -d 15 --timeout 8 -t 8 http://localhost:8888/sse >> result.txt

sleep 5

echo " " >> result.txt
echo " ===================================== " >> result.txt
echo "    hertz version: $1" >> result.txt
echo " ===================================== " >> result.txt
wrk -H "Connection: keep-alive" --latency -c 512 -d 30 --timeout 8 -t 8 http://localhost:8888/sse >> result.txt

kill $s_pid
echo "finished testing"

@Skyenought
Contributor Author

If there's no problem with that, go ahead and close this PR.

@li-jin-gou
Member

> If there's no problem with that, go ahead and close this PR.

Still verifying; let's keep it open for now.

@Skyenought Skyenought marked this pull request as draft December 1, 2023 15:09
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

3 participants