Geth memory leak on json-rpc response gzip compression #1801

Closed
VladStarr opened this issue Aug 5, 2023 · 7 comments

VladStarr commented Aug 5, 2023

System information

Geth version: 1.2.9, Architecture: amd64, Go Version: go1.19.11, Operating System: linux
OS & Version: Linux public-bsc-0 5.15.0-70-generic #77-Ubuntu SMP Tue Mar 21 14:02:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Commit hash : (if develop)
CLI args:
geth --config=/config/config.toml --datadir=/data --syncmode=full --gcmode=full --maxpeers=50 --cache=8192 --snapshot=true --pipecommit=false --persistdiff=false --diffblock=86400 --port=30311 --rpc.allow-unprotected-txs --txlookuplimit=0 --cache.preimages --ws --tries-verify-mode=none --metrics --pprof --pprof.addr=0.0.0.0 --pprof.port=6060 --pruneancient=true

Expected behaviour

Geth should not leak memory when responding with gzip-compressed data.
In our case every JSON-RPC request to the node carries Accept-Encoding: gzip, deflate.

Actual behaviour

Memory leaks when all requests to Geth are sent with the Accept-Encoding: gzip, deflate header, and Geth is eventually killed by the OOM killer.

Steps to reproduce the behaviour

  1. Modify request headers so that Accept-Encoding: gzip, deflate is added to each JSON-RPC request (a minimal client sketch is shown after this list).
  2. Forward real or simulated traffic to Geth covering common methods, e.g. eth_call or eth_getLogs. Our node serves at least 50 RPS.
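
For illustration only, here is a minimal Go sketch (not part of the original report) of a client that forces a gzip-compressed JSON-RPC response; the endpoint http://127.0.0.1:8545 and the eth_blockNumber call are assumptions. Setting Accept-Encoding by hand also disables Go's transparent decompression, so the raw compressed body comes back as-is.

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical request body; any JSON-RPC method works the same way.
	body := bytes.NewBufferString(`{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}`)
	req, err := http.NewRequest("POST", "http://127.0.0.1:8545", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// Setting Accept-Encoding explicitly asks the server to compress the
	// response and turns off Go's automatic gzip decompression on the client.
	req.Header.Set("Accept-Encoding", "gzip, deflate")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	data, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Header.Get("Content-Encoding"), len(data), "bytes")
}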

Backtrace

With Accept-Encoding: gzip, deflate present in each request from the reverse proxy:

Showing nodes accounting for 30.28GB, 93.30% of 32.45GB total
Dropped 778 nodes (cum <= 0.16GB)
Showing top 10 nodes out of 125
      flat  flat%   sum%        cum   cum%
   19.58GB 60.33% 60.33%    23.80GB 73.32%  compress/flate.NewWriter
    4.09GB 12.59% 72.93%     4.09GB 12.59%  compress/flate.(*compressor).initDeflate (inline)
    1.66GB  5.12% 78.05%     1.66GB  5.12%  github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get
    1.31GB  4.05% 82.10%     1.31GB  4.05%  github.com/VictoriaMetrics/fastcache.(*bucket).Set
    1.18GB  3.63% 85.73%     1.51GB  4.66%  github.com/ethereum/go-ethereum/core/vm.makeLog.func1
    0.80GB  2.48% 88.21%     0.80GB  2.48%  github.com/syndtr/goleveldb/leveldb/memdb.New
    0.68GB  2.10% 90.31%     2.32GB  7.16%  github.com/ethereum/go-ethereum/core.applyTransaction
    0.45GB  1.38% 91.68%     2.09GB  6.43%  github.com/syndtr/goleveldb/leveldb/table.(*Reader).find
    0.29GB  0.89% 92.58%     0.29GB  0.89%  github.com/ethereum/go-ethereum/core/vm.(*Memory).GetCopy (inline)
    0.23GB  0.72% 93.30%     0.23GB  0.72%  github.com/ethereum/go-ethereum/log.NewAsyncFileWriter

With the Accept-Encoding header absent:

Showing nodes accounting for 7001.63MB, 88.59% of 7902.98MB total
Dropped 547 nodes (cum <= 39.51MB)
Showing top 10 nodes out of 182
      flat  flat%   sum%        cum   cum%
 2056.33MB 26.02% 26.02%  2056.33MB 26.02%  github.com/VictoriaMetrics/fastcache.(*bucket).Set
 1668.07MB 21.11% 47.13%  1668.07MB 21.11%  github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get
  962.64MB 12.18% 59.31%  1216.23MB 15.39%  github.com/ethereum/go-ethereum/core/vm.makeLog.func1
     823MB 10.41% 69.72%      823MB 10.41%  github.com/syndtr/goleveldb/leveldb/memdb.New
  526.22MB  6.66% 76.38%  1987.44MB 25.15%  github.com/ethereum/go-ethereum/core.applyTransaction
  329.84MB  4.17% 80.55%  1518.30MB 19.21%  github.com/syndtr/goleveldb/leveldb/table.(*Reader).find
     240MB  3.04% 83.59%      240MB  3.04%  github.com/ethereum/go-ethereum/log.NewAsyncFileWriter
  217.54MB  2.75% 86.34%   217.54MB  2.75%  github.com/ethereum/go-ethereum/core/vm.(*Memory).GetCopy (inline)
   94.71MB  1.20% 87.54%    94.71MB  1.20%  github.com/deckarep/golang-set.(*threadUnsafeSet).Add
   83.28MB  1.05% 88.59%    83.28MB  1.05%  bufio.NewWriterSize

pprof heap SVG:

leak
without_gzip

pprof heap dump:

leak.heap.gz
without_gzip.heap.gz

Memory usage graph (the red separator marks the point where compression was disabled):

Screenshot 2023-08-05 at 12:30:14

config.toml

config.toml.gz


0xLyon commented Aug 7, 2023

Hi there, thanks for the details you shared. We will look into this and follow up once we have more information.


galaio commented Aug 7, 2023

Hi @VladStarr, I am trying to reproduce this problem, which may take some time. If convenient, could you please provide the goroutine profile from pprof? You can obtain it with the following command:

go tool pprof -proto -output $(date +"%Y%m%d_%H%M")_pprof_goroutine_$(hostname).pb.gz http://127.0.0.1:6060/debug/pprof/goroutine


galaio commented Aug 8, 2023

Hi @VladStarr, with a simple RPC request such as eth_getBlockByNumber the memory growth is not obvious, and I did not find an obvious problem with resource release in the code, so I would like to ask a few more questions:

  1. Are there requests that time out or run for a long time? gzip temporarily allocates a lot of memory for compression, so a long-running request can hold a large amount of memory, and an RPC call may have already timed out on the client side while it is still executing and holding memory.
  2. Which methods receive the most requests? Could you give examples of several RPC requests?


du5 commented Aug 12, 2023

same...


galaio commented Aug 14, 2023

same...

@du5 Could you provide some details about the problem, such as a goroutine profile, a heap profile, and the RPC message scenarios?


VladStarr commented Aug 21, 2023

Hi @galaio, the top 3 methods the node is serving are:

  • eth_call
  • eth_getLogs
  • eth_sendRawTransaction

Goroutine profile attached, along with a fresh heap dump.

I can't say there are any long-running requests; the most time-consuming one is eth_sendRawTransaction, but that could just be latency from requests coming from faraway regions.

Other types of nodes (Geth/Polygon/etc) do not have this problem.

20230821_1017_pprof_goroutine.gz
20230821_1020_pprof_heap.gz


du5 commented Sep 9, 2023

same...

@du5 Could you provide some details about the problem? like goroutine profile, heap profile, RPC msg scenarios?

lol... I don't think this problem comes from BSC, and there is no way to fix it in the short term.

I reduced the memory usage by turning off gzip: https://github.com/du5/bsc-geth-disable-gzip

Related upstream issues:

ethereum/go-ethereum#22567

ethereum/go-ethereum#22529

golang/go#32371
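
The flat cost in the heap profiles above sits in compress/flate.NewWriter, which is allocated whenever a fresh gzip writer is created. A common mitigation in Go HTTP servers is to reuse compressors through a sync.Pool; the sketch below is a minimal illustration of that pattern under those assumptions, not the handler go-ethereum or bsc actually uses, and gzipHandler / gzipResponseWriter are hypothetical names.

package main

import (
	"compress/gzip"
	"io"
	"net/http"
	"strings"
	"sync"
)

var gzPool = sync.Pool{
	// Each pooled writer is created once and then Reset for every response.
	New: func() any { return gzip.NewWriter(io.Discard) },
}

type gzipResponseWriter struct {
	http.ResponseWriter
	gz *gzip.Writer
}

// Write routes the response body through the pooled gzip writer.
func (g *gzipResponseWriter) Write(p []byte) (int, error) { return g.gz.Write(p) }

// gzipHandler wraps an HTTP handler and compresses responses for clients
// that advertise gzip support, reusing writers instead of allocating new ones.
func gzipHandler(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r)
			return
		}
		gz := gzPool.Get().(*gzip.Writer)
		gz.Reset(w)
		defer func() {
			gz.Close()     // flush any buffered compressed data
			gzPool.Put(gz) // return the writer for reuse
		}()
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Del("Content-Length")
		next.ServeHTTP(&gzipResponseWriter{ResponseWriter: w, gz: gz}, r)
	})
}

Even with pooling, each in-flight compressed response still holds its compressor's buffers (on the order of hundreds of kilobytes of flate state) until the request finishes, so many concurrent slow or large responses can still add up.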
