
upstream memory leak #373

Open
lluu131 opened this issue Dec 29, 2023 · 15 comments

Comments

lluu131 commented Dec 29, 2023

[Screenshot: QUIC upstream]

[Screenshot: UDP upstream]

For the same configuration, the memory footprint of the QUIC upstream is very high and constantly increasing, while the UDP upstream's is very low; both are running without caching.

lluu131 commented Dec 29, 2023

[Screenshot: 2 hours later]

EugeneOne1 self-assigned this Jan 16, 2024
EugeneOne1 commented Jan 16, 2024

@lluu131, hello and thanks for the thorough report. Unfortunately, we can't reproduce the leak. It would really help us to troubleshoot this issue if you could collect a goroutines profile for us.

To do that, restart the dnsproxy service with profiling enabled: either use the --pprof CLI option or set pprof: true in the YAML configuration file. When memory grows to a suspicious level again, run the following command:

curl "http://127.0.0.1:6060/debug/pprof/goroutine?debug=1" > profile.txt

Or just open "http://127.0.0.1:6060/debug/pprof/goroutine?debug=1" in your web browser.

Note that profiles can only be accessed from the same host machine.

You can send the resulting profile to us at devteam@adguard.com.
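For context, the endpoint above is Go's standard net/http/pprof handler. A minimal, self-contained sketch of how such a debug server is typically exposed (illustrative only, not dnsproxy's actual code); the localhost bind is why the profile is reachable only from the same machine:

```go
// Illustrative sketch only: how a pprof debug endpoint like the one enabled
// by `pprof: true` is typically served in a Go program (not dnsproxy's code).
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on DefaultServeMux
)

func main() {
	// Binding to localhost is why the profile can only be fetched from the
	// same host machine.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```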

A comment from lluu131 was marked as outdated.

lluu131 reopened this Jan 22, 2024
lluu131 commented Jan 22, 2024

[Screenshot: 2024-01-22 20:54]

Debug profile re-collected. I just found out that memory increases massively when the QUIC server's network is unreachable, but it doesn't free up or shrink when the network recovers.

lluu131 commented Jan 22, 2024

@EugeneOne1, Profile.txt has been sent by e-mail.

lluu131 commented Jan 22, 2024

> @lluu131, hello and thanks for the thorough report. Unfortunately, we can't reproduce the leak. It would really help us to troubleshoot this issue if you could collect a goroutines profile for us.
>
> To do that, restart the dnsproxy service with profiling enabled: either use the --pprof CLI option or set pprof: true in the YAML configuration file. When memory grows to a suspicious level again, run the following command:
>
> curl "http://127.0.0.1:6060/debug/pprof/goroutine?debug=1" > profile.txt
>
> Or just open "http://127.0.0.1:6060/debug/pprof/goroutine?debug=1" in your web browser.
>
> Note that profiles can only be accessed from the same host machine.
>
> You can send the resulting profile to us at devteam@adguard.com.

Profile.txt has been sent by e-mail.

adguard pushed a commit that referenced this issue Jan 23, 2024
Updates #373.

Squashed commit of the following:

commit 0632b4f
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Tue Jan 23 16:21:41 2024 +0300

    upstream: imp code, logging

commit cea34d5
Author: Eugene Burkov <E.Burkov@AdGuard.COM>
Date:   Tue Jan 23 15:50:53 2024 +0300

    upstream: use mutex. imp logging
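
The commit messages only hint at the shape of the change. As a rough, hypothetical illustration of the general pattern (not the actual dnsproxy code), guarding a lazily established upstream connection with a mutex keeps concurrent queries from each dialing, and potentially abandoning, their own connection:

```go
// Hypothetical sketch of the "serialize access to a shared connection with a
// mutex" pattern hinted at by the commit messages above; not dnsproxy's code.
package sketch

import (
	"io"
	"sync"
)

type quicUpstream struct {
	mu   sync.Mutex
	conn io.Closer                 // stands in for a real QUIC connection
	dial func() (io.Closer, error) // establishes a new connection
}

// getConn returns the cached connection, dialing at most one at a time so
// concurrent queries don't each open (and abandon) their own connection.
func (u *quicUpstream) getConn() (io.Closer, error) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if u.conn != nil {
		return u.conn, nil
	}
	conn, err := u.dial()
	if err != nil {
		// Nothing is cached on failure; the next caller simply retries.
		return nil, err
	}
	u.conn = conn
	return conn, nil
}
```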
EugeneOne1 commented Jan 23, 2024

@lluu131, hello again. Thank you for your help; the profile clarified the issue for us. We've pushed a patch (v0.63.1) that may improve the situation. Could you please check whether it does?

If the issue persists, would you mind collecting the profile again? We'd also like to take a look at the verbose log (verbose: true in the YAML configuration), if it's possible to collect it.

lluu131 commented Jan 24, 2024

> @lluu131, hello again. Thank you for your help; the profile clarified the issue for us. We've pushed a patch (v0.63.1) that may improve the situation. Could you please check whether it does?
>
> If the issue persists, would you mind collecting the profile again? We'd also like to take a look at the verbose log (verbose: true in the YAML configuration), if it's possible to collect it.

Already updated both the client and the server. I noticed from the verbose log that the client is requesting the root DNS servers every second; is this normal?

[Screenshot: verbose log]

lluu131 commented Jan 24, 2024

> @lluu131, hello again. Thank you for your help; the profile clarified the issue for us. We've pushed a patch (v0.63.1) that may improve the situation. Could you please check whether it does?
>
> If the issue persists, would you mind collecting the profile again? We'd also like to take a look at the verbose log (verbose: true in the YAML configuration), if it's possible to collect it.

Tested for a few hours. Memory increases after QUIC upstream interruptions and stops increasing after the upstream resumes (but is never freed). That's some improvement compared to the previous constant increase, but there is still a problem. The relevant logs were sent via email.

lluu131 commented Jan 24, 2024

It looks worse.

[Screenshot: memory usage]

EugeneOne1 commented:

@lluu131, we've received the data. Thank you for your help.

EugeneOne1 added the bug label and removed the waiting-for-repro label Jan 24, 2024
EugeneOne1 commented:

@lluu131, we've been investigating some unusual concurrency patterns used in the DNS-over-QUIC code, and found that the dependency responsible for handling the QUIC protocol probably contains the bug (quic-go/quic-go#4303). In any case, we should come up with a workaround in the meantime.
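
The reported behaviour (memory grows only while the upstream is unreachable and is never released afterwards) is consistent with a classic Go goroutine-leak pattern around abandoned dial attempts. A generic, hypothetical sketch of such a pattern, not taken from quic-go or dnsproxy:

```go
// Generic, hypothetical illustration of a goroutine leak on failed dials;
// not taken from quic-go or dnsproxy.
package sketch

import "context"

// dialWithTimeout starts a worker goroutine per attempt. If the caller's
// context expires first, the worker blocks forever on the unbuffered send:
// one goroutine (and its buffers) leaks for every attempt against an
// unreachable server, which matches memory that grows only while the
// upstream is down and is never released afterwards.
func dialWithTimeout(ctx context.Context, dial func() (any, error)) (any, error) {
	type result struct {
		conn any
		err  error
	}
	ch := make(chan result) // unbuffered: this is the bug
	go func() {
		conn, err := dial()
		ch <- result{conn, err} // blocks forever once the caller has given up
	}()
	select {
	case r := <-ch:
		return r.conn, r.err
	case <-ctx.Done():
		// The worker is abandoned while still blocked on its send.
		return nil, ctx.Err()
	}
}

// A buffered channel (make(chan result, 1)) would let the worker always
// complete its send and exit, even after the caller times out.
```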


Lyoko-Jeremie commented Feb 13, 2024

[Screenshot]

It has consumed 10 GB of memory after running for 66 days.

[Screenshot]

This machine only runs my DNS servers.

The config is:


[Unit]
Description=dnsproxy Service
Requires=network.target
After=network.target

[Service]
Type=simple
User=jeremie
Restart=always
AmbientCapabilities=CAP_NET_BIND_SERVICE
ExecStart=/usr/bin/dnsproxy -l 0.0.0.0 -p 5353 \
    --all-servers \
    -f tls://1.1.1.1 \
    -u sdns://AgcAAAAAAAAABzEuMC4wLjGgENk8mGSlIfMGXMOlIlCcKvq7AVgcrZxtjon911-ep0cg63Ul-I8NlFj4GplQGb_TTLiczclX57DvMV8Q-JdjgRgSZG5zLmNsb3VkZmxhcmUuY29tCi9kbnMtcXVlcnk \
    -f https://1.1.1.1/dns-query \
    -u https://1.0.0.1/dns-query \
    -u https://dns.google/dns-query \
    -u https://1.0.0.1/dns-query \
    -u https://mozilla.cloudflare-dns.com/dns-query \
    -u https://dns11.quad9.net/dns-query \
    -u https://dns10.quad9.net/dns-query \
    -u https://dns.quad9.net/dns-query \
    --http3 \
    --bootstrap=1.0.0.1:53

[Install]
WantedBy=multi-user.target

@EugeneOne1


Ir1Ka commented May 8, 2024

I've observed a memory leak in my home environment. I am using DoH. When I configured a wrong DoH API URL, the system reported an out-of-memory error for the dnsproxy process.

I am using the docker version of adguard/dnsproxy.


Ir1Ka commented May 9, 2024

Update: there are many query errors in my log. It seems that when an upstream query error occurs (for example, when the network is temporarily unavailable), memory keeps increasing until the process runs out of memory.
