Support edns-tcp-keepalive EDNS0 Option - RFC7828 #3778

Closed
3 tasks done
rskallies opened this issue Oct 28, 2021 · 16 comments
Labels: bug; external libs (issues that require changes in external libraries); P3: Medium

Comments

@rskallies

rskallies commented Oct 28, 2021

  • I am running the latest edge version
  • I checked the documentation and found no answer
  • I checked to make sure that this issue has not already been filed

Problem Description

Using the latest Stubby / getdns as a DoT client is not possible unless Stubby runs with the option idle_timeout: 0, because the AdGuard server side does not seem to parse the edns-tcp-keepalive EDNS0 option. Instead, the server logs:
[info] error handling TCP packet: dns: buffer size too small
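
For context, here is a minimal sketch of what the option looks like on the wire, based on RFC 7828: option code 11, a 16-bit length, and an optional 16-bit timeout in units of 100 ms. Queries carry the option with length 0 and no timeout; responses carry the timeout. The function name `keepalive_option_hex` is made up for illustration and is not part of Stubby, getdns, or AdGuard Home.

```shell
#!/bin/sh
# Illustration only: print the edns-tcp-keepalive option (RFC 7828) as hex.
# keepalive_option_hex is a hypothetical helper name.
#   no argument  -> query form: code 11, length 0, no timeout
#   one argument -> response form: timeout given in ms, encoded in 100 ms units
keepalive_option_hex() {
    if [ "$#" -eq 0 ]; then
        printf '%04x%04x\n' 11 0
        return 0
    fi
    units=$(( $1 / 100 ))
    # The TIMEOUT field is 16 bits wide, so clamp to its maximum.
    if [ "$units" -gt 65535 ]; then
        units=65535
    fi
    printf '%04x%04x%04x\n' 11 2 "$units"
}
```

With this sketch, `keepalive_option_hex` prints `000b0000`, and `keepalive_option_hex 10000` prints `000b00020064`, that is, a timeout of 100 units of 100 ms.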

Proposed Solution

The project already uses the https://github.com/miekg/dns/ Go library, which supports this feature.
It would be nice either to support this option as well, or at least to emit a proper error message when a client requests it and the server cannot handle it.

It took me a while to get this figured out and to get a Stubby client working.

@Harvester57

It took me a long time to find this post and the solution to my problem: using DoT on an Asus router with Merlin firmware, I would always hit the same problem you described, whereas pointing to NextDNS works flawlessly.

After implementing your solution, I was able to restore the connection to my AdGuard server. So first, thank you for your post, and second, I agree with your proposal to add support for edns-tcp-keepalive!

@gspannu

gspannu commented Dec 11, 2021

@rskallies @Harvester57

I am running AdGuard Home on a VPS, and an Asus RTAX88U running Merlin firmware 386.3.2.

I am unable to get AdGuard Home working with the Asus router when using the Beta/Edge version (while NextDNS works without any issue).

1) Could you please elaborate on how you fixed this issue? What changes did you make on the Asus router? Which files/scripts need to be edited or added?

2) In addition, the only way I can get the Asus router to connect to AdGuard Home is to enter the full address in the DNS-Privacy settings, as in the attached screenshot.

[Screenshot 2021-12-11 at 03 59 10 am]

Do you have the same issue?

@Harvester57

Hi @gspannu,

I created the post-conf script for Stubby (/jffs/scripts/stubby.postconf) with the following content:

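#!/bin/sh
# $1 is the path to the Stubby config that this postconf script modifies.
CONFIG=$1
# Load the firmware's postconf helpers (provides pc_replace).
. /usr/sbin/helper.sh

# Optional: only relevant if you intend to use ECS.
pc_replace "edns_client_subnet_private: 1" "edns_client_subnet_private: 0" "$CONFIG"
# The actual workaround: set idle_timeout to 0.
pc_replace "idle_timeout: 9000" "idle_timeout: 0" "$CONFIG"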

You don't need the edns_client_subnet_private: 1 line if you don't intend to use ECS. The idle_timeout: 0 is the only parameter you need to change.

Do not forget to add the execute bit to your script: chmod +x /jffs/scripts/stubby.postconf and then you can restart the Stubby service.

P.S.: I assume you already have a USB key and external JFFS scripts support enabled, and that you can connect to your router through SSH.
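
On systems without the Merlin helper.sh, the same edit can be sketched with plain sed. This example rests on assumptions: the function name `set_idle_timeout` and the path in the usage note are hypothetical, and the config is assumed to contain an `idle_timeout: <number>` line.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the idle_timeout value in a Stubby YAML config.
# Usage: set_idle_timeout <config-file> <milliseconds>
set_idle_timeout() {
    cfg=$1
    value=$2
    tmp="${cfg}.tmp.$$"
    # Replace the number after "idle_timeout:" and keep any leading indentation.
    sed "s/^\([[:space:]]*idle_timeout:\)[[:space:]]*[0-9][0-9]*/\1 ${value}/" "$cfg" > "$tmp" \
        && mv "$tmp" "$cfg"
}
```

For example, `set_idle_timeout /path/to/stubby.yml 0` (path hypothetical), followed by a restart of the Stubby service.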

@gspannu

gspannu commented Dec 13, 2021

> Hi @gspannu,
> /* snip */

Thanks for your response.

Another quick question, if you could help me.

  • I have hosted my AdGuard Home on a VPS and
  • am using my own self-signed certificates for encryption settings.

AdGuard Home complains about the chain of trust when I add the certificate, but if I copy my self-signed/self-generated RootCA to the /etc/ssl/certs path, then the self-signed certificate is accepted by AdGuard Home.

To test that my self-signed cert is actually working with DoT/DoH...

  • I used the fabulous dnslookup tool (by Ameshkov) from another machine and executed some DoT/DoH queries.
  • These queries failed because the client machine did not have the RootCA.
  • Again, copying the RootCA to the client machine's /etc/ssl/certs folder fixed this,
  • and all queries now work and are received by AdGuard Home as encrypted.
  • Tested for both DoT and DoH queries: all good so far from another machine to my AdGuard Home VPS.

Q: How do I copy this RootCA to Asus Router?
If I try and scp the file to /etc/certs... it fails with read-only error.

@Harvester57

You can copy your cert to your /jffs partition (for example /jffs/mycert/mycert.pem) and bind-mount it into the /etc/ssl/certs directory:

mount -o bind /jffs/mycert/mycert.pem /etc/ssl/certs/mycert.pem

You should now see your cert in /etc/ssl/certs.

You can do this automatically during the router boot phase by adding the previous line to the file /jffs/scripts/services-start.
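
A slightly more defensive variant for services-start could look like the sketch below. The function name `bind_cert` and the `MOUNT` override are made up for this example. It assumes /proc/mounts is available, and it has not been tested on Merlin firmware.

```shell
#!/bin/sh
# Hypothetical helper: bind-mount a certificate into the system cert directory.
# Fails if the source is missing; does nothing if the target is already a mount point.
# MOUNT can be overridden (e.g. MOUNT=echo) for a dry run.
bind_cert() {
    src=$1
    dst=$2
    if [ ! -f "$src" ]; then
        echo "bind_cert: source $src not found" >&2
        return 1
    fi
    # /proc/mounts lists the mount point as the second field.
    if awk -v d="$dst" '$2 == d { found = 1 } END { exit !found }' /proc/mounts 2>/dev/null; then
        return 0
    fi
    ${MOUNT:-mount} -o bind "$src" "$dst"
}
```

In services-start this would be called as `bind_cert /jffs/mycert/mycert.pem /etc/ssl/certs/mycert.pem`, using the example paths from this thread.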

@gspannu

gspannu commented Dec 13, 2021

You are a star.... Thanks.

@ameshkov
Member

> The project already uses the https://github.com/miekg/dns/ Go library, which supports this feature.

Not really: miekg/dns#1317

@ainar-g if the PR does not get merged before we release AGH, I suggest adding a replace directive to go.mod.
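
As a sketch of what such a replace could look like from the command line: `go mod edit -replace` rewrites go.mod to point a module at a fork. The fork path and version below are placeholders, not the fork that was actually used, and the function name `replace_miekg_dns` and the `GO` override are hypothetical.

```shell
#!/bin/sh
# Placeholders: substitute the real fork module path and version.
FORK_MODULE="example.com/your-fork/dns"
FORK_VERSION="v0.0.0"

# Hypothetical helper: point github.com/miekg/dns at a fork in go.mod.
# GO can be overridden (e.g. GO=echo) to print the arguments instead of running go.
replace_miekg_dns() {
    ${GO:-go} mod edit -replace="github.com/miekg/dns=${1}@${2}"
}
```

Run from the module root, `replace_miekg_dns "$FORK_MODULE" "$FORK_VERSION"` would add a line of the form `replace github.com/miekg/dns => example.com/your-fork/dns v0.0.0` to go.mod.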

@rskallies
Author

>> The project already uses the https://github.com/miekg/dns/ Go library, which supports this feature.

> Not really: miekg/dns#1317

@ameshkov Thank you for digging even deeper and for creating an upstream PR. Spotting and fixing this exceeded my skills. 😄

> @ainar-g if the PR does not get merged before we release AGH, I suggest adding a replace directive to go.mod.

👍

@ainar-g added this to the v0.107.0 milestone Dec 15, 2021
@ainar-g added the bug, external libs (Issues that require changes in external libraries.), and P3: Medium labels and removed the feature request label Dec 15, 2021
@ainar-g self-assigned this Dec 15, 2021
@ainar-g
Contributor

ainar-g commented Dec 15, 2021

@rskallies, the latest edge build includes Andrey's version of the fix. Can you check if that fixes your issue?

@rskallies
Author

> @rskallies, the latest edge build includes Andrey's version of the fix. Can you check if that fixes your issue?

Yes, it does. 👍
Stubby connects successfully when idle_timeout is set back to its (default) value of 10000.

[13:30:58.740760] STUBBY: 95.xxx.xxx.xxx  : Conn opened: TLS - Strict Profile

[13:30:58.947467] STUBBY: 95.xxx.xxx.xxx  : Verify passed : TLS

[13:31:09.151213] STUBBY: 95.xxx.xxx.xxx  : Conn closed: TLS - Resps=     1, Timeouts  =     0, Curr_auth =Success, Keepalive(ms)= 10000

[13:31:09.151375] STUBBY: 95.xxx.xxx.xxx  : Upstream   : TLS - Resps=     1, Timeouts  =     0, Best_auth =Success

[13:31:09.151451] STUBBY: 95.xxx.xxx.xxx  : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      0, Backoffs     =     0
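
To check this across a longer log, the Keepalive(ms) value can be pulled out of the "Conn closed" lines. This is a small sketch that assumes the log format shown above; the function name `stubby_keepalive_ms` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read Stubby log lines on stdin and print the
# Keepalive(ms) value of every "Conn closed" line, one value per line.
stubby_keepalive_ms() {
    sed -n '/Conn closed/s/.*Keepalive(ms)= *\([0-9][0-9]*\).*/\1/p'
}
```

Fed the log excerpt above, it prints 10000.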

I still wonder why dnsproxy does not connect via QUIC. I would prefer to use AdGuard dnsproxy instead of Stubby on this MIPS-based router. I'll create another issue with more information later.

@ainar-g
Contributor

ainar-g commented Dec 15, 2021

Thanks for testing! I'll close this issue then. I've left a TODO in the code to switch back to the original library once the PR is merged there.

@ainar-g closed this as completed Dec 15, 2021
@ameshkov
Member

> Still wonder why dnsproxy does not connect via QUIC

This one is strange. Do you specify the port number when running dnsproxy?

@rskallies
Author

Yes: -u quic://fully.qualified.domain:784. I have been using dnsproxy for a long time on x86_64 and arm64 devices, where it defaulted to port 784 at the time. I already tested various other ports in case a middlebox is blocking something, but still no success. Using the same config / arguments for dnsproxy from an x86 or arm64 device behind the affected MIPS / OpenWRT-based router works like a charm. The problem occurs only when dnsproxy runs directly on the router. I also disabled all firewall rules on the router for testing purposes, but no success so far.

The log on the AdGuard server shows:

"got error when accepting a new QUIC stream: timeout: no recent network activity"

The log on the client (dnsproxy v0.39.13) shows:

"[debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): error handling DNS (udp) request: talking to dnsUpstream failed, cause: failed to open QUIC session to quic://fully.qualified.domain:784, cause: timeout: handshake did not complete in time"

It seems I need to enable debug logging on the server and see what is logged then.

@ameshkov
Member

Hm, it looks as if something is wrong with UDP to port 784 in general. As if it's dropping packets somehow.

@rskallies
Author

I disabled all firewalling on both sides and changed the UDP port, with no success. Interestingly, it also does not work with DoT. Using the same dnsproxy version from a client behind the router does work for both DoQ / QUIC and DoT against the same AdGuard server. Really strange.

I'll try to debug this using tcpdump on both sides.

@gspannu

gspannu commented Dec 15, 2021

> @rskallies, the latest edge build includes Andrey's version of the fix. Can you check if that fixes your issue?

Confirmed. The issue is now fixed in v0.107.0-b.16. 👍

I just tested after removing the stubby.postconf script for Stubby on the Asus Merlin router, and all works as expected.
