QUIC Domain Name Sniffing Failure #1724

Open
4 of 5 tasks
arimitx opened this issue Apr 28, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@arimitx

arimitx commented Apr 28, 2024

Operating system

Others

System version

NA

Installation type

Others

If you are using a graphical client, please provide the version of the client.

NA

Version

All versions that have the QUIC sniffing feature are affected (as of 2024/04/29)

Code version used for case study: commit 3341b62 of dev-next

Description

TL;DR

sing-box fails to parse the server_name correctly when the CRYPTO fragments that carry the ClientHello are spread across multiple packets. (See RFC 9312, Section 3.4.1, paragraph 7.)

Bug Detail

Recently, I noticed that sing-box fails to sniff domain names in QUIC traffic. Because sniffing fails, the subsequent domain-based routing fails as well.

In the corresponding logs, sing-box sniffed only the QUIC protocol, without the domain name:

INFO[0000] sing-box started (0.00s)
INFO[0014] [3780980986 0ms] inbound/tun[tun-in]: inbound packet connection from 10.10.2.11:50128
INFO[0014] [3780980986 0ms] inbound/tun[tun-in]: inbound packet connection to 34.117.186.192:443
DEBUG[0014] [3780980986 0ms] router: sniffed packet protocol: quic
INFO[0014] [3780980986 0ms] outbound/direct[direct-out]: outbound packet connection

After studying the problem, I believe the cause is that the existing sniffing mechanism for QUIC traffic cannot handle the case where the CRYPTO fragments that carry the ClientHello are spread across multiple packets.
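To make "fragments of CRYPTO" concrete, here is a minimal Go sketch of how CRYPTO frames could be pulled out of one Initial packet's payload. It assumes the payload has already had header protection removed and been decrypted, which is not shown. The names cryptoFragment, readVarint, and parseCryptoFrames are hypothetical and are not taken from sing-box's quic.go. Only PADDING, PING, and CRYPTO frames are handled; anything else is reported as an error.

```go
package main

import (
	"errors"
	"fmt"
)

// cryptoFragment is a hypothetical type: the offset and data of one CRYPTO frame.
type cryptoFragment struct {
	Offset uint64
	Data   []byte
}

// readVarint decodes a QUIC variable-length integer (RFC 9000, Section 16).
// The top two bits of the first byte select a length of 1, 2, 4, or 8 bytes.
func readVarint(b []byte) (value uint64, n int, err error) {
	if len(b) == 0 {
		return 0, 0, errors.New("empty input")
	}
	n = 1 << (b[0] >> 6)
	if len(b) < n {
		return 0, 0, errors.New("truncated varint")
	}
	value = uint64(b[0] & 0x3f)
	for i := 1; i < n; i++ {
		value = value<<8 | uint64(b[i])
	}
	return value, n, nil
}

// parseCryptoFrames walks an already-decrypted Initial payload and collects
// every CRYPTO frame (type 0x06). PADDING (0x00) and PING (0x01) are skipped.
func parseCryptoFrames(payload []byte) ([]cryptoFragment, error) {
	var frags []cryptoFragment
	for len(payload) > 0 {
		switch payload[0] {
		case 0x00, 0x01: // PADDING and PING carry no body
			payload = payload[1:]
		case 0x06: // CRYPTO: offset varint, length varint, then the data
			payload = payload[1:]
			offset, n, err := readVarint(payload)
			if err != nil {
				return nil, err
			}
			payload = payload[n:]
			length, n, err := readVarint(payload)
			if err != nil {
				return nil, err
			}
			payload = payload[n:]
			if uint64(len(payload)) < length {
				return nil, errors.New("truncated CRYPTO frame")
			}
			frags = append(frags, cryptoFragment{Offset: offset, Data: payload[:length]})
			payload = payload[length:]
		default:
			return nil, fmt.Errorf("unsupported frame type 0x%02x", payload[0])
		}
	}
	return frags, nil
}
```

The offsets matter: a fragment whose offset is beyond the data seen so far cannot be interpreted on its own, which is why a single packet may not be enough to read the server_name.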

Problem Reproduction

In my case, the fragments of CRYPTO that contain ClientHello are spread across two packets (The original pcapng file can be downloaded here):

[Image 1: 1 fragment in the first packet]

[Image 2: 4 fragments in the second packet]
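When the fragments are split across packets like this, a sniffer would have to collect them by offset across packets before it can parse the ClientHello. The following Go sketch shows one way that could look. It is only an illustration under my own assumptions: cryptoBuffer, Add, and Contiguous are hypothetical names, not part of sing-box, and the sketch ignores memory limits and per-connection bookkeeping.

```go
package main

import "sort"

// fragment is one piece of the CRYPTO stream at a given offset.
type fragment struct {
	Offset uint64
	Data   []byte
}

// cryptoBuffer is a hypothetical helper that accumulates CRYPTO fragments
// seen across several Initial packets of the same connection.
type cryptoBuffer struct {
	frags []fragment
}

// Add stores a copy of one fragment; fragments may arrive in any order.
func (b *cryptoBuffer) Add(offset uint64, data []byte) {
	b.frags = append(b.frags, fragment{Offset: offset, Data: append([]byte(nil), data...)})
}

// Contiguous returns the longest prefix of the CRYPTO stream, starting at
// offset 0, that can be rebuilt from the fragments collected so far.
func (b *cryptoBuffer) Contiguous() []byte {
	sorted := append([]fragment(nil), b.frags...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Offset < sorted[j].Offset })

	var out []byte
	for _, f := range sorted {
		have := uint64(len(out))
		if f.Offset > have {
			break // gap: the bytes in between have not arrived yet
		}
		end := f.Offset + uint64(len(f.Data))
		if end > have {
			// Append only the part that extends past what is already rebuilt.
			out = append(out, f.Data[have-f.Offset:]...)
		}
	}
	return out
}
```

A caller would add the fragments of each packet as it arrives and try to parse the ClientHello from Contiguous() after every packet, giving up only when no further packets are expected.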

To find out what happened, I extracted the payloads from the two packets and wrote additional test functions, modeled on TestSniffQUICFragment, in common/sniff/quic_test.go:

Packet 1 Payload ce0000000108f65caf297b7fdf2600404600d6b901e3b8cab2485ecf3fa3b25b6037d89673312e8835618c60a1d0729eb30c4c15d0ba53d1c520d7bf42c8c7394420317c33eb2950a3a867ec59e99aed8fe4186b14be1b44890247a5da8562947b11989eed198a380ecfd7fac8932a6728384395d362a3408f79bb89eba84ee73d75a965f1862ec67d89290e54ba22a03114b6739bd10f12dc4f24f7f66f33e76a06946a3f01e3c87646e6bdd6e9c396c9d4fa5814d9a41e79ce752bac41e9888f69f380188fd49b0d64624700c84fcdf4d91616cd2fd17bf40b37942bc692d270fb712d457626bc418e1eec88f610447516853f5646241ad119c4b5920f290e6644cd79245c0e52b54ac081b6a131eb46890fbdffbd937981fc266ece92511ae1ef2c4c03cc4a92828182498b4fc653fd8eb4aa6b142fcb23c0911e0ad49275c6b405023870d379f7d2edccc6ffd801927572a5798cc974289fadda5d1eadf57feac392bc1b5cfb6abe09c63fed96ebfa3b5183251a3d06574ef9c20898a52df23e0b85943ef6f2498b16ca237a7e387f222aa39373557a08b2acff35922a4b852ac12ce63be507df3dc7a4696d701cf85407f80627e082fccdb144c4ad702817e70e4f57deb46d78c3ae49b3b28455ae6fe0b9f204622a9e450aaa44b3122890f7b26b7833ba5576e178dd62d55d94e7328f60205a9230ac8bbd251dab29c156ea05535bec1ec24c95d2983fe529c0b1db16430e0650ea874ac85b5fafafada3f0640c501afcd120459a9434799cf9e3be64d97baa4207fac5a559ec54f721c14a2872c4a5438102399481caf3b537feb9daadaf3a5f9822471e871bae7b2dcd6cbaa006150182eeb18c8130585442e66bfe8f62b811ba5a5b0f855250199b6f0d74bf1ba275b5a2a51abe6d7456bc6ec317197ff597afcbb9881a76d4220bffca6001ef90a28e48b93aefcc4409fe85c4fb476d741f0dca24981de86e6ce0002eb2aceec7876016b6b0484984a05f8e05e20669858226e2030f475d099d1b6ecc2cb1a89764cd7ae1e5a4834ab259de5c2af8d91f512cb0035e3e5c696baab46686ab985be4c3f765b1482a75fe8b66bdbdcf6e4e8b82fb099889b79410a21de44a70967d7858e2faffba7b765a856a8a7be14915c6f8270736e1412d6ab4f177cb89fec3087bf0a576c80068d114ba1d42549d6a1c08de498127ebbff8f5ca721bba275650fb7f71bd4dde2d1d712c4cd3c6f9a3d09892a2edcebfab5d8d0ba0e16fd0c6ca4f686d5f65b3061c7d5d269ecf297e9a6c645fa6ace6faca68ad8372e0591da7e31bdb79f68041b590b5f844a73d2c12de456fc19e417a46571997710dcc27e533a5b7e00b3d7083aadcfabf1f9b8602f83569f6f021b2b19
9a83a885c9b6cfd2c2bbf24d107a2154842a6e7fef8c7fc4e6c2ff1f83a6ac186ce374574072bbc723da3807097eda508190a7c8208c71111f43980b4923b51d6229ac290729e959093dfeeb4f38e9ae0216ccf74a0071003288479dc87a6a9c44742714868c43181f3373e8c19b0586f635caa2ce64628f2b90d3620c950ed14be118ec87f4c470c60be0cff97555be3e845a7f12b91071c1d2cf09c12c34c190ac8f7c4359b8c22590883b44fad0f23bacda4fe31107a72f8cfb1184c5c312331b03f2a3a6e6ac44cba188ab1cab6513c362e1436d2d19337344832344e577cc1710754ba3cb1004106bd958df0ef53f9103745a7774895891b2da5fb90693096a3
Packet 2 Payload c30000000108f65caf297b7fdf2600404600d6b901e3b8cab2485ecf3fa3b25b6037d89673312e8835618c60a1d0729eb30c4c15d0ba53d1c520d7bf42c8c7394420317c33eb2950a3a867ec59e99aed8fe4186b14be1b4489e2b4ec270295367801d969e0f806b612a13b34edf8412c6c056d993c33e5d04a97b70a6243edad3e5a0948b88a66be38110145c7ab69af25298c1886bd68ba03fe476ba2fda0b3d42e8a93827dbaf064226403041fb1ab2ac328b160114f2a99ccc216152e1393339f2e96ffa615316dd85c1626928b55ea09b8bf4b64dcf6573cecd6fd04645a2d61c55f913796bd165fa60dd9742611d2cee37c712667efece55ff1135a428837b2fc3657b6e5c49f5adeddcdd1c0829a3165728953bfdd028872ed814d80d0825f9a8e7fd73aa90e8584869376052ed51efcd4d112d99c4c30362da4cd0f68cdedf6003646b409eac756c8be55a81edb0dca6d1e875a41cc9033ae00eb43345fc5020731108871234234fe8ab2dff2264744c7b2e2e991031321b70c6165eba736455b142755a8612f76a88d6fce6c090418709c350e3ca1ac536707526e171ea1a6a69daa6b887d589fa9f08f26dfa99d90c9ca8427e66acc94c36e8c08cc252062efaf4b6e1f5aadde68649329675873b145260fd961bef66a64733dc914c1d90501474a0d061ef8c09372f57bcaca9db6493308c4555c569c62ee30c6181fd3736c1c19bcf4d6c5e6b21e1d869c765de668d3c7f5e4b5c04a31c696907b5f459bdbd2666c33b1bba01fdfa1ef6418348c23201b7ad0a97ead75577c09963691a55357b8587523f40355f30669ce95ad2d24637fcbf9fb02e3e4134c5ed8eefb6b2d2e456bfc42a4be697af4b50c4719d55c09029ab284bc8be098bab030c1cfea142e506fb566dcc525a3c72b8cb5b561fec1ca9fe9e78318c38ad18e3b105423dd8330f9d5962cf169c7f12b8ac1ffeca5be1df5071423e17cca0e1b3bef3c422ee91bc3aefc8ce840b0d8e4c478806dd985a17618982490d7087248b7a82ebd84aa97074cfeb21f88a3204797c465eb74f0efae3c7be990d667dedf5f2b9ebc82fe9690b261d512f0ce6dd09bc9118b484526de52ac55cf715f1a2f426b3daaa8d00dd7adb534ba22860e80544b7d050f616ce8b2b14929c2559b82404de9fb13c8b5f53a0c6e560a7d346e7d1db096400f61dc7bbbb19d67c1e581ef8529d6fc29eb8586b777348ba220692ff6be90aabf3ad6831cd9b4ce6d5de6992dedc7743dbd28605e186c34f2db6407b9f3811ab7f5fe3964f58418357183e073a7cbab7ab2bdb3fe99fadadb43a2448618150f5f415e8d61b00220949093b9ff65ec694836183239fe3d932eae16984e5b508ba213882b49b5f8e152541b0
f9d88d1c00d872893783cee0f53827d894202345237adf635335095c9d1d650b3e267fe5b581f687423060387dd597aa10edc51e42e330010af244416e6659a044e3f00aa700e1a2115760b19deced9fed8d773334a0bfb32c406683b698b4aa536c0947df4b35f19430929e8a115eba92f8f797ca7e37e0c902f77c1fbdc80ad8fc6df411dcb2e28d331ef53be038796f4fd0def0d686d581d995fc6baf420492552eb11f5a191d429175d6f0fad7a0d13fdc913bfd2ce99f8b51f69c009e2a01742f677c3f6e296e34f55dda559d6024f68bceb765fc4a8a1e271574c64d2495ab3840f37cbb0ab8325c6578530d8eea51cf8704306774f92331f7d1085e7d45f30

For Payload 1, there is an EOF error (quic.go line 299):

GOROOT=/usr/local/go #gosetup
GOPATH=/home/jason/go #gosetup
/usr/local/go/bin/go test -c -o /home/jason/.cache/JetBrains/GoLand2024.1/tmp/GoLand/___TestSniffQUICFragment2_in_github_com_sagernet_sing_box_common_sniff.test github.com/sagernet/sing-box/common/sniff #gosetup
/usr/local/go/bin/go tool test2json -t /home/jason/.cache/JetBrains/GoLand2024.1/tmp/GoLand/___TestSniffQUICFragment2_in_github_com_sagernet_sing_box_common_sniff.test -test.v -test.paniconexit0 -test.run ^\QTestSniffQUICFragment2\E$
=== RUN   TestSniffQUICFragment2
=== PAUSE TestSniffQUICFragment2
=== CONT  TestSniffQUICFragment2
    quic_test.go:37: 
        	Error Trace:	/home/jason/codes/sing-box/common/sniff/quic_test.go:37
        	Error:      	Received unexpected error:
        	            	EOF
        	Test:       	TestSniffQUICFragment2
--- FAIL: TestSniffQUICFragment2 (0.00s)

FAIL

Process finished with the exit code 1

For Payload 2, there is a bad fragments error (quic.go line 295):

GOROOT=/usr/local/go #gosetup
GOPATH=/home/jason/go #gosetup
/usr/local/go/bin/go test -c -o /home/jason/.cache/JetBrains/GoLand2024.1/tmp/GoLand/___1TestSniffQUICFragment3_in_github_com_sagernet_sing_box_common_sniff.test github.com/sagernet/sing-box/common/sniff #gosetup
/usr/local/go/bin/go tool test2json -t /home/jason/.cache/JetBrains/GoLand2024.1/tmp/GoLand/___1TestSniffQUICFragment3_in_github_com_sagernet_sing_box_common_sniff.test -test.v -test.paniconexit0 -test.run ^\QTestSniffQUICFragment3\E$
=== RUN   TestSniffQUICFragment3
=== PAUSE TestSniffQUICFragment3
=== CONT  TestSniffQUICFragment3
    quic_test.go:46: 
        	Error Trace:	/home/jason/codes/sing-box/common/sniff/quic_test.go:46
        	Error:      	Received unexpected error:
        	            	bad fragments
        	Test:       	TestSniffQUICFragment3
--- FAIL: TestSniffQUICFragment3 (0.00s)

FAIL

Process finished with the exit code 1

Conclusion

Obviously, the current sniffing mechanism for QUIC traffic is insufficient to cope with the case where the CRYPTO fragments containing ClientHello are spread across multiple packets.

A temporary workaround could be to add an extra rule that blocks all QUIC traffic. I am also investigating the code to find a possible solution. However, I am familiar with neither Go nor the sing-box code base, so I would appreciate it if you could fix the problem or give me some suggestions.

Reproduction

The bug can be reproduced using the payloads in the bug description section.

Logs

INFO[0000] sing-box started (0.00s)
INFO[0014] [3780980986 0ms] inbound/tun[tun-in]: inbound packet connection from 10.10.2.11:50128
INFO[0014] [3780980986 0ms] inbound/tun[tun-in]: inbound packet connection to 34.117.186.192:443
DEBUG[0014] [3780980986 0ms] router: sniffed packet protocol: quic
INFO[0014] [3780980986 0ms] outbound/direct[direct-out]: outbound packet connection

Supporter

Integrity requirements

  • I confirm that I have read the documentation, understand the meaning of all the configuration items I wrote, and did not pile up seemingly useful options or default values.
  • I confirm that I have provided the server and client configuration files and process that can be reproduced locally, instead of a complicated client configuration file that has been stripped of sensitive data.
  • I confirm that I have provided the simplest configuration that can be used to reproduce the error I reported, instead of depending on remote servers, TUN, graphical interface clients, or other closed-source software.
  • I confirm that I have provided the complete configuration files and logs, rather than just providing parts I think are useful out of confidence in my own intelligence.

@nekohasekai nekohasekai added the bug Something isn't working label Apr 29, 2024
@dyhkwong
Contributor

dyhkwong commented Apr 29, 2024

Unlike a firewall, which only needs to decide whether to block traffic (so it can block at the second packet), a proxy needs to decide which outbound to route traffic to based on the first packet alone (if the protocol itself is 0-RTT). Holding the first packet while waiting for the second (or even later) packet to arrive would cause issues, so I don't think this is fixable.

@arimitx
Author

arimitx commented Apr 29, 2024

Unlike a firewall which only needs to decide to block traffic or not (so it can block traffic at the second packet), a proxy needs to decide which outbound to route the packet to based on only the first packet (if protocol itself is 0-rtt). It will cause issues if a proxy "holds" the first packet and waits for the second packet to arrive. So I don't think this is fixable.

I suspect that recent changes in Chromium modified how browsers (e.g., Chrome 124) send the ClientHello in QUIC. From my perspective, domain-based routing suddenly stopped working without any change to my configuration or to sing-box itself. Sadly, I doubt I am the only one affected, even though no other user has reported a similar issue.

The frustrating part is that domain sniffing is an important feature for QUIC traffic, yet the newer RFCs make QUIC behave like a stream, so the routing policy can no longer be decided from the first packet alone. Spreading the ClientHello over multiple fragments in multiple packets helps avoid protocol ossification, but it makes it much harder for a middlebox to handle the traffic properly.

@dyhkwong
Contributor

dyhkwong commented Apr 29, 2024

Sniffing itself is actually censorship. It makes sense for new standards to add anti-censorship features (such as making traffic more difficult to sniff). And how can a proxy hold UDP packets and wait for a full SNI that may not even exist? The only way is to set a timeout while waiting for the second packet; if the timeout expires and/or no SNI is sniffed, route the first packet as is.

@arimitx
Author

arimitx commented Apr 29, 2024

Sniffing itself is actually censorship. It makes sense for new standards to add anti-censorship features (such as making traffic more difficult to sniff). And how can a proxy hold UDP packets and wait for a full SNI that may not even exist? The only way is to set a timeout while waiting for the second packet; if the timeout expires and/or no SNI is sniffed, route the first packet as is.

I agree. From an end user perspective, it’s also a choice to simply block out all QUIC traffic on sing-box. Network performance may even improve when the UDP connection quality is poor, which is a common case.

@dyhkwong
Contributor

dyhkwong commented Apr 29, 2024

From an end user perspective, it’s also a choice to simply block out all QUIC traffic on sing-box.

Fake DNS should still work as long as no encrypted DNS is used.

But if ECH is promoted one day, no domain name can be sniffed.

@arimitx
Author

arimitx commented Apr 29, 2024

Fake DNS should still work as long as no encrypted DNS is used.

Thanks for the suggestion. Unfortunately, fake-ip is not an option in my case. I run sing-box on my router, and fake-ip would break the functionality of policy-based routing of other applications.

But if ESNI is promoted one day, no domain name can be sniffed.

Then let’s go back to the good old days when people used browser plugins like SwitchyOmega for traffic splitting : )

@arimitx

This comment was marked as off-topic.

@dyhkwong
Contributor

dev-next...dyhkwong:sing-box:feature/fix-quic-sniffer
This is a very preliminary PoC for testing purposes only. I don't know if bad things will happen.

@arimitx
Author

arimitx commented May 24, 2024

dev-next...dyhkwong:sing-box:feature/fix-quic-sniffer This is a very preliminary PoC for testing purpose only. I don't know if bad things will happen.

Many thanks! I will try it later.

@arimitx
Author

arimitx commented May 27, 2024

dev-next...dyhkwong:sing-box:feature/fix-quic-sniffer This is a very preliminary PoC for testing purpose only. I don't know if bad things will happen.

@dyhkwong Thanks again for your kind help!

I've created a fork of the dev-next branch of sing-box with your pull request merged. Then I compiled and tested sing-box on a Windows 11 machine with the following minimal config:

{
    "log": {
        "level": "debug"
    },
    "dns": {
        "servers": [
            {
                "tag": "google",
                "address": "8.8.8.8",
                "strategy": "prefer_ipv4"
            }
        ],
        "final": "google"
    },
    "inbounds": [
        {
            "type": "tun",
            "tag": "tun-in",
            "interface_name": "tun0",
            "inet4_address": "172.19.0.1/30",
            "mtu": 1280,
            "gso": false,
            "auto_route": true,
            "strict_route": false,
            "endpoint_independent_nat": false,
            "udp_timeout": "5m",
            "stack": "system",
            "sniff": true
        }
    ],
    "outbounds": [
        {
            "type": "direct",
            "tag": "direct-out"
        },
        {
            "type": "dns",
            "tag": "dns-out"
        }
    ],
    "route": {
        "auto_detect_interface": true,
        "rules": [
            {
                "protocol": "dns",
                "outbound": "dns-out"
            }
        ],
        "final": "direct-out"
    }
}

I can see from the logs that sing-box now successfully sniffs various domain names in QUIC sessions. For example:

INFO[0034] [4046528588 0ms] inbound/tun[tun-in]: inbound packet connection from 172.19.0.1:62020
INFO[0034] [4046528588 0ms] inbound/tun[tun-in]: inbound packet connection to 142.251.220.68:443
DEBUG[0034] [4046528588 0ms] router: sniffed packet protocol: quic
DEBUG[0034] [4046528588 0ms] router: sniffed packet protocol: quic, domain: www.google.com
INFO[0025] [3854301593 0ms] inbound/tun[tun-in]: inbound packet connection from 172.19.0.1:56775
INFO[0025] [3854301593 0ms] inbound/tun[tun-in]: inbound packet connection to 142.250.204.142:443
DEBUG[0025] [3854301593 0ms] router: sniffed packet protocol: quic
DEBUG[0025] [3854301593 1ms] router: sniffed packet protocol: quic, domain: www.youtube.com
INFO[0033] [3563755895 0ms] inbound/tun[tun-in]: inbound packet connection from 172.19.0.1:59726
INFO[0033] [3563755895 0ms] inbound/tun[tun-in]: inbound packet connection to 34.117.186.192:443
DEBUG[0033] [3563755895 0ms] router: sniffed packet protocol: quic
DEBUG[0033] [3563755895 0ms] router: sniffed packet protocol: quic, domain: ipinfo.io
INFO[0033] [3213668466 0ms] inbound/tun[tun-in]: inbound packet connection from 172.19.0.1:51200
INFO[0033] [3213668466 0ms] inbound/tun[tun-in]: inbound packet connection to 104.22.31.153:443
DEBUG[0033] [3213668466 0ms] router: sniffed packet protocol: quic
DEBUG[0033] [3213668466 0ms] router: sniffed packet protocol: quic, domain: myip.ipip.net

More tests may be needed to exercise the QUIC sniffing feature, but from my perspective it seems to work fine now.

@arimitx
Author

arimitx commented May 27, 2024

Unfortunately, the application crashes when I add domain-based routing rules to the configuration.

The call stack information is attached here.
