Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use golang.org/x/crypto/cryptobyte to unpack DNS records #1507

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

tmthrgd
Copy link
Collaborator

@tmthrgd tmthrgd commented Nov 6, 2023

This substantially simplifies the machinery and state tracking around DNS record unpacking. Gone are the off int arguments! Gone are the manual slice length checks!

For the most part care has been taken to maintain existing behaviour, including quirks, with one of the few notable intentional change being to UnpackDomainName where we now strictly enforce that pointers point backwards into the message. We also fix a bug here or there (see unpackQuestion) and are more strict in our unpacking of certain EDNS0 options.

This substantially simplifies the machinery and state tracking around
DNS record unpacking. Gone are the `off int` arguments! Gone are the
manual slice length checks!

Care has been taken to maintain existing behaviour, including quirks,
with the only notable intentional change being to UnpackDomainName
where we now strictly enforce that pointers point backwards into the
message.
@tmthrgd tmthrgd requested a review from miekg as a code owner November 6, 2023 11:46
Copy link
Owner

@miekg miekg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holy moly this cleans stuff up.

one comment regarding some docs

msg_helpers.go Outdated Show resolved Hide resolved
Instead of using a custom type we're just going to pass the full msg
byte-slice down the call stack. This makes this substantially simpler
and allows for some nice optimisations and cleanups.
msg.go Outdated Show resolved Hide resolved
This unpacking is more strict than previous. It now validates that no
trailing data exists.
@tmthrgd
Copy link
Collaborator Author

tmthrgd commented Nov 7, 2023

holy moly this cleans stuff up.

You know I thought it would, hence the PR, but even I was surprised just how much it improves things.

msg.go Outdated Show resolved Hide resolved
msg.go Outdated Show resolved Hide resolved
This is ugly, but there doesn't really seem to be a way to get around
that. The previous code was visually elegant but allocated far too much.
This code is efficient but visually ugly. *sigh*

I'm not sure if `msg = &b` causing b to escape is a missed optimisation
in the compiler or not, but with go1.21 that's how it is.

Also adjust our end-of-name check slightly.

name                                   old time/op    new time/op    delta
UnpackDomainName-12                       152ns ± 6%     148ns ±13%     ~     (p=0.353 n=10+10)
UnpackDomainNameUnprintable-12            120ns ±12%     122ns ± 9%     ~     (p=0.616 n=10+10)
UnpackDomainNameLongest-12                618ns ±11%     558ns ± 9%   -9.83%  (p=0.000 n=10+9)
UnpackDomainNameLongestUnprintable-12    1.60µs ±16%    1.55µs ±13%     ~     (p=0.447 n=10+9)
UnpackDomainNamePointers-12               302ns ±17%     149ns ± 6%  -50.51%  (p=0.000 n=10+10)

name                                   old alloc/op   new alloc/op   delta
UnpackDomainName-12                       64.0B ± 0%     64.0B ± 0%     ~     (all equal)
UnpackDomainNameUnprintable-12            48.0B ± 0%     48.0B ± 0%     ~     (all equal)
UnpackDomainNameLongest-12                 256B ± 0%      256B ± 0%     ~     (all equal)
UnpackDomainNameLongestUnprintable-12    1.02kB ± 0%    1.02kB ± 0%     ~     (all equal)
UnpackDomainNamePointers-12                120B ± 0%       48B ± 0%  -60.00%  (p=0.000 n=10+10)

name                                   old allocs/op  new allocs/op  delta
UnpackDomainName-12                        1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackDomainNameUnprintable-12             1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackDomainNameLongest-12                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackDomainNameLongestUnprintable-12      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackDomainNamePointers-12                4.00 ± 0%      1.00 ± 0%  -75.00%  (p=0.000 n=10+10)
We keep some of the existing weird behaviour (mostly of not erroring
when we should), but refactor it so it's far more obvious and things
are cleaner.

We leave a number of TODOs to address some of the weirder choices. They
would also allow for some other improvements to the code if we can make
those changes.
This was to maintain existing undocumented behaviour, but this makes
more sense.
This logic in this test case was wrong which caused strange behaviour
when re-working the domain name unpacking code.
This is the more correct error to be returning.
Actually no matter what I seem to do this test is terribly broken. I
can't be bothered wrestling with it any longer. Goodbye.
Now that TestTruncatedMsg is gone, we can implement the sane behaviour
here.
In some places this results in more readable code, in other places less.
What we gain though is the same number of allocations that we started
with. In other words, we haven't made things worse!

Without this change, every time we pass *cryptobyte.String into an
interface it is forced onto the heap and we get a nasty allocation.

It does require us to check that all the data has been consumed at
the end of each `unpack([]byte) error` call instead of having a
single check. Ultimately this isn't too bad.

We could change unpack to return whatever is left over. We don't
do this for semantic reasons. unpack is required to consume the
entirety of data (as we know it's length ahead of time). Asking
it to return whatever is left for the sole purpose of checking
it's nothing is silly. In some places we pass a pointer or do
return the rest. In these cases, it's because we don't know
ahead of time how long something will be.
These never actually did anything as dh is a struct and we don't consult
it again.
While it's nice having this as a single function rather than one for
Question and one for RR, using generics and passing the unpack function
causes the *cryptobyte.String to escape. That leaves us with some
rather ugly workarounds. Instead just bite the bullet and write it
twice. If we chose to remove the msg.Empty() check in the future, then
this doesn't end up too bad at all.
This helper validates that the offset can be calculated correctly.

We have to restrict msgBuf in unpackRRWithHeader as previously we would
sometimes computing invalid offsets in unpackDomainName. This helper is
designed to ensure we don't make this mistake again.
We need to compare against the start of the pointer not the end of it.

While we're here, improve TestUnpackDomainName and ensure it's testing
UnpackDomainName properly.
This isn't needed now that we always check s.Empty().
Returning &h (*RR_Header) as an RR causes h to spill to the heap and
allocate. Given this is only needed for a specific error and not in the
happy path, we instead create a local copy of h and return a pointer to
that instead. That eliminates the allocation on success.

While we're here, remove the new(RR_Header) return in UnpackRR. This
doesn't make any sense as it contains no information. We already return
nil in the error case above.

It shows a nice improvement for several of the benchmarks.

name                                   old time/op    new time/op    delta
UnpackDomainName-12                       153ns ±14%     148ns ± 6%     ~     (p=0.268 n=10+9)
UnpackDomainNameUnprintable-12            124ns ± 4%     123ns ± 6%     ~     (p=0.591 n=10+10)
UnpackDomainNameLongest-12                581ns ±12%     561ns ±15%     ~     (p=0.190 n=10+10)
UnpackDomainNameLongestUnprintable-12    1.76µs ±24%    1.66µs ±20%     ~     (p=0.393 n=10+10)
UnpackDomainNamePointers-12               146ns ± 5%     146ns ± 3%     ~     (p=0.730 n=9+9)
UnpackA-12                                194ns ±15%     135ns ±12%  -30.14%  (p=0.000 n=10+10)
UnpackMX-12                               200ns ±15%     140ns ±10%  -29.96%  (p=0.000 n=10+10)
UnpackAAAA-12                             199ns ±15%     153ns ±14%  -22.88%  (p=0.000 n=10+10)
UnpackMsg-12                             1.48µs ±16%    1.26µs ±13%  -14.68%  (p=0.001 n=10+10)
UnpackString/Escaped-12                  84.8ns ± 7%    79.6ns ± 6%   -6.12%  (p=0.003 n=9+10)
UnpackString/Unescaped-12                46.5ns ±15%    47.8ns ±14%     ~     (p=0.353 n=10+10)
Serve-12                                 58.3µs ± 2%    57.8µs ± 3%     ~     (p=0.065 n=9+10)
Serve6-12                                61.4µs ±10%    61.1µs ± 8%     ~     (p=0.796 n=10+10)
ServeCompress-12                         59.1µs ± 2%    58.6µs ± 2%     ~     (p=0.113 n=9+10)

name                                   old alloc/op   new alloc/op   delta
UnpackDomainName-12                       64.0B ± 0%     64.0B ± 0%     ~     (all equal)
UnpackDomainNameUnprintable-12            48.0B ± 0%     48.0B ± 0%     ~     (all equal)
UnpackDomainNameLongest-12                 256B ± 0%      256B ± 0%     ~     (all equal)
UnpackDomainNameLongestUnprintable-12    1.02kB ± 0%    1.02kB ± 0%     ~     (all equal)
UnpackDomainNamePointers-12               48.0B ± 0%     48.0B ± 0%     ~     (all equal)
UnpackA-12                                 100B ± 0%       68B ± 0%  -32.00%  (p=0.000 n=10+10)
UnpackMX-12                                100B ± 0%       68B ± 0%  -32.00%  (p=0.000 n=10+10)
UnpackAAAA-12                              112B ± 0%       80B ± 0%  -28.57%  (p=0.000 n=10+10)
UnpackMsg-12                               584B ± 0%      520B ± 0%  -10.96%  (p=0.000 n=10+10)
UnpackString/Escaped-12                   32.0B ± 0%     32.0B ± 0%     ~     (all equal)
UnpackString/Unescaped-12                 24.0B ± 0%     24.0B ± 0%     ~     (all equal)
Serve-12                                 3.53kB ± 0%    3.50kB ± 0%   -0.91%  (p=0.000 n=10+10)
Serve6-12                                3.37kB ± 0%    3.34kB ± 0%   -0.95%  (p=0.000 n=10+8)
ServeCompress-12                         3.74kB ± 0%    3.71kB ± 0%   -0.87%  (p=0.000 n=10+10)

name                                   old allocs/op  new allocs/op  delta
UnpackDomainName-12                        1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackDomainNameUnprintable-12             1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackDomainNameLongest-12                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackDomainNameLongestUnprintable-12      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackDomainNamePointers-12                1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackA-12                                 3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)
UnpackMX-12                                3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)
UnpackAAAA-12                              3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)
UnpackMsg-12                               12.0 ± 0%      10.0 ± 0%  -16.67%  (p=0.000 n=10+10)
UnpackString/Escaped-12                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
UnpackString/Unescaped-12                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Serve-12                                   55.0 ± 0%      54.0 ± 0%   -1.82%  (p=0.000 n=10+10)
Serve6-12                                  52.0 ± 0%      51.0 ± 0%   -1.92%  (p=0.000 n=10+10)
ServeCompress-12                           57.0 ± 0%      56.0 ± 0%   -1.75%  (p=0.000 n=10+10)
…message

Semantically this makes more sense. We also rename a few variables here
and there to be more consistent.
Also remove some pointless `err != nil` checks.
svcb_test.go Outdated Show resolved Hide resolved
svcb.go Outdated Show resolved Hide resolved
svcb.go Outdated Show resolved Hide resolved
Copy link
Owner

@miekg miekg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to run the coredns testsuite with this version.

@tmthrgd
Copy link
Collaborator Author

tmthrgd commented Nov 8, 2023

Would be good to run the coredns testsuite with this version.

Yep that's actually my hold up at the moment. I've been debugging and working through a few scattered CoreDNS test failures.

This fixes a CoreDNS test failure
(github.com/coredns/coredns/plugin/pkg/proxy.TestCoreDNSOverflow) that
was testing behaviour that is sensitive to the specific error message
returned when unpacking overflowing messages or that it is ErrBuf.

It also ensures we correctly distinguish between invalid record data
and invalid message framing. Ideally the `unpack(...) error` methods
(on RR, EDNS0 and SVCBKeyValue) would actually be `unpack(...) bool`
and we'd allow ourselves more freedom on how we report unpacking
failures.
In a small number of instances we lose some error information with this
change. I don't think it's particularly valuable information and this
approach is far simpler, far less vague about what errors are
appropriate and match how crypto/tls uses cryptobyte.String to unmarshal
handshake messages.
This code only ever returns a single error value.
@tmthrgd
Copy link
Collaborator Author

tmthrgd commented Nov 8, 2023

regarding this errors: I think the est. pattern in this lib is: "bad.. ", with maybe slightly more context than just 'bad SVCB'

Yeah I've gone back and forth on the errors and ended up with something quite different. There may be compatibility concerns, but ultimately I think we need to be far more semantic with our unpacking errors. I've gone and distinguished between failures to unpack the message framing and failures to unpack the record data specifically. Once we're in a length-delimited section, it doesn't make sense to use the same error anymore as we're not dealing with potential truncation but rather just corrupt marshalling.

Edit: The error changes very much could be backward incompatible. It's hard to tell exactly. If so, I think we can work around that while still mostly keeping the semantic distinctions.

@tmthrgd
Copy link
Collaborator Author

tmthrgd commented Nov 8, 2023

I've left quite a few TODOs and comments where I've discovered incorrect or problematic behaviour. I've successfully run the CoreDNS test suite against this change now as well. Figuring out why it was failing was quite difficult and led down the RFC rabbit hole and exposed some bugs in this library. I'll aim to address all the TODOs in future PRs.

client.go Show resolved Hide resolved
edns.go Outdated Show resolved Hide resolved
dns_test.go Show resolved Hide resolved
msg.go Show resolved Hide resolved
// errors.Is(_, ErrBuf) or contains the substring "overflow".
//
// TODO(tmthrgd): Update CoreDNS and use "truncated message".
errTruncatedMessage = &Error{err: "overflow unpacking truncated message"}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a properly truncated message should unpack just fine.
unsure why TC bit messages deserve special treatment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't for a properly framed messaged with or without the TC bit set, this is for a message that is say randomly cut in half. We return this if we're say unpacking a resource, but the message buffer stops half way through the RDATA. This already fails, but with a less obvious error message. I'm open to suggestions on a different message.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok. Essentially ErrBuf, but... maybe use partial as truncated as meaning in the protocol?
ErrPartialMessage = &Error{err: "overflow unpacking partial message"} (unsure about the word "overflow" here.

msg.go Show resolved Hide resolved
msg.go Outdated Show resolved Hide resolved
//
// TODO(tmthrgd): This check is no longer required to ensure
// loop free behaviour. Should we remove it?
if ptrs++; ptrs > maxCompressionPointers {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically there could still be a lot of pointers backwards? maxCompressionsPointers now has a value of 126; that could still be reach, although not via a loop.

Think it can go, even with a mesg with a lot of pointers backwards, we should not introduce an arbitrary limit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a weird one. Pointing to another pointer would never be generated by a sensible non-buggy implementation. It doesn't even save you a byte. There's annoyingly no RFC that prohibits it though, so it is very much a judgement call. The backward check prevents a loop so we can remove this, it just opens the door to being forced to do more work by an attacker. I can remove it in a follow up PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointer to self is then still checked?

msg_test.go Show resolved Hide resolved
@miekg
Copy link
Owner

miekg commented Nov 25, 2023

think this could go in and the fix some fallout and the dyn. update stuff?

(pointer to self should still be checked I guess, but can also be done in followup)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants