Parsing TXT records does not properly de-escape quotes #1384

TomOnTime · 2022-06-20T15:18:08Z

When calling NewRR() on a TXT record, an escaped quote isn't properly de-escaped.

A line such as:

example.com. IN TXT "inner\"quote"

It gets parsed to: inner\"quote
I was expecting: inner"quote

Maybe I'm misreading the RFC? It just seems that without removing the \ there's no way to know what the \ means in the resulting string.

I've created #1383 to demonstrate the issue.

The text was updated successfully, but these errors were encountered:

miekg · 2022-06-20T15:21:25Z

[ Quoting ***@***.***> in "[miekg/dns] Parsing TXT records doe..." ]

When calling NewRR() on a TXT record, an escaped quote isn't properly de-escaped. A line such as: example.com. IN TXT "inner\"quote" It gets parsed to: inner\"quote I was expecting: inner"quote

but if you put that back in the NewRR you lost the fact that it was quoted... We've have multiple discussions about this throughout the years, mostly because the original RFC text sucks - I think the current behavior is correct

tlimoncelli · 2022-06-20T19:02:31Z

"you lost the fact that it was quoted" -- I would imagine that the fact that it is ASCII 34 means it was quoted.

Can the string be formed any other way?

miekg · 2022-07-07T10:23:25Z

trouble parsing what you mean here. I still believe current behavior is correct, as calling this should give the same output:

NewRR -> String() -> NewRR on that string -> String() etc.

TomOnTime · 2022-07-07T17:44:39Z

Sorry for being unclear.

Question: Should newRR() simply remove the outer quotes (if they exist) or should it de-escape interior double-quotes too?

I expected it to do the latter. I wrote a TestTXTEscapeJustParse which only does the first half of what TestTXTEscapeParsing does. That way I can see the intermediate result.

It looks like NewRR() doesn't de-escape interior quotes. (i.e. replace \" with ").

Input to NewRR: "escaped\"quote"
Expected output of result.String(): escaped"quote
Actual output of result.String(): escaped\"quote

When we round-trip the strings by calling sprintTxt(), we get the original input, because it ignores the \.

If you look at https://github.com/miekg/dns/pull/1383/files, you'll see the a 2 tests, one should fail and the other should pass.

$ go test ./...
--- FAIL: TestIsPacketConn (0.00s)
    client_test.go:74: unable to run test server: listen unixpacket /var/folders/j8/8pxbmmjs6xjbd4nxjpd_q0rm0000gp/T/TestIsPacketConn2401358967/002/unixpacket.sock: socket: protocol not supported
--- FAIL: TestTXTEscapeJustParse (0.00s)
    parse_test.go:163: mismatch after parsing `"escaped\"quote"` TXT record: `escaped\"quote` != `escaped"quote`
FAIL
FAIL	github.com/miekg/dns	0.874s
ok  	github.com/miekg/dns/dnsutil	0.130s
FAIL

If no de-escaping is intended, then there's no bug.

Tom

janik-cloudflare · 2024-02-15T12:37:30Z

We've just run into a related issue. In this comment, X represents a string containing the character a 253 times. Then:

a TXT with zone file content "X0123456789" parses into slice X01 + 23456789 (as expected)
a TXT with zone file content "X\;0123456789" parses into slice X\; + 0123456789 (this seems like a bug: the backslash and semicolon are each counted as a byte, and the string split earlier than necessary, even though this escape sequence would only produce a single byte in wire format)

The String() function removes unnecessary backslashes, so parsing and then calling String() on something like "X\;a" "test" turns it into "X;" "a" "test", adding an unnecessary string. In my opinion, the way to go would be to remove unnecessary backslashes immediately upon parsing instead of in String(). This seems more logical and consistent to me: a string in a parsed dns.TXT would then really only contain the contents of that string, without additional zone file syntax. AAAA addresses, for example, are also stored in canonical form (e.g., unnecessary explicit zeros are removed during parsing) so there should be no need to keep literal backslashes in TXT records, right?

(For simplicity, all strings in this comment are raw/literal strings, without Go syntax. In other words, every backslash above is a literal backslash and every quote is a literal quote, such as a quote in a zone file.)

Edit: I've opened #1540 to fix the immediate issue, though I still think it'd be smart not to store escape sequences in parsed data structures (just like JSON escape sequences in strings aren't preserved when using Unmarshal, etc.), even though that's technically a breaking change.

TomOnTime mentioned this issue Jun 20, 2022

NewRR does not remove escape char #1383

Open

janik-cloudflare mentioned this issue Jan 17, 2024

Trailing backslash results in non-FQDN targets #1528

Closed

janik-cloudflare mentioned this issue Mar 1, 2024

Fix counting of escape sequences when splitting TXT strings #1540

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing TXT records does not properly de-escape quotes #1384

Parsing TXT records does not properly de-escape quotes #1384

TomOnTime commented Jun 20, 2022

miekg commented Jun 20, 2022 via email

tlimoncelli commented Jun 20, 2022

miekg commented Jul 7, 2022

TomOnTime commented Jul 7, 2022

janik-cloudflare commented Feb 15, 2024 •

edited

Parsing TXT records does not properly de-escape quotes #1384

Parsing TXT records does not properly de-escape quotes #1384

Comments

TomOnTime commented Jun 20, 2022

miekg commented Jun 20, 2022 via email

tlimoncelli commented Jun 20, 2022

miekg commented Jul 7, 2022

TomOnTime commented Jul 7, 2022

janik-cloudflare commented Feb 15, 2024 • edited

janik-cloudflare commented Feb 15, 2024 •

edited