
DTLS support #2010

Closed
njsmith opened this issue May 15, 2021 · 6 comments

@njsmith
Member

njsmith commented May 15, 2021

DTLS is a variant of TLS that's used for encrypting communication over UDP. It's used for applications like VPNs and VoIP, where you don't want TCP's retransmits and scheduling messing with your packets, but still need confidentiality and integrity. Like I mentioned in gitter the other day, a company that would prefer to remain anonymous has hired me to implement DTLS for Trio. It might end up part of Trio itself (like trio.SSLStream already is), or a separate project – that's to be determined. But I figured I'd start making notes on it here either way.

DTLS info

DTLS is defined by a diff against the TLS spec.

  • Latest version is DTLS 1.2, defined in RFC 6347
  • It's based on TLS 1.2, defined in RFC 5246
  • There's a recent followup RFC about connection tracking that's pretty relevant to the code we'll need to handle on our end, that's finished but hasn't yet been assigned an official number: (Draft, Draft status tracker)
  • And the RFC for DTLS 1.3 is also finished and waiting for an RFC number, and AFAICT there aren't any implementations yet, so we'll be ignoring that for now. (Draft, Draft status tracker, OpenSSL implementation tracking issue)

OpenSSL supports DTLS, but really as an afterthought, basically by wedging it into a TLS-shaped box. There's a ton of complicated glue required to hook it up to something like a Trio socket.

Multiplexing

A DTLS socket, like all UDP sockets, can handle lots of peers simultaneously, and act as both a client and a server to different peers. OpenSSL assumes that each transport has a single peer. So it's the user's job to figure out which packet belongs to which OpenSSL connection, and route them appropriately.

Solution: handle the actual socket I/O ourselves. When a packet comes in, use the source address to look up the appropriate OpenSSL connection object, and pass it in.
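The routing table itself can be tiny. A minimal sketch in Python (all the names here are hypothetical, not an actual Trio API):

```python
# Sketch of per-peer multiplexing: dispatch each incoming datagram to
# the connection object registered for its source address.
# (DTLSMultiplexer and its methods are made-up names for illustration.)

class DTLSMultiplexer:
    def __init__(self):
        # peer address (ip, port) -> per-peer connection object
        self._connections = {}

    def register(self, peer_address, connection):
        self._connections[peer_address] = connection

    def dispatch(self, packet, peer_address):
        conn = self._connections.get(peer_address)
        if conn is None:
            # Unknown peer: might be the start of a new handshake,
            # or junk we should drop -- that's handled elsewhere.
            return None
        conn.receive(packet)
        return conn
```

The real version also needs to route handshake-in-progress packets, but the lookup-by-source-address core is this simple.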

When draft-ietf-tls-dtls-connection-id is finalized, that will add a different multiplexing key (basically the idea is to let hosts roam without having to set up a new DTLS connection whenever their IP changes), but I guess we can worry about that later. Also, this will require changes upstream in OpenSSL, because the connection-id negotiation happens as part of the actual handshake. (After the handshake, the multiplexing itself is easy and we can handle it by peeking into the packet headers to read out their connection id. What we need from OpenSSL is a way to say "please include the connection_id extension in your handshake with the id XXX", and maybe a way to query the result of the connection_id negotiation after the handshake finishes.)

Packets vs streams

A DTLS socket is packet-based. OpenSSL uses a pluggable transport layer called "BIO"s, and they have a concept of a "packet BIO", but there's no built-in "memory packet BIO". So we can either implement our own BIO, or use some hacks to make the existing memory BIO work.

Solution: Making memory BIOs work for regular read/write calls is easy, because each read/write corresponds to a single packet. For handshakes, it's trickier, because a single handshake "volley" might include multiple packets, which OpenSSL will happily concatenate into the memory BIO's output buffer. Fortunately it's not too hard to parse the length headers on the TLS records to figure out where the packet boundaries should be. But... see "Path MTU issues" below for more complications.
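For the record-splitting part, the parsing really is straightforward: every DTLS 1.2 record starts with a 13-byte header, and the last two bytes of that header are the big-endian length of the record body. A rough sketch:

```python
import struct

# DTLS 1.2 record header: type(1) + version(2) + epoch(2) + seq(6) + length(2)
DTLS_RECORD_HEADER_LEN = 13

def split_records(buf):
    """Split a buffer of concatenated DTLS records (as OpenSSL writes
    them into a memory BIO during a handshake) back into individual
    records, using the 2-byte length field at offset 11."""
    records = []
    while buf:
        if len(buf) < DTLS_RECORD_HEADER_LEN:
            raise ValueError("truncated DTLS record header")
        (length,) = struct.unpack_from(">H", buf, 11)
        end = DTLS_RECORD_HEADER_LEN + length
        if len(buf) < end:
            raise ValueError("truncated DTLS record body")
        records.append(buf[:end])
        buf = buf[end:]
    return records
```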

Handshake retransmits

For regular data packets, DTLS has the same semantics as UDP: if the packet gets lost, then oh well, too bad. But that doesn't work for handshake packets -- those have to arrive successfully, or nothing else works. So DTLS uses a timeout-based mechanism where if one side notices that the handshake hasn't been progressing, then it resends its last set of packets.

OpenSSL has some support for this built in. But! It's hard-coded to use the system clock (among other bits of awkwardness). And we want to use the Trio clock, to make autojump_clock still work. So, I think we'll probably want to handle the retransmits ourselves. It's not too complicated: in theory you should unwrap each handshake "record" and put it into a new record when you retransmit, and increment the record id. But, I'm not sure this actually matters, because if you send a duplicate packet, then the peer should tolerate that anyway. And if we do it "properly", it should be fine, because OpenSSL doesn't actually need to know the record ids: during the handshake, we can generate them ourselves, and after the handshake, it's a new "epoch" and the record ids get reset.

The DTLS 1.3 draft has some detailed guidance for retransmit timing: https://tlswg.org/dtls13-spec/draft-ietf-tls-dtls13.html#section-5.8.2
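The timeout schedule we'd want is something like the classic exponential backoff (the initial/max values below are just RFC 6347's suggested 1s/60s defaults, not a decision), driven from Trio's clock -- e.g. by wrapping each wait in `trio.move_on_after` -- rather than the system clock:

```python
def retransmit_timeouts(initial=1.0, maximum=60.0):
    """Yield successive handshake retransmit timeouts: start at
    `initial` seconds and double after each retransmission, capped at
    `maximum`. (RFC 6347 suggests 1s/60s; the DTLS 1.3 draft has more
    detailed guidance.)"""
    timeout = initial
    while True:
        yield min(timeout, maximum)
        timeout *= 2
```

In the handshake task this would look roughly like: for each timeout in the schedule, wait that long for the peer's next volley, and if nothing arrives, refragment and resend ours.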

Path MTU issues

The "path MTU" is the maximum size packet you can send to a particular destination without some router dropping it along the way. (It's a "path" MTU because packets to different destinations will pass through different routers, which might have different limits.) For example, the standard Ethernet MTU is 1500 bytes. So you normally can't send a UDP packet with 1600 bytes in it -- or, well, you can, but it will be instantly discarded. In fact, you can't even send a UDP packet with 1500 bytes in it, because some of that gets used for overhead:

  • IPv4 header: usually 20 bytes; can be larger
  • IPv6 header: usually 40 bytes; can be larger
  • UDP header: 8 bytes

Plus, who knows, at some point your packets might pass through some kind of encapsulation like wireguard or 6to4, which will also need to reserve some of the lower-layer's MTU for their own usage.

Watch out: these headers also make discussing MTU complicated, because different ways of calculating the MTU might or might not include them. For example, most OSes offer ways to query what they think the MTU is for a given destination, and they'll give you the link-layer MTU, not the actual maximum UDP payload size that you probably care about. You have to subtract off 28 or 48 bytes depending on whether you're using IPv4 or IPv6.
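The arithmetic is trivial but easy to get backwards, so to spell it out:

```python
IPV4_HEADER = 20  # usual minimum; IP options can make it larger
IPV6_HEADER = 40  # fixed base header; extension headers add more
UDP_HEADER = 8

def max_udp_payload(link_mtu, ipv6=False):
    """Convert a link-layer MTU into the largest UDP payload that fits:
    subtract the IP header (20 bytes for IPv4, 40 for IPv6) plus the
    8-byte UDP header."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return link_mtu - ip_header - UDP_HEADER
```

So for standard 1500-byte Ethernet, that's 1472 bytes of UDP payload over IPv4, or 1452 over IPv6.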

Why do we care about any of this for DTLS? Two reasons:

First, it's mostly the user's responsibility to make sure they don't try to send packets that are too big, in both regular UDP and DTLS. But, DTLS makes this a bit harder, because the user might know what the MTU is for regular UDP packets, and they know how much unencrypted data they want to send... but they don't necessarily know how much space that data will need after it's encrypted. Fortunately, OpenSSL provides an API to find this out: DTLS_get_data_mtu. Unfortunately, this API requires that you somehow tell OpenSSL what the underlying transport MTU is, so it can subtract off the DTLS overhead.

There are two APIs for this: DTLS_set_link_mtu, and SSL_set_mtu. The former is supposed to be passed the link-layer MTU (e.g. 1500 for ethernet), and then it queries the BIO to ask what the header overhead is for this particular socket. Of course, since we'll be using memory BIOs, this doesn't work. OTOH, SSL_set_mtu is passed the MTU after this overhead is accounted for (e.g. 1500-28=1472 for UDP over IPv4 over ethernet). So that's what we want. HOWEVER, at the end of the handshake, OpenSSL normally discards whatever you passed to SSL_set_mtu and then tries to query the BIO for it. To avoid this, you have to set SSL_OP_NO_QUERY_MTU. (No, none of this is documented, why do you ask?)

Anyway, bottom line: we want to unconditionally set SSL_OP_NO_QUERY_MTU, we want to feed in our MTU estimates via SSL_set_mtu [or maybe let the user feed in their own MTU estimates?], and we want to expose the value from DTLS_get_data_mtu.

OKAY. The other reason we need to know about MTUs is for the handshake. Handshake messages can potentially be really big, like tens of kilobytes, because certificate chains can be really big. Obviously if you try to stuff that into a single packet, then all your handshakes will fail and nothing will work at all. So DTLS has a mechanism to split a single handshake message up into multiple packets.

Now, what makes this tricky is that it interacts with retransmits. Remember how I said above that if handshake packets get lost, we have to handle our own retransmits? Well, one of the reasons they could get lost is that we're sending packets that are too big. So if our packets keep getting lost, we have to notice that and re-fragment the handshake message into new, smaller fragments.

Fortunately the fragmentation header fields are pretty simple: there's a single underlying handshake message you're trying to send, which we can read out of the packets that OpenSSL generates; we split it up into whatever pieces we want, and then we slap on headers saying "these are bytes 0-999 of the handshake message", "these are bytes 1000-1999 of the handshake message", etc. So it's all doable, though it requires writing an actual DTLS handshake record parser, which is unfortunate.
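As a sketch of what that fragmentation looks like (the 12-byte DTLS handshake header layout is straight from RFC 6347; the function name is made up):

```python
import struct

def fragment_handshake(msg_type, message_seq, body, max_fragment):
    """Split a reassembled DTLS handshake message body into fragments.
    Each fragment carries the 12-byte DTLS handshake header:
    msg_type(1), total length(3), message_seq(2), fragment_offset(3),
    fragment_length(3)."""
    total = len(body)
    fragments = []
    offset = 0
    while offset < total or not fragments:
        chunk = body[offset:offset + max_fragment]
        header = struct.pack(
            ">B3sH3s3s",
            msg_type,
            total.to_bytes(3, "big"),
            message_seq,
            offset.to_bytes(3, "big"),
            len(chunk).to_bytes(3, "big"),
        )
        fragments.append(header + chunk)
        offset += len(chunk)
        if not chunk:  # empty-body messages (e.g. ServerHelloDone)
            break
    return fragments
```

Re-fragmenting after a suspected MTU problem is then just calling this again with a smaller `max_fragment`; the `message_seq` and total length stay the same, only the offset/length fields change.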

Handshake challenges to prevent spoofing

For regular TLS, TCP and the kernel are responsible for ensuring that the peer is actually reachable at the address they claim, and that no one is trying to spoof us. For UDP, the protocol has to do this by hand, and to handle this DTLS bolts on an extra handshake step before the regular TLS handshake. The key idea is that this extra handshake is a stateless challenge/response: the first time a client tries to connect, we send back some unpredictable bytes, and then forget about them. Then they try again, passing back those bytes we sent, which proves that they received our previous packet and are actually reachable. And we can use some crypto to check whether they're the same bytes we sent before, even though we didn't keep any record of which bytes we sent.

Specifically: the DTLS ClientHello message (which is always the first message sent on a new DTLS connection) has an extra "cookie" field added. Initially, this is set to the empty string. Then the server sends back a HelloVerifyRequest with the magic cookie in it, and the client re-sends an identical ClientHello except with the magic cookie added.

Since it's not part of the regular TLS state machine, OpenSSL mostly leaves this up to the user to take care of. There's a function called DTLSv1_listen that processes incoming packets and sends back HelloVerifyRequests until it sees a valid cookie. It seems a bit awkward to use, because you have to create a dedicated SSL object that does DTLSv1_listen, your multiplexer has to route only ClientHellos to it, and then as soon as it returns successfully it's "transmuted" into the SSL object for that particular handshake, so you need to know which peer sent the winning packet, move this SSL object to handle that peer, and create a new SSL object for calling DTLSv1_listen.

Also, DTLSv1_listen doesn't really... do anything. The hard part of HelloVerifyRequest handling is generating and validating the cookies. DTLSv1_listen doesn't do that. You have to implement those from scratch, and then pass your implementations to OpenSSL as special callback functions. All DTLSv1_listen does is parse the packets and call your callbacks.

Also, OpenSSL doesn't even pass the callbacks all the info we need; in particular, we have to bind the cookie to a particular peer, but OpenSSL doesn't know who our peer is, because we're using memory BIOs. So we'd need to pass it through like a thread-local variable or something gross like that.

I suspect it'll be easier to parse the packets and generate the HelloVerifyRequests ourselves than to mess with DTLSv1_listen.

As far as the actual cookie generation/validation goes, you want something like MAC(secret_key, glom(peer_address, ClientHello_contents, current_time)), so that eavesdroppers can't steal your cookie and re-use it for a different peer or at a different time.

You can embed the exact time inside the cookie and then read it out again when verifying, but that reveals your server's idea of "the current time" to any eavesdroppers, which is probably harmless but why risk it. Instead I think we'll like, truncate "current time" to the nearest 30 seconds, and then at validation we'll try the current time and the previous time.

Also remember that glom needs to be bijective.
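Putting those pieces together, the cookie scheme might look something like this (a sketch, not a vetted design -- the field names and the length-prefixing are just one way to keep the encoding bijective):

```python
import hashlib
import hmac
import struct
import time

COOKIE_BUCKET = 30  # seconds; coarse, so we don't leak the server's exact clock

def _cookie(secret, peer_address, client_hello, bucket):
    # Length-prefix every field so the concatenation is unambiguous
    # ("glom needs to be bijective").
    host, port = peer_address
    msg = b"".join(
        struct.pack(">I", len(part)) + part
        for part in (host.encode(), str(port).encode(), client_hello,
                     struct.pack(">Q", bucket))
    )
    return hmac.new(secret, msg, hashlib.sha256).digest()

def make_cookie(secret, peer_address, client_hello, now=None):
    now = time.time() if now is None else now
    return _cookie(secret, peer_address, client_hello, int(now) // COOKIE_BUCKET)

def check_cookie(secret, peer_address, client_hello, cookie, now=None):
    now = time.time() if now is None else now
    bucket = int(now) // COOKIE_BUCKET
    # Accept the current bucket and the previous one, so a cookie
    # issued just before a bucket boundary is still valid.
    return any(
        hmac.compare_digest(cookie, _cookie(secret, peer_address, client_hello, b))
        for b in (bucket, bucket - 1)
    )
```

Note the server keeps only `secret`; everything else is recomputed from the incoming packet, so the whole thing stays stateless.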

Required OpenSSL APIs

Reviewing the above, I think the only things we need that the ssl/pyopenssl modules don't already offer are:

  • DTLS_method (plus maybe the client/server versions)
  • SSL_set_mtu
  • DTLS_get_data_mtu
  • SSL_OP_NO_QUERY_MTU

For the stdlib ssl, these would be easy to add but wouldn't ship until python 3.11 at the earliest, which is more than a year away. So we'll focus on pyopenssl first.

For pyopenssl, I think the procedure is to add them to pyca/cryptography first, then to pyopenssl. Fortunately these are all pretty trivial (two magic constants + an integer getter/setter), so it should be easy to do.

A niggling annoyance if we put this in Trio proper: our TLS support would be stdlib-only and our DTLS support would be pyopenssl-only. But I guess that's not too terrible, and we can later extend them both to work with both. And we don't even need to declare a package dependency on pyopenssl, since the user will need to import it themselves before they can get a pyopenssl context to pass to us, so we can delay the import until then.

Implementation outline

User-level API:

Some kind of DTLSSocket object where you pass in an SSLContext + UDP socket, and it holds:

  • the SSLContext
  • the SOCK_DGRAM socket
  • a dict mapping peer address -> openssl SSL object for established peerings
  • something to track handshakes-in-progress

You interact with it by iterating to get (source address, packet) pairs, and can send (destination address, packet) pairs, which trigger a handshake as needed.

Probably also need some configuration knobs for handshake retransmit timing, time to discard old unused peerings, etc., and ways to introspect the current peerings (e.g. fetch certificate for a specific peer).

Alternative approach: I considered having separate user-level objects for each peer connection, which you'd use to send/receive to that peer, and which would also manage stuff like handshakes and querying information about that peer. And then you'd have a central dispatcher object that manages the socket and that you call 'connect' and 'accept' on to create connection objects. But the problem with this is that since all the connections would be sharing a single kernel-level receive queue, there's no way to implement individual_connection.receive() without potentially having to buffer an unbounded number of packets from other connections, or else start silently dropping them. So exposing the single underlying queue as a single queue seems like it will have fewer weird effects in the end.

If you really want a socket with only a single peer, you can make a connected UDP socket, and then the kernel will automatically make sure you only receive packets from that peer.
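For illustration, here's plain stdlib socket code showing the kernel doing that filtering on a Unix-like (note the Windows caveat about shared-port UDP sockets discussed further down the thread):

```python
import socket

# Three UDP sockets on loopback. `conn` gets connect()ed to `peer`,
# so the kernel will only deliver datagrams whose source is `peer`.
peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer.bind(("127.0.0.1", 0))
other = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
other.bind(("127.0.0.1", 0))

conn = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
conn.bind(("127.0.0.1", 0))
conn.connect(peer.getsockname())  # from now on, only peer's packets arrive

other.sendto(b"from other", conn.getsockname())  # filtered out by the kernel
peer.sendto(b"from peer", conn.getsockname())

conn.settimeout(1)
data = conn.recv(100)  # the datagram from `other` never shows up
print(data)
```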

[question: for SSLStream, a major feature is that it can run over arbitrary stream-like transports, not just TCP. Do we want something similar for DTLS? it would be much more complicated, because DTLS has a much more intimate relationship with the transport: it needs to know about IPv4/IPv6 addressing at a minimum, and perhaps also stuff like OS-specific magical sockopts for PMTU querying]

[question: how do we let the user manage their set of associated peers? SSLContext can set some cert validation flags, but you might also want to do things like "don't even start a handshake with IP address X, we don't want to talk to them so it will just waste resources", "what cert did peer Y use?", "please forget about peer Z, we're done talking and I want to free up the memory", etc.]

Each object also has:

  • One hidden background task to handle reading, with a zero-buffer channel to send data packets to the user
  • One hidden background task for each handshake-in-progress, each with its own zero-buffer channel to receive packets from the reader

I think these background tasks can be full-blown system tasks, following the same logic as the [potential heresy] post.

Reader logic: for each incoming packet:

  • if it's an initial ClientHello with no cookie, then generate a challenge and send it back.
  • if it's a ClientHello with cookie, then verify the cookie and then start a background task to handle this handshake
  • if it's any other handshake packet, find the corresponding handshake task and send it the packet. (If there is no corresponding handshake task, drop the packet.)
  • otherwise, find the corresponding established connection and decrypt the packet. if it fails then drop the packet. Otherwise, send it on to the user.
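The first-pass classification in that reader loop only needs to peek at a byte or two of each record. A simplified sketch (it doesn't parse the ClientHello far enough to check whether the cookie field is empty, which a real implementation would need to do):

```python
# Content types from the TLS record layer, plus the handshake
# message type we care about.
HANDSHAKE = 22
APPLICATION_DATA = 23
CLIENT_HELLO = 1
RECORD_HEADER_LEN = 13  # DTLS 1.2 record header

def classify_packet(packet):
    """Coarse routing decision for an incoming datagram: look at the
    record's content type, and for handshake records, at the handshake
    message type that follows the 13-byte record header."""
    if len(packet) < RECORD_HEADER_LEN + 1:
        return "drop"
    content_type = packet[0]
    if content_type == HANDSHAKE:
        if packet[RECORD_HEADER_LEN] == CLIENT_HELLO:
            return "client_hello"  # then: check cookie, maybe send challenge
        return "handshake"         # route to the matching handshake task
    if content_type == APPLICATION_DATA:
        return "data"              # route to the established connection
    return "drop"
```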

Handshake tasks:

  • for each incoming packet, pass to openssl
  • if openssl returns new data, then that packet completed a volley, so we want to send our new volley back. discard our last volley, and fragment the new volley as appropriate.
  • set a timeout, and if the timeout expires without a new packet arriving, refragment and resend
  • once the handshake succeeds, move the connection into the "active connections" table (replacing anything that's there)
  • if the handshake fails, discard it

I guess we'll also want some way for users to debug when a handshake fails? Tbh OpenSSL doesn't provide much useful guidance here even in the best case, so maybe there's no point. And there's no easy way for one of these background tasks to report a problem, at least for server handshakes. But maybe we'll want to... log something? idk, logging in a low-level library is pretty fraught.

TODO

Figure out the transmit side (especially connecting)

Figure out how DTLS 1.3 will break all this, and maybe ask OpenSSL folks to keep us in mind when implementing it. (At least: HelloRetryRequest is now part of handshake transcript, so it must be generated by openssl. Also, messages that need retry loops can happen at any time, not just during the handshake, and you can't tell whether they've been ACKed or not except by passing them through the full state machine. And I'm not sure how refragmenting works. And the connection-id extension is built in, and the connection-ids are encrypted. Maybe other stuff too.)

@lewoudar
Contributor

lewoudar commented May 16, 2021

Just out of curiosity, what does a memory BIO look like? I mean, I don't even understand the concept.
Also:

Fortunately, OpenSSL provides an API to find this out: DTLS_get_data_mtu. Unfortunately, this API requires that you somehow tell OpenSSL what the underlying transport MTU is, so it can subtract off the DTLS overhead.

Why does OpenSSL need to know the underlying transport MTU? Doesn't DTLS_get_data_mtu just return the length of the encrypted data? Or I guess the DTLS overhead doesn't have a fixed value...

@njsmith
Member Author

njsmith commented May 16, 2021

Just out of curiosity, what does a memory BIO look like? I mean, I don't even understand the concept.

It's basically the OpenSSL equivalent of Python's BytesIO. A thing that pretends to be a "socket", but it actually just reads/writes to a buffer in memory. So we let OpenSSL write data to the buffer, then we take the data out of the buffer and send it through the real socket, and vice-versa for receiving.
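You can poke at the stdlib's version of the same idea directly -- Python's ssl module exposes ssl.MemoryBIO, which is exactly this "fake socket backed by a memory buffer":

```python
import ssl

# An in-memory buffer standing in for the network. In the TLS case,
# ssl.SSLContext.wrap_bio() pairs an "incoming" and "outgoing"
# MemoryBIO with an SSLObject; here we just show the buffer plumbing.
incoming = ssl.MemoryBIO()
incoming.write(b"bytes that arrived from the real socket")
# ...OpenSSL would consume `incoming` and produce output into an
# `outgoing` BIO, which we drain and send on the real socket:
outgoing = ssl.MemoryBIO()
outgoing.write(b"bytes OpenSSL wants sent")
to_send = outgoing.read()
```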

Why does OpenSSL need to know the underlying transport MTU? Doesn't DTLS_get_data_mtu just return the length of the encrypted data? Or I guess the DTLS overhead doesn't have a fixed value...

Good question! Partly it's just "that's how openssl did it", so we're kind of stuck with it regardless...

Huh, looking at the code though, there actually is a reason: lots of TLS ciphers (maybe all of them?) have a fixed block size, and all ciphertexts get padded so they end up as a multiple of that block size. So if your block size is 16 bytes, and your MTU is 31, then you can only send 16 bytes, because the remaining 15 bytes isn't enough for an entire block. But if your MTU is 32, then you can send 32 bytes, because now two whole blocks fit. So the overhead isn't just a single fixed number -- it varies depending on how close the MTU is to being a multiple of the block size.
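In other words, the usable ciphertext space is the MTU rounded down to a whole number of cipher blocks. As a toy illustration of just that rounding (ignoring per-record overhead like the MAC, which the real DTLS_get_data_mtu also subtracts):

```python
def usable_ciphertext_bytes(mtu, block_size=16):
    """How many bytes of ciphertext fit in `mtu` bytes when the cipher
    pads everything to whole blocks: round down to a block multiple."""
    return (mtu // block_size) * block_size
```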

@lewoudar
Contributor

thanks for the explanation!

@njsmith
Member Author

njsmith commented May 19, 2021

Some interesting challenges I just discovered with certificate validation here:

  • openssl now has built-in hostname checking, but pyopenssl still doesn't expose it. So I might want to add that too? (No hostname check pyca/pyopenssl#795 says otherwise, but I just verified with @reaperhulk on IRC that this is incorrect -- the problem is that pyopenssl doesn't expose the APIs for setting what the expected hostname should be [edit: filed an issue: Still need to support hostname verification pyca/pyopenssl#1020].)

  • For DTLS, where a single socket can have multiple TLS connections as both client and server, we might want to apply slightly different validation depending on who we're talking to. For example, the "expected hostname" is something the client usually sets in their SSLContext. But that means that if you use a single socket to talk to multiple servers, then you need multiple SSLContexts that differ only in which host they expect? That gets messy fast.

    I guess we could instead ask the user to pass a context factory function, where we pass in the hostname, and it returns an appropriate SSLContext, for each connection?

Hostname validation is also a bit weird in this context, b/c generally for a UDP connection you don't want to redo the hostname lookup on every packet, so you'd want to resolve to an IP address then specify that on every packet. But for hostname validation, you need the actual expected hostname. I guess we could make the client API something like:

peer_token = await dtls_socket.connect(hostname, port)  # or this could take a SSLContext?
await dtls_socket.sendto(peer_token, packet)

I.e., force the user to explicitly connect to each peer before they can even attempt to send a packet to them. (Well, maybe connect isn't the best name, because this is different from a UDP socket connect where you're declaring that you want to only use this socket with that one peer. But something like this.)

I was hoping that we could pretend the dtls_socket was mostly stateless at the API level, where you can just send/receive packets with arbitrary peers and the lower-level takes care of doing handshakes when appropriate. This is especially useful if e.g. one of the sides loses the handshake information (since UDP is stateless, it's plausible to do stuff like discard tls state for any peer you haven't heard from in the last 30 minutes, and just figure that if they want to talk to you again they'll do another handshake -- and you generally have to be prepared to redo handshakes at any time, since peers can reboot or whatever without telling you). Which would mean that even if a particular DTLS association starts out with one side being the client, then after a while it might silently flip around so the other side is the client (because they happen to be the first one to send a packet after the original client lost their association state). But if clients and servers have fundamentally different auth APIs, then this kind of transparency isn't going to work.

@njsmith
Member Author

njsmith commented May 20, 2021

A few more discoveries:

Spurred by the issues with handshakes/auth, I reconsidered whether it would make sense to model a DTLS socket as a bunch of individual connections, using a separate connected socket for each. On Unix-likes this would work -- you can have a generic UDP socket bound to a local port to accept incoming handshakes, while also having multiple UDP sockets that are bound to the same local port but connected to different remote ip/port pairs, and the kernel automatically routes incoming UDP packets to the best-matching socket. But! Windows doesn't have anything like this, it just gives packets to whichever socket was opened first, so, never mind, this won't work: https://stackoverflow.com/questions/59779711/problems-using-udp-sockets-bound-to-the-same-port-on-windows

Re: OpenSSL's handshake timeout/retransmit mechanism: as described above, we don't want to use this, because it's hardcoded to use the system clock. Previously I thought we could handle that pretty easily, by just never calling DTLSv1_handle_timeout. But, it turns out that OpenSSL will call this internally at various points and probably confuse things. So we either need to detect OpenSSL's retransmits and suppress them, or else use DTLS_set_timer_cb to override the default timeout values, and set them to like, MAX_INT. [Edit: But DTLS_set_timer_cb is openssl 1.1.1+ only, so I'm not sure if we can rely on it.]

@gesslerpd
Member

#2047 is now merged! This closed issue can still be referenced for more information
