Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to connect to server with Let's Encrypt certificate #5

Open
jcoglan opened this issue Nov 29, 2021 · 25 comments
Open

Failure to connect to server with Let's Encrypt certificate #5

jcoglan opened this issue Nov 29, 2021 · 25 comments

Comments

@jcoglan
Copy link

jcoglan commented Nov 29, 2021

Hi there 👋 This might be weird issue, I'm opening it because I use the SSL verification code from this library in faye-websocket and just discovered an issue with it.

The Faye website is https://faye.jcoglan.com/ and its TLS certificates are provided by Let's Encrypt's certbot. I found that the verification code rejected it even though my browsers think the site is fine, so I wanted to run this past someone else familiar with this code to see what you think.

If you run this code, you get a TLS failure:

require 'bundler/setup'
require 'faraday'

connection = Faraday.new('https://faye.jcoglan.com') do |conn|
  conn.adapter(:em_http)
end

connection.get('/')

To debug this, I made some adjustments to my copy of the TLS verification code and ended up with this demo that connects successfully:

require 'bundler/setup'
require 'eventmachine'
require 'openssl'

HOST = 'faye.jcoglan.com'

module Connection
  def connection_completed
    @cert_store  = OpenSSL::X509::Store.new
    @last_cert   = nil
    @last_verify = false

    @cert_store.set_default_paths

    start_tls(sni_hostname: HOST, verify_peer: true)
  end

  def ssl_verify_peer(cert_text)
    certificate = OpenSSL::X509::Certificate.new(cert_text)

    @last_cert   = certificate
    @last_verify = @cert_store.verify(certificate)

    p certificate

    if @last_verify
      store_cert(certificate)
      puts "\e[37;42m VERIFIED \e[0m"
    else
      puts "\e[37;41m UNVERIFIED \e[0m"
    end

    true
  end

  def ssl_handshake_completed
    unless @last_verify and OpenSSL::SSL.verify_certificate_identity(@last_cert, HOST)
      raise OpenSSL::SSL::SSLError,
            %(host "#{HOST}" does not match the server certificate)
    end
  end

  def store_cert(cert)
    @cert_store.add_cert(cert)
  rescue OpenSSL::X509::StoreError => error
    raise error unless error.message == 'cert already in hash table'
  end
end

EM.run { EM.connect(HOST, 443, Connection) }

The main way this differs from the implementation in Faraday is that it's ok for @cert_store.verify(certificate) to fail. If it fails, the certificate is not added to the X509::Store, but we don't raise an exception immediately. Instead we just require that the last certificate in the chain is verified, and that it passed hostname checking.

When I run this script, I see:

#<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name CN=ISRG Root X1,O=Internet Security Research Group,C=US>, issuer=#<OpenSSL::X509::Name CN=DST Root CA X3,O=Digital Signature Trust Co.>, serial=#<OpenSSL::BN:0x000000010f1df190>, not_before=2021-01-20 19:14:03 UTC, not_after=2024-09-30 18:14:03 UTC>
[ UNVERIFIED ]

#<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name CN=R3,O=Let's Encrypt,C=US>, issuer=#<OpenSSL::X509::Name CN=ISRG Root X1,O=Internet Security Research Group,C=US>, serial=#<OpenSSL::BN:0x000000010f1decb8>, not_before=2020-09-04 00:00:00 UTC, not_after=2025-09-15 16:00:00 UTC>
[ VERIFIED ]

#<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name CN=faye.jcoglan.com>, issuer=#<OpenSSL::X509::Name CN=R3,O=Let's Encrypt,C=US>, serial=#<OpenSSL::BN:0x000000010f1de808>, not_before=2021-10-21 15:10:22 UTC, not_after=2022-01-19 15:10:21 UTC>
[ VERIFIED ]

We have 3 certs here:

  • A: CN=ISRG Root X1,O=Internet Security Research Group,C=US
  • B: CN=R3,O=Let's Encrypt,C=US
  • C: CN=faye.jcoglan.com

The first cert, A, is not verified given the default paths for X509::Store. However, cert B is verified, and then cert C is also verified if cert B has been added to the store.

If A is added to the store, this causes cert C to fail verification. Likewise if cert B is not added, cert C fails to verify.

What I'm wondering is: is the implementation above valid? Is it ok for cert A to fail, if cert B passes verification and can then go on to verify C? Does this mean there is some chain of certs from my machine's trusted certs to the site's own cert, and that means it can be trusted? Or should the failure of cert A cause the connection to fail?

@jsha
Copy link

jsha commented Nov 29, 2021

(wound up here from Twitter)

Certificate validation is a fairly tricky subject and is ideally left up to the TLS library. In particular I think the issue you're asking about is the difference between the chain (what the server sends you) and the path (what your certificate validator builds based on the leaf certificate, the untrusted contents of the chain, and our root store). Here are some blogs about that:

https://medium.com/@sleevi_/path-building-vs-path-verifying-implementation-showdown-39a9272b2820
https://medium.com/@sleevi_/path-building-vs-path-verifying-the-chain-of-pain-9fbab861d7d6

So I can get some context on the question: are you doing validation by hand as a way to work around verification failure? Or were you already doing validation for another reason, and are trying to debug why this specific case fails?

The first cert, A, is not verified given the default paths for X509::Store

That's because it's a cross-signed root, and the root it's signed by (DST Root CA X3) is expired, and macOS enforces that expiration. Well-behaved validation code should ignore the fact that A doesn't validate, so long as it can build a valid path to a trusted root (even if that path doesn't use any certificates from the chain).

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

So I can get some context on the question: are you doing validation by hand as a way to work around verification failure? Or were you already doing validation for another reason, and are trying to debug why this specific case fails?

@jsha I am using code from this library in faye-websocket to implement certificate verification, because eventmachine doesn't do that itself in its start_tls API. More context is in this blog post: https://blog.jcoglan.com/2020/07/31/missing-tls-verification-in-faye/

I am not an expert in this stuff and my priority is to make sure the library I'm shipping is safe. If it turns out Ruby's openssl bindings are at fault here for not trusting the certificates my site is presenting, so be it.

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

I would note that net/http succeeds in fetching the same URL, i.e. this script succeeds:

require 'uri'
require 'net/http'

uri = URI.parse('https://faye.jcoglan.com/')

puts Net::HTTP.get_response(uri)

I'm not sure what net/http does different and can't figure out how it uses the X509::Store class from browsing the code.

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

@jsha Your article reflects roughly my understanding of what's going on, without being an expert -- I assumed that the client is given a set of certificates, and as long as it can find some path from a cert it trusts (something in the X509 default store) to the server's cert, things are ok.

Does that mean my script is doing something valid when it sees that it can verify cert 2, and use that to verify cert 3, and therefore cert 3 is trustworthy, even though cert 1 is untrusted? Or have I misunderstood?

@jsha
Copy link

jsha commented Nov 29, 2021

My Ruby is a bit rusty and in particular I'm not familiar with EventMachine, but I suspect net/http doesn't reference X509::Store because it lets OpenSSL do the verification itself - by making sure verify_mode is VERIFY_PEER. In that situation OpenSSL will use whatever the default trust store is, but you can override that with ca_path / ca_file.

I see you have this call:

start_tls(sni_hostname: HOST, verify_peer: true)

Is that not enough to turn on certificate validation? Why do you also need to implement ssl_verify_peer?

I assumed that the client is given a set of certificates, and as long as it can find some path from a cert it trusts (something in the X509 default store) to the server's cert, things are ok.

This is how things are supposed to work, but per the article don't quite work that way. And the default behavior differs based on which version of OpenSSL. What version are you running?

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

Is that not enough to turn on certificate validation? Why do you also need to implement ssl_verify_peer?

It's an implementation decision by EventMachine not to fully implement certificate verification itself, but to provide a callback for the client to do it, because different protocols that run atop TLS have different requirements here, I believe. Faraday originally spotted this and implemented the verification code I'm referring to in this issue.

What version are you running?

I'm not actually sure which version Ruby is linked against. I'm on macOS 11.6 and installed ruby using sudo ruby-install -r /opt/rubies ruby 2.7.5 no other configuration involved.

@jsha
Copy link

jsha commented Nov 29, 2021

  def ssl_verify_peer(cert_text)
    certificate = OpenSSL::X509::Certificate.new(cert_text)

    @last_cert   = certificate
    @last_verify = @cert_store.verify(certificate)

One thing I notice is missing from the call to verify() is the chain (see https://ruby-doc.org/stdlib-2.5.1/libdoc/openssl/rdoc/OpenSSL/X509/Store.html). That will be required for this to work right; but as I said before, if you can avoid overriding ssl_verify_peer, and just rely on verify_peer: true, that is much better.

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

I have just checked this and to confirm: passing verify_peer: true to start_tls is not sufficient to check the hostname. If you skip the validation in the other callbacks and connect to a server with broken certificates, the connection succeeds.

@jsha
Copy link

jsha commented Nov 29, 2021

Ah, and now that I'm coming to understand this code a bit better:

  def store_cert(cert)
    @cert_store.add_cert(cert)
  rescue OpenSSL::X509::StoreError => error
    raise error unless error.message == 'cert already in hash table'
  end

I'm pretty sure this is incorrect. This adds the peer's certificate to your store of trusted roots. In general you should not need to modify the Store after you have initialized it with a set of trusted roots.

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

One thing I notice is missing from the call to verify() is the chain

I think the chain is built internally by adding certs to the store via OpenSSL::X509::Store#add_cert. If I don't add cert 2 to the store in this way, then cert 3 fails to verify.

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

I'm pretty sure this is incorrect. This adds the peer's certificate to your store of trusted roots. In general you should not need to modify the Store after you have initialized it with a set of trusted roots.

Ah ok, let me try passing the chain as you suggest and see what happens.

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

This produces the same result: cert 1 is untrusted, the other two are trusted:

require 'bundler/setup'
require 'eventmachine'
require 'openssl'

HOST = 'faye.jcoglan.com'

module Connection
  def connection_completed
    @cert_store  = OpenSSL::X509::Store.new
    @last_cert   = nil
    @last_verify = false

    @cert_store.set_default_paths

    start_tls(sni_hostname: HOST, verify_peer: true)
  end

  def ssl_verify_peer(cert_text)
    certificate = OpenSSL::X509::Certificate.new(cert_text)

    @last_cert   = certificate
    @last_verify = @cert_store.verify(certificate, @cert_store.chain)

    p certificate

    if @last_verify
      puts "\e[37;42m[ VERIFIED ]\e[0m"
    else
      puts "\e[37;41m[ UNVERIFIED ]\e[0m"
    end

    true
  end

  def ssl_handshake_completed
    unless @last_verify and OpenSSL::SSL.verify_certificate_identity(@last_cert, HOST)
      raise OpenSSL::SSL::SSLError,
            %(host "#{HOST}" does not match the server certificate)
    end
  end
end

EM.run { EM.connect(HOST, 443, Connection) }

@jsha
Copy link

jsha commented Nov 29, 2021

Looking at the EventMachine code, I start to see the problem. It has a different meaning for verify_peer than OpenSSL (including Ruby's OpenSSL bindings). In OpenSSL when you turn on "verify_peer", it means "verify peer using the default validator and default roots." However, in EventMachine "verify_peer" means "verify peer using a custom callback." This is a bad API shape, since custom certificate validation is very hard to get right, as you're seeing:

https://github.com/eventmachine/eventmachine/blob/8e1d6b11fd8400593af035a7a0d203d24c10c9b0/ext/ssl.cpp#L423-L428

EventMachine makes a callback in C++, which AFAICT calls out to the Ruby callback: https://github.com/eventmachine/eventmachine/blob/8e1d6b11fd8400593af035a7a0d203d24c10c9b0/ext/ssl.cpp#L675-L698

Ideally that Ruby callback should take a cert and a chain. But it seems to only take a cert. Also, if you look at the second chunk of C code, it gets the cert using X509_STORE_CTX_get_current_cert. I'm not that familiar with OpenSSL's API here, but those docs say "X509_STORE_CTX_get_current_cert() returns the certificate which caused the error or NULL if no certificate is relevant to the error."

So it seems like there are two API bugs in EventMachine:

  • it should support a default certificate validator which uses OpenSSL's validator
  • for custom validators, it needs to pass the cert and the chain to to callback

And possibly one implementation bug:

From your code:

@last_verify = @cert_store.verify(certificate, @cert_store.chain)

This seems incorrect. The docs for Store::chain say:

The certificate chain constructed by the last call of verify.

In other words, it's not the chain sent by the peer.

It's not immediately clear to me that you can fix this without an upstream fix in EventMachine.

By the way, thanks for working on this! Certificate validation code can be a real slog to work with, but I'm glad you're helping make the ecosystem more secure.

@jsha
Copy link

jsha commented Nov 29, 2021

It looks like EventMachine does expose an accessor get_peer_cert: https://www.rubydoc.info/github/eventmachine/eventmachine/EventMachine%2FConnection:get_peer_cert. But it doesn't expose the corresponding get_peer_chain it would need in order to make it feasible to write your own certificate validation. (And again, it's unreasonable for EventMachine to say users must implement their own certificate validation; it should allow them to delegate that thankless task to OpenSSL)

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

Yeah this reminds me of a lot of what you'll find if you dig in the issue threads linked in https://blog.jcoglan.com/2020/07/31/missing-tls-verification-in-faye/ that go back years. Long story short: this isn't getting fixed in EventMachine. It's possible the ideal solution is to migrate off of EM entirely, but that would break everything depending on my libraries. My priority is to make this as safe as reasonably possible given EM is probably not going to change its behaviour.

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

Do you know any way I could make this safer? Is it fundamentally bad to ignore the verification failure on cert 1? Or am I fine letting this connection through because I managed some string of successful verify() calls to end at the site's own certificate?

@jcoglan
Copy link
Author

jcoglan commented Nov 29, 2021

The explanation from Let's Encrypt about what these certs are for seems to imply clients are expected to accept either the first or second cert in the trace above, and so failing on one of them is normal?

@jsha
Copy link

jsha commented Nov 29, 2021

I would ignore the Let's Encrypt blog post, and indeed my early speculation on this thread. The problem isn't really about the cross-signed root expiration.

The fundamental problem is this: in ssl_verify_peer you have a leaf certificate (aka end-entity certificate). In the default root store you have a set of roots. In between there is a whole universe of intermediates that are trusted only by virtue of being signed by a root.

In order to validate a peer certificate, you need to build a path from the leaf certificate, through some number of intermediates, to some root certificate. But where do you get those intermediates? Normally they are in the cert chain provided during the TLS handshake, but EventMachine neglected to allow you access to those.

In theory you could bundle a list of common intermediates and give those to OpenSSL as untrusted inputs to the verification process. But that's very fragile - you'd need to update the intermediates all the time. CAs change their intermediates regularly and without advance notice, and expect clients to still be able to verify the new intermediate (via the TLS chain mechanism).

You could provide a way for your users to input a list of regularly-updated intermediates. Still fragile but at least provides a way to fix things when they break.

You could offer your users an interface that says "verify the peer certificate by checking its bytes match this hardcoded certificate we expect." But that's also fragile - it would break every time the peer rotated its certificate, which should be often.

Here's a really hacky idea: you could make your own connection to the peer, collect the chain, and provide that as untrusted input to the certificate verification.

@jcoglan
Copy link
Author

jcoglan commented Nov 30, 2021

Normally they are in the cert chain provided during the TLS handshake, but EventMachine neglected to allow you access to those.

Isn't this what EM is providing when it repeatedly invokes ssl_verify_peer on the connection? For this one connection, that method is called three times, and one of those certs (2) is trusted by my default root certs. Then it signs cert 3 (the leaf) and so the site is trusted.

Have I misunderstood something here?

In the case of my library there is also an escape hatch in the form of an option letting the caller provide their own root CA file, and we fall back to set_default_paths if that's not used.

@jsha
Copy link

jsha commented Nov 30, 2021

Isn't this what EM is providing when it repeatedly invokes ssl_verify_peer on the connection? For this one connection, that method is called three times, and one of those certs (2) is trusted by my default root certs. Then it signs cert 3 (the leaf) and so the site is trusted.

Aha, that makes sense. I was misunderstanding the API. So, it may be possible to do something reasonable here. I can't say exactly what the right thing is at the moment, but it should be possible. I can take a closer look later this week.

@jcoglan
Copy link
Author

jcoglan commented Nov 30, 2021

Thanks @jsha, this was really helpful. Absolutely no obligation to dig into this any further :)

@jsha
Copy link

jsha commented Nov 30, 2021

The rough idea I'm thinking of is: accumulate the chain as you receive each subsequent call to ssl_verify_peer, and then verify the whole thing in ssl_handshake_completed. But note that I'm pretty unfamiliar with all these APIs, so please confirm with the documentation and so on. :-)

@mackuba
Copy link

mackuba commented Mar 7, 2024

Hey @jcoglan, hello from 2024 - have you managed to figure this one out? 😅

I'm using EventMachine, faye-websocket and now also em-http-request in my project, and I've come across an issue that I suspect might be the same thing as you've discussed here…

The specific URL I'm trying to load is this one: https://genco.me/.well-known/did.json

open-uri / Net::HTTP loads it just fine. But the em-http-request code (copied from faraday apparently) throws: OpenSSL::SSL::SSLError unable to verify the server certificate for "genco.me". I thought this was maybe a matter of not setting the certificate store file properly, but it seems to be using the same kind of store as Net::HTTP in the same way.

I don't know that much about SSL certificate verification and to be honest I'd prefer not to, since I think this is something that's best not to do unless you're sure you're doing it right - I don't understand why EventMachine can't just delegate to the same verification code that Net::HTTP is using… or maybe it can be told to?

@jcoglan
Copy link
Author

jcoglan commented Mar 27, 2024

@mackuba It looks like that domain uses Let's Encrypt certificates which was also the reason that faye-websocket had to change its SSL verifier. We originally based ours on the Faraday one, which wanted to check that every certificate received was trusted. This is not always the case; since not all clients agree on their default set of trusted CAs, some servers will send you multiple copies of their certificate signed by different root CAs, hoping you'll trust one of them, and then you can form a chain from there to your domain's certificate.

This is the change we made in faye/faye-websocket-ruby@d9428fa: rather than verify every certificate, we check that there is some chain of certificates from the client's trusted root CAs to the domain certificate, and that the final certificate matches the hostname for the request. The current implementation of our verifier is here: https://github.com/faye/faye-websocket-ruby/blob/0.11.3/lib/faye/websocket/ssl_verifier.rb

I'm not sure why this is not baked into EventMachine but there's not much we can do about this. This does something very similar to what's inside Net::HTTP, particularly the use of OpenSSL::X509::Certificate, OpenSSL::X509::Store#{set_default_paths,verify,add_cert}, and OpenSSL::SSL.verify_certificate_identity.

@mackuba
Copy link

mackuba commented Apr 3, 2024

@jcoglan ok, thanks! Tbh, I've since then rewritten this code again and moved back to sync HTTP with Net::HTTP, because I've found a way to do it this way and it makes the code simpler 😉 But this should be helpful if I decide to use em-http again 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants