Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing a cert containing a raw IPv6 address as DNS SAN fails #3943

Closed
jharbott opened this issue Sep 27, 2017 · 13 comments
Closed

Parsing a cert containing a raw IPv6 address as DNS SAN fails #3943

jharbott opened this issue Sep 27, 2017 · 13 comments

Comments

@jharbott
Copy link

The scenario is that I'm trying to access a server at e.g. https://[2001:db8::17]/ running with a certificate that contains both DNS:2001:db8::17 and IP:2001:db8::17 via urllib3, which in turn uses the cryptography module in order to parse the certificate.

With released version 2.0.3, there is an error already in cert.extensions.get_extension_for_class(x509.SubjectAlternativeName), if I install the current git master, the error moves to a later stage when ext.get_values_for_type(x509.DNSName) is called.

After asking in #cryptography-dev I used the workarounf of dropping the DNS:... part from the certificate, but in that case the certificate is no longer accepted by the standard httplib module, see https://bugs.python.org/issue23239, so we really need a fix to accept that part here.

@jharbott
Copy link
Author

cert.pem.txt

@jharbott
Copy link
Author

Looks like this simple patch would work for me, avoiding calling a deprecated function or so it claims. Sadly next thing to happen is urllib3 blowing up in another idna check, but that's a different issue.

diff --git a/src/cryptography/x509/extensions.py b/src/cryptography/x509/extensions.py
index eb4b927..06a99ef 100644
--- a/src/cryptography/x509/extensions.py
+++ b/src/cryptography/x509/extensions.py
@@ -1180,7 +1180,7 @@ class GeneralNames(object):
         # just one value.
         objs = (i for i in self if isinstance(i, type))
         if type != OtherName:
-            objs = (i.value for i in objs)
+            objs = (i.bytes_value for i in objs)
         return list(objs)
 
     def __repr__(self):

@jharbott
Copy link
Author

Patch v2:

diff --git a/src/cryptography/x509/extensions.py b/src/cryptography/x509/extensions.py
index eb4b927..e4bedce 100644
--- a/src/cryptography/x509/extensions.py
+++ b/src/cryptography/x509/extensions.py
@@ -21,7 +21,7 @@ from cryptography.hazmat.primitives.asymmetric.rsa import RSAPublicKey
 from cryptography.x509.certificate_transparency import (
     SignedCertificateTimestamp
 )
-from cryptography.x509.general_name import GeneralName, IPAddress, OtherName
+from cryptography.x509.general_name import GeneralName, IPAddress, OtherName, DNSName
 from cryptography.x509.name import RelativeDistinguishedName
 from cryptography.x509.oid import (
     CRLEntryExtensionOID, ExtensionOID, ObjectIdentifier
@@ -1179,7 +1179,9 @@ class GeneralNames(object):
         # which we return directly because it has two important properties not
         # just one value.
         objs = (i for i in self if isinstance(i, type))
-        if type != OtherName:
+        if type == DNSName:
+            objs = (i.bytes_value for i in objs)
+        elif type != OtherName:
             objs = (i.value for i in objs)
         return list(objs)

@reaperhulk
Copy link
Member

Right now get_values_for_type returns the IDNA decoded (U-label) string for everything. With this change we would be returning A-label byte strings for DNSName. That's a compatibility break that may be an issue.

Our original plan was to deprecate the value attribute and push people to bytes_value and then eventually drop value, but I think we forgot that many people are obtaining the value via get_values_for_type and never seeing the containing GeneralName object.

I'm frankly not entirely sure what to do here. Do we need a get_bytes_values_for_type? Do we want to just pay the piper now and have a compatibility break?

@reaperhulk reaperhulk added this to the Twenty first release milestone Sep 27, 2017
@ronf
Copy link

ronf commented Sep 27, 2017

According to RFC 5280, it's not legal to have subjectAltName entries of type dNSName which contain IP addresses. Things of type dNSName should be valid DNS labels and the name MUST be in the "preferred name syntax", as specified by Section 3.5 of [RFC1034] and as modified by Section 2.1 of [RFC1123]. Even with the changes in RFC 1123 to allow DNS label components beginning with a number, a DNS label which exactly matches an IPv4 dotted decimal address is not allowed, and an IPv6 address would also not be allowed due to the presence of colons. So, I'm not sure we should be trying to support literal IP addresses appearing in dNSName GeneralNames.

@jharbott
Copy link
Author

Being RFC compliant is one thing, having software that works with the rest of the world may be a different one. IPv4 dotted decimal address as DNS label is supported currently and is being used in many locations as a workaround for issues like the one I mentioned above. So it seems inconsistent to deny feature parity for IPv6 with the argument being "now that violates RFC". Denying both and forcing people to fix their real issues might be a good approach, but you'd better get some asbestos suits ready before that.

@reaperhulk I do agree that conserving compatibility is a valid point, maybe adding get_bytes_values_for_type and then having urllib3 consume that might be the right approach. In particular since changes over there are needed anyhow, see urllib3/urllib3#1269

@reaperhulk
Copy link
Member

Yeah RFC compliance isn’t relevant here (sadly) because there are certs in the wild that do this as a workaround for bugs in various software.

requests will need to make changes no matter what due to its own idna functions (which are still cryptography's fault), but I suspect @Lukasa would appreciate a method that gives A-label bytes that require no encoding.

@ronf
Copy link

ronf commented Sep 28, 2017

I'll admit I haven't looked closely at the cryptography code in question here, but I'm still a bit lost about the discussion of returning these values as bytes. If we were to allow a literal IPv4 or IPv6 address to be associated with a GeneralName of type dNSName, I would still expect the return value to be a str value, not a bytes value. It would happen to be all ASCII in this case, but many non-IP-address DNS names would already return str values that were all ASCII, if they happened not to contain any Punycode sequences.

As for wanting to return an encoded Punycode string instead of converting that to its Unicode equivalent, I guess I could see a function that returned a bytes value instead of a str making sense in that case, but why would that be the right thing to do for this case of IP addresses encoded as DNS names? Also, how would a caller know in advance whether the name they are getting the value of is going to be an actual DNS name that they should request as a str vs. an IP address they should request as bytes? Wouldn't it be simpler to just always return a string in both of these cases? After getting it back, they could always try and convert that string to an IP address to see if it was a DNS name or an IP address, and then from there convert it to whatever form they needed.

@reaperhulk
Copy link
Member

@ronf we've chosen to deprecate the U-label behavior entirely (which is what value returns) in favor of returning the bytes from the certificate. In the case of an IDNA encoded DNSName this means you'll get the ASCII bytes of the A-label as a byte string when calling bytes_value and the U-label when calling value (but the latter will also raise a deprecation warning).

The complexity here is that we have a bug in our IDNA decoding that causes raw IPv6 (and IPv4) to raise an exception when it attempts to generate the U-label string. Since value is deprecated ideally we'd just say to call bytes_value instead, but get_values_for_type complicates this since it implicitly calls value on the caller's behalf and callers may be expecting U-label.

So, we could fix the problem in cryptography by resolving the bug in the IDNA decode such that value returns a unicode string with the IPv4/v6 literal value without an exception, but that doesn't give us a path forward for when we drop the deprecated attributes. Additionally, requests takes the U-label DNSNames and encodes them back to A-label so they can do comparisons for validity checking. This is a bad approach (since IDNA has multiple versions and not all versions have bijective mapping) but one we forced them into with our foolish U-label API. In their case even if we fix the bug in value they'll have to make a similar change in their encoder to have it work.

get_values_for_type can return values from RFC822Name, URI, DNSName, DirectoryName, RegisteredID, IPAddress, and OtherName. Of those, the first 3 now have bytes_value that return A-label data and the remaining 4 return specialized types (Name, ObjectIdentifier, an ipaddress object, and a byte string respectively).

I am still not happy with any of the solutions I've come up with for this issue. A new method seems foolish since it's just new return types... What about an optional arg a_label that defaults to False but raises a warning in that case? Setting it to True would return the A-label bytes. Of course, it'd be difficult to ever remove that optional argument at that point although we could eventually just assert that it's True.

@ronf
Copy link

ronf commented Sep 28, 2017

Thanks for the extra info, @reaperhulk. If the long term plan is to switch from str to bytes and not do the Unicode conversions, I agree it would make sense to also return a bytes value which is an ASCII representation of the IPv4/IPv6 address exactly matching how it was encoded in the GeneralName. If both "value" and "bytes_value" will be available for some time, though, I think it could also make sense to set "value" to a string version of that same ASCII value. If the IDNA encoder fails here, perhaps you could have fallback code that would just do a decode() of the bytes value with encoding 'ascii' to cover that. That may fail if there's non-ASCII data in the bytes, but that would be illegal for a dNSName, so breaking in that case seems like it might be ok.

As for get_values_for_type, I agree there isn't a great answer. You can't easily add a get_bytes_value_for_type() that parallels the existing function because some of the types don't have a bytes_value property. A defaulted a_label argument is an interesting alternative, but I agree it would be hard to ever get rid of that if it defaults to the behavior you want to eventually disallow. You'd have to deprecate it in phases where you first default it to False but make it a warning after a while if you don't set the argument to True, and then eventually only support the new behavior, still allowing the argument but only if set to True and warning whenever it was present that it is a no-op and should be removed. It would also bit a bit awkward if someone specified the a_label argument when asking to get values for a type like iPAddress.

@reaperhulk
Copy link
Member

I put up a potential a_label arg PR for discussion, but I'm starting to wonder if the slightly incompatible backwards-incompatible idea of returning the A-label values as strings (not byte strings) might be a better solution.

If we do that then calling get_values_for_type will still return strings in py2/py3. In any non-IDNA case this means there is no change of any kind. In the IDNA case instead of receiving the U-label decoded string the caller will get an A-label string, which they'll have to decode to U-label themselves (if desired). In the case where they were encoding it to A-label via a call to idna.encode it will produce the same output as before though.

That's still not great, but I haven't come up with an idea that isn't bad in some dimension so far...

@ronf
Copy link

ronf commented Oct 3, 2017

Keeping get_values_for_type as returning strings would definitely be better from a backward compatibility standpoint, and it's a lot cleaner than having an extra argument to choose between A-labels and U-labels. Since it's fairly easy for a caller to do their own conversion back to a full Unicode string, perhaps that's the better option for now.

If a caller passed in something that was actual Unicode and not already an A-label string, would you still do the automatic conversion for them in the process of converting the value to bytes?

@reaperhulk
Copy link
Member

This is now resolved in master via #3951, #3953, #3954, and #3955

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

3 participants