Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host-Keys signed by a CA aren't recognized as known hosts #771

Open
cerlestes opened this issue Jul 7, 2016 · 21 comments · May be fixed by #2270
Open

Host-Keys signed by a CA aren't recognized as known hosts #771

cerlestes opened this issue Jul 7, 2016 · 21 comments · May be fixed by #2270

Comments

@cerlestes
Copy link

cerlestes commented Jul 7, 2016

Abstract

It seems like paramiko isn't correctly processing keys from known_hosts that are annotated with @cert-authority. As I'm unfamiliar with python and paramiko, I cannot try to reproduce it. The setup works fine using the OpenSSH tools, but fails when using paramiko.

Environment

I've encountered this bug in duplicity, a backup program that uses paramiko as an SSH agent.

Client has a /etc/ssh/ssh_known_hosts file that contains a @cert-authority line with the public key of the signing CA:

@cert-authority *.mydomain ssh-rsa AAAAB... hostkey_ca

Server has *-cert.pub files for all of its host keys, e.g. /etc/ssh/ssh_host_rsa_key-cert.pub, which are configured in its sshd_config like so: HostCertificate /etc/ssh/ssh_host_rsa_key-cert.pub. These certificates were created using the private key of the aforementioned CA.

Situation

Connecting via OpenSSH's program suite works fine:

user@client:~# sftp backups.mydomain
Connected to backups.mydomain.
sftp> exit

Connecting via paramiko (through duplicity) does not:

user@client:~# duplicity Backups sftp://backups.mydomain/Backups
The authenticity of host '[backups.mydomain]:1234' can't be established.
SSH-RSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)? no

The fingerprint displayed is of the signed public key.

Paramiko only manages to connect after adding the domain/ip and public host key of the server to the known_hosts file in the client, when it should not need this due to the given CA key and certificate.

@bitprophet
Copy link
Member

Thanks for the report! I don't have personal experience using a CA system with SSH keys so not sure offhand how well Paramiko supports it; could be a bug, could be outright lack of support.

I'm +1 on getting it working but would need someone both with CA experience & comfort w/ the codebase to at least get us pointed in the right direction.

@sly010
Copy link

sly010 commented Sep 23, 2016

+1 on this. This is very useful and client certificates seem to already work, but host certificates have a slightly different syntax:

https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.certkeys

@bitprophet
Copy link
Member

Tying this to #531 even though the two probably don't technically overlap (but they kinda do conceptually).

@kazzmir
Copy link

kazzmir commented Feb 20, 2018

I am very interested in supporting host keys with cert-authority. Has there been any progress on this? If not I will attempt an implementation.

@anthonyjmartinez
Copy link

Still not implemented as near as I can tell from the documentation. Using load_system_host_keys on a file containing entries that start with @cert-authority fails to load these lines. As host certificates are fundamental to my SSH-at-scale use I can't use Paramiko unless I ignore missing host keys or figure out this implementation.

@kazzmir did you ever attempt to implement?

@maxswjeon
Copy link

No Implementation until 2021?

@tacerus
Copy link

tacerus commented Dec 11, 2022

Hi! I just countered this issue upon trying to use Paramiko. Is there any update on it?

@bskinn
Copy link
Contributor

bskinn commented Dec 17, 2022

Flagging this to include/consider in the key/auth rewrite of #387.

That rewrite is waiting for bitprophet's drop of Python 2 and older 3.x from all of his packages. Not sure how long that will take; the key work will only happen after that modernization is done.

@MajorDallas
Copy link

Just ran into this, also, and poked through hostkeys.py to see where the issue is.

# hostkeys.py
# 339 - 347

        fields = re.split(" |\t", line)
        if len(fields) < 3:
            # Bad number of fields
            msg = "Not enough fields found in known_hosts in line {} ({!r})"
            log.info(msg.format(lineno, line))
            return None
        fields = fields[:3]  # <--- right there!

        names, key_type, key = fields

A signed host key's 0th field is @cert-authority, leading to a classic off-by-one error on the next two fields. key becomes the key type (eg. ssh-rsa), which is not a valid base64 string:

In [19]: print(fields[2])
ssh-rsa

In [20]: base64.decodebytes(bytes(fields[2], "utf8"))
---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-20-be56aa59c4a7> in <module>
----> 1 base64.decodebytes(bytes(fields[2], "utf8"))

/usr/lib64/python3.10/base64.py in decodebytes(s)
    560     """Decode a bytestring of base-64 data into a bytes object."""
    561     _input_type_check(s)
--> 562     return binascii.a2b_base64(s)
    563
    564

Error: Incorrect padding

A naive solution might be to throw some more re methods at it. Maybe re.match with a pattern that gets just the key, like r'\S+=='--which would technically work as long as the key comment doesn't have == in it anywhere 😝 -- or do more but simpler matches on each field after the line is split to find all the fields that can't be the key. Kinda just spitballin' here.

A related ergonomics issue: in my case, this signed key happens to be the first one in known_hosts and it's not even the key for the host I'm trying to connect to. That the HostKeyEntry.from_line method crashes out the entire program in this case is pretty annoying--I have probably 2,000 keys in my known_hosts file and none of them were checked. My recommendation would be to catch the exception in HostKeys.load, spit out a warning to a log and stderr, and carry on.

@bskinn
Copy link
Contributor

bskinn commented Jun 25, 2023

Thanks very much for the investigation and detail on this, @MajorDallas!

If I read your comment correctly, and to summarize, you have a twofold recommendation?

  1. Improve line matching to account for this different syntax for host-signed keys.
  2. Improve the host line reading code to gracefully warn and skip on line-parse failures.

Seems to me that these ought to be two different PRs.

Separately:

  • Is there a specification for this syntax that you could link to?
  • Could you provide an example of a known_hosts line with and without host-signing syntax, for reference?

@MajorDallas
Copy link

MajorDallas commented Jul 6, 2023

A twofold recommendation, yes. However, for the first recommendation, rather than

Improve line matching to account for this different syntax for host-signed keys

I would say:

Improve line matching to correctly parse the known_hosts file format.

I know that's a quibble, but this issue will affect more than signed host keys: any host key that has been revoked, somehow specifies a host pattern* with a space in it (not sure that's even allowed, tbh), or comments will all break under the current method of splitting on space and indexing the resultant list without checking what's in the 0th field. I haven't checked, but if the same or a similar algorithm is also used for parsing the authorized_keys file then there are even more optional fields to trip it up--but that's something to discuss in #387.

  • Is there a specification for this syntax that you could link to?
  • Could you provide an example of a known_hosts line with and without host-signing syntax, for reference?

The only spec I could find is right out of man 8 sshd, under SSH_KNOWN_HOSTS FILE FORMAT:

Each line in these files contains the following fields: marker (optional), hostnames, keytype, 
base64-encoded key, comment.  The fields are separated by spaces.

The marker is optional, but if it is present then it must be one of “@cert-authority”, to indicate
that the line contains a certification authority (CA) key, or “@revoked”, to indicate that the key
contained on the line is revoked and must not ever be accepted.  Only one marker should be
used on a key line.

...

An example ssh_known_hosts file:

  # Comments allowed at start of line
  closenet,...,192.0.2.53 1024 37 159...93 closenet.example.net
  cvs.example.net,192.0.2.10 ssh-rsa AAAA1234.....=
  # A hashed hostname
  |1|JfKTdBh7rNbXkVAQCRp4OQoPfmI=|USECr3SWf1JUPsms5AqfD5QfxkM= ssh-rsa AAAA1234.....=
  # A revoked key
  @revoked * ssh-rsa AAAAB5W...
  # A CA key, accepted for any host in *.mydomain.com or *.mydomain.org
  @cert-authority *.mydomain.org,*.mydomain.com ssh-rsa AAAAB5W...

Somewhat interesting (to me): as far as I can tell, libssh doesn't check for this field, either (I am not C-literate, though, and could have just missed it). I think the best possible reference is OpenSSH's own code. It looks like their line parsing logic starts in hostfile.c:760, with the function check_markers called on line 803 after ignoring leading whitespace and skipping comments, but before doing anything else at all.

* On the topic of host patterns: I didn't read real far into how the host field is handled. I assume it's parsed at least enough to recognize a non-standard port (the syntax requires the pattern be enclosed in square brackets), but what about eg. wildcards? OpenSSH specifies support for * and ? globs plus ! for negation in man 8 sshd. The code that parses those bits starts in match_hostname and involves several other functions.

(edit: use code block instead of quote block, since apparently "revoked" is a github user and I accidentally pinged them by quoting the man page 😅 )

@MajorDallas
Copy link

MajorDallas commented Jul 19, 2023

I threw this together in about 20 minutes and tested with a cert-authority with comments example and a marker-free, comment-free example:

from enum import Enum
from typing import Optional


class Markers(str, Enum):
    CERT_AUTHORITY = "@cert-authority"
    REVOKED = "@revoked"


class HostKeyParser:
    marker: Optional[Markers]
    hosts: str
    keytype: str
    key: str
    comment: str
    _original_line: str

    __slots__ = (
        "marker",
        "hosts",
        "keytype",
        "key",
        "comment",
        "_original_line",
    )

    def __init__(self, hk_line: str):
        (
            self.marker,
            self.hosts,
            self.keytype,
            self.key,
            self.comment,
        ) = self.parse_line(hk_line)
        self._original_line = hk_line

    @staticmethod
    def parse_line(hk_line: str) -> tuple[Optional[Markers], str, str, str, str]:
        fields = hk_line.split(" ")
        try:
            marker = Markers(fields[0])
            hosts, keytype, key, *comment = fields[1:]
        except ValueError:
            marker = None
            hosts, keytype, key, *comment = fields
        return marker, hosts, keytype, key, " ".join(comment)

Fundamentally it's still just splitting on space, but I think my concerns about spurious spaces in fields besides comment were unfounded. Maybe this class set-up is overkill, too--I mean, the actual parsing logic is a static method that could work just as well being a plain function. The only trade-off might be an object where each field can be referenced by name, but a function can always return a NamedTuple or a dataclass that's even simpler than this one 🤷

It works, though!

Edit:

Yeah, I think I like it better this way:

@dataclass(frozen=True, slots=True)
class HostKey:
    marker: Optional[Markers]
    hosts: str
    keytype: str
    key: str
    comment: str
    _original_line: str

def parse_line(hk_line: str) -> HostKey:
    fields = hk_line.split(" ")
    try:
        marker = Markers(fields[0])
        hosts, keytype, key, *comment = fields[1:]
    except ValueError:
        marker = None
        hosts, keytype, key, *comment = fields
    return HostKey(marker, hosts, keytype, key, " ".join(comment), hk_line)

@bskinn
Copy link
Contributor

bskinn commented Jul 20, 2023

Agreed, @MajorDallas, the simplicity of the bare function is appealing.

This implementation of the parsing logic makes sense to me and looks good -- it has a number of nice features.

Looking at what's in hostkeys.py, though, this is rather different than what's there.

Unless there's a reason to re-implement the container class, probably best to fix this with as minimal modification as possible to the existing code? This logic seems straightforward enough that adapting it shouldn't be too difficult?

@MajorDallas
Copy link

You're right. I admit I didn't refer to the existing code, I just wanted to prototype the logic. I'd need to spend some time to figure out how to integrate it, but someone already familiar with that module could certainly do it faster.

One thing my prototype doesn't consider is host patterns. I kind of just assumed that there's logic elsewhere that implements the glob and negation operators and that the parser doesn't need to care about that.

@bskinn
Copy link
Contributor

bskinn commented Jul 20, 2023

Nod, no worries - a POC like this should make it easier to implement: makes it a port, rather than a from-scratch task.

Feel free to take a stab yourself, or leave it here for someone else to pick up. Your call!

@MajorDallas
Copy link

I just noticed something from the sample hostkeys above:

closenet,...,192.0.2.53 1024 37 159...93 closenet.example.net

The second field, the key type, here is 1024 rather than eg. ssh-rsa. I figured each key type has some int associated with it, and sure enough the OpenSSH code checks against an enum called sshkey_types. Now, I'm not a C programmer... But this enum only has 14 members and they don't seem to be using any non-default values for the enum members, meaning the last member's int value should be 13 if I'm correctly understanding how C enums work. None of the sshkey_impl structs have a field with 1024 as a value, either, so I don't know what key type that might be. Furthermore, afaict they use the enum to help with looking up structs which have a name field that they do a memcmp against--basically a string match, not an integer match, I think.

Field 3 is 37. Pretty sure that's not a key. The ellipsis in field 4 suggests that's the key, but according to the spec fields >= 4 must be a comment. (>= 5 if there's a marker in field 1)

This very much looks like an invalid hostkey entry to me. I would not expect the man-page authors to put in an invalid example without somehow indicating that it's invalid, though 🤔 . I've never seen a hostkey entry like this in the wild, but it's not like I spend a lot of time reading random known_hosts files. Probably okay to ignore it?

MajorDallas added a commit to MajorDallas/paramiko that referenced this issue Jul 20, 2023
Not complete!

Adds basic parser support for a marker in the first field. Additional
handling of cert-authority should be discussed before being implemented.

Adds sample keys to test_hostkeys.py.
@MajorDallas
Copy link

Well, I took a stab at it 'cause why not 🤷

34299ff

I wasn't sure whether to add an attribute to HostKeyEntry for the marker, if such an attribute would need to be consumed somewhere or how, so I left that off and just implemented the parser logic inside HostKeyEntry.from_line. Based on the return of None if the key type is unknown, it returns None if the marker is @revoked, but that can be adjusted.

I added sample cert-authority and revoked keys to the sample data in test_hostkeys.py, as well, adjusted a couple of asserts accordingly, and have passing tests 😃

Just need to discuss what, if anything, to do with the markers once we have them.

@bskinn
Copy link
Contributor

bskinn commented Jul 20, 2023

Cool - thanks!

I'd say rebase this onto a branch in your fork and create a PR--further discussion of the details of the handling for each marker case and any changes to the data structures would happen there. I'd replicate what you put in this most recent comment, as the starting point for conversation.

(The rebase is important because it's bad practice to develop on main and create a PR from there, because then it becomes very messy if you need to update your fork's main to match the main branch of paramiko/paramiko. Apologies if I'm telling you something you already know, here.)

@MajorDallas
Copy link

An oversight on my part 😅 I'll get a branch and PR set-up.

@bskinn
Copy link
Contributor

bskinn commented Jul 20, 2023

Good deal, thank you!

I will note -- much of bitprophet's attention at the moment is still focused on the key/auth overhaul centered in #387 and #23, so even once we get the PR into a mergeable state, it may stay on the back burner until that work is done.

@harridu
Copy link

harridu commented Jul 29, 2023

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants