Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add markers to known_hosts parser re paramiko#771 #2270

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MajorDallas
Copy link

@MajorDallas MajorDallas commented Jul 20, 2023

Fixes #771. Copying one of my comments from there on what's needed:

I wasn't sure whether to add an attribute to HostKeyEntry for the marker, if such an attribute would need to be consumed somewhere or how, so I left that off and just implemented the parser logic inside HostKeyEntry.from_line. Based on the return of None if the key type is unknown, it returns None if the marker is @revoked, but that can be adjusted.

I added sample cert-authority and revoked keys to the sample data in test_hostkeys.py, as well, adjusted a couple of asserts accordingly, and have passing tests 馃槂

Just need to discuss what, if anything, to do with the markers once we have them.

Not complete!

Adds basic parser support for a marker in the first field. Additional
handling of cert-authority should be discussed before being implemented.

Adds sample keys to test_hostkeys.py.
@bskinn
Copy link
Contributor

bskinn commented Jul 20, 2023

Applied Needs patch since more discussion is needed on how to handle various bits, and implementation changes will be required to suit.

@bskinn
Copy link
Contributor

bskinn commented Jul 20, 2023

@MajorDallas, since we know this is not ready for merge yet, please convert to a draft PR.

@MajorDallas MajorDallas marked this pull request as draft July 20, 2023 23:21
@MajorDallas
Copy link
Author

MajorDallas commented Jul 20, 2023

I hadn't seen #2251 / #2254 before now. I haven't looked closely at the code, though I did see the new lineno param from that PR in main, so maybe it's already implemented even though the PR is still open? Had a quick look at the diff between the two forks. Doesn't look like the changes are likely to conflict, but still very much worth profiling against that data set to make sure this new parsing logic isn't causing a regression in time complexity.

Aside: how do we feel about type annotations? I will gladly start adding them where I'm touching code if there's no opposition to it.

@bskinn
Copy link
Contributor

bskinn commented Jul 20, 2023

100%, @MajorDallas, good catch.

@bskinn
Copy link
Contributor

bskinn commented Jul 20, 2023

Type annotations will be up to bitprophet. I don't believe there are currently plans to type the codebase, though.

(Now that Py2 is dropped, it might be less of a headache than in the past, though.)

@bitprophet
Copy link
Member

bitprophet commented Jul 28, 2023

Thanks for this! Yes, the overall "host keys parser doesn't understand @ annotations" thing is a missing feature, tho I gather this ticket is mostly just drawing a box around the exceptions from this problem, and isn't implementing actual CA checking support. I'd be okay merging this at that level of implementation unless I missed an intent to extend it with the CA-signature support too (?).


Also, for both the submitter and @bskinn: I don't mind patches adding mypy type annotations; it'll be awhile before it's worth explicitly annotating the entire codebase (we just did Invoke, a somewhat smaller and more modern codebase, a while ago and that was even a chore) but since new type hints are backwards compatible with "not caring about type hints"...it's fine! and means less work in the future.

@bskinn
Copy link
Contributor

bskinn commented Jul 28, 2023

馃憤 good to know re the type hints posture, thanks @bitprophet!

@bitprophet
Copy link
Member

bitprophet commented Jul 28, 2023

Right, specific feedback was requested too...

  • I am +1 on storing the markers in HostKeyEntry, it's a custom object type so adding some new attribute or w/e for it shouldn't cause any consternation to existing code, AFAIK. Maybe name it/them with a leading underscore for now just to hedge our bets re: API promises - the entire hostkey class set is old and creaky and due a replacement sometime.
  • I think a host key lookup matching an explicitly revoked key is probably OK to be handled by returning None (as if there was a lookup-miss) but I'm curious what OpenSSH does in this case (that? or an error?) since we normally just default to what they do.

@MajorDallas
Copy link
Author

Thanks for this! Yes, the overall "host keys parser doesn't understand @ annotations" thing is a missing feature, tho I gather this ticket is mostly just drawing a box around the exceptions from this problem, and isn't implementing actual CA checking support. I'd be okay merging this at that level of implementation unless I missed an intent to extend it with the CA-signature support too (?).

Correct: my goal was merely to successfully parse known_hosts. For one thing, I'm not knowledgeable enough in cryptography to implement CA verification, and for another I don't have the cycles right now to learn it. Plus, it's removed enough from the task of parsing known_hosts that it deserves its own ticket.

@MajorDallas
Copy link
Author

MajorDallas commented Aug 28, 2023

I am +1 on storing the markers in HostKeyEntry, it's a custom object type so adding some new attribute or w/e for it shouldn't cause any consternation to existing code, AFAIK. Maybe name it/them with a leading underscore for now just to hedge our bets re: API promises - the entire hostkey class set is old and creaky and due a replacement sometime.

I'm looking at adding _marker as an attribute to HostKeyEntry. I'm wondering if there should be a similar attribute on PKey, also? It looks like PKey relies on the convention that a certificate's filename will be conform to *-cert.pub in the from_path staticmethod. That's probably sufficient for 99.9% of uses, but the thing that jumps out to me about this is that PKey does care if it got a certificate so that it knows to call load_certificate. Such a call doesn't happen in PKey.from_type_string, which is the method called from HostKeyEntry.from_line. An attribute may not be strictly necessary; a simple boolean parameter for PKey.from_type_string would be sufficient to condition a call to load_certificate, if that's even appropriate/necessary here.

I think a host key lookup matching an explicitly revoked key is probably OK to be handled by returning None (as if there was a lookup-miss) but I'm curious what OpenSSH does in this case (that? or an error?) since we normally just default to what they do.

Looking at hostfile.c, their parsing function returns the marker and it's recorded in a struct field. In ssh-add.c, presence of an @revoked marker will cause a host key to be skipped in a function called parse_dest_constraint_hop, which is called by the ssh-agent tool. It's not totally clear to me how it all fits together in the context of connecting to a host, tbh, and I'm relying on Github's iffy code-nav tools to poke around--please add grains of salt 馃槂

But, if their hostkey_foreach_line struct could be considered analogous in function to the PKey class here, it might support the idea to add a marker attribute to PKey?

@MajorDallas
Copy link
Author

MajorDallas commented Aug 28, 2023

One other small thing: I'm also looking at adding type annotations for HostKeyEntry.from_line. The lineno param defaults to None, so with no semantic changes the proper type should be Optional[int]. However, that method currently has exactly one caller within the library (if Pyright is to be believed), and the callsite basically guarantees that lineno will always be an int by virtue of being assigned by enumerate. Within the function, lineno is only used for the log message for a key with < 3 fields.

Assuming this method isn't supposed to be part of the public API, I'd recommend removing the default to None.

@bskinn
Copy link
Contributor

bskinn commented Oct 17, 2023

Assuming this method isn't supposed to be part of the public API ...

Even if it's not supposed to be part of the public API, there's a nonzero chance that this function has use in the wild, and changing this behavior would be considered a regression.

Given that this function could be used as an independent test of proper known_hosts line formatting, in which case the lineno is irrelevant, IMO this makes the nonzero chance even more nonzero.

... I'd recommend removing the default to None.

Indeed, as well as tightening the type to int, instead of Optional[int], if we were to go this direction.

However, as long as a None case is gracefully handled inside the function, I think revising the type to Optional[int] is the better course.

@MajorDallas
Copy link
Author

MajorDallas commented Dec 16, 2023

The last couple of months have been crazy busy with work, but I'm finally ready to take another look at this.

Two questions:

  1. The main thing I'm not sure about for finishing the implementation is if there needs to be any explicit call to PKey.load_certificate after HostKeyEntry.from_line in HostKeyEntry.load. I don't think so: all load does is populate a list on the instance, and I assume that list then gets processed appropriately elsewhere..? Still trying to read backwards, but any pointers are appreciated. Looks like my assumption was correct, with hostkeys ultimately processed in SSHClient.connect. It looks like server keys and known keys are processed with a simple equality check, which may not be enough for a certificate? As I noted back in August, I'm not sure I'm up for writing the cert verification myself unless there's some reusable code tucked away that I can call into.
  2. Confession: I've made most of my git mistakes on small repos with few contributors, and always with merge rather than rebase. Since I've been away a while, master has pulled ahead of my fork. I've sync'd that branch, but now should I rebase 771-hostkeys again onto the tip of master or just leave it where it is? (I figure this is one of those times I'll look less-bad for asking the "dumb" question than guessing and doing it wrong 馃檪 )

@mrmeszaros
Copy link

!IMPORTANT I am not the maintainer of this repository, but based on what I saw in the git history:

Short answer:

You should rebase.

Longer answer:

At least You should try - if You do not run into merge conflicts then it should be all good.
If You run into some, maybe abort, make a backup branch (in case the rebase breaks it beyond all hope).
Then you can try to resolve, and then check if your unit-tests run correctly.

Longest answer:

@MajorDallas I would be glad to help - as I see this is the currently worked / active PR.

PS.: I have an open PR I would like to get reviewed, so I hope the sooner this one is resolved, the sooner mine might :)

* Adds some type annotations
* Includes _marker in the output of `to_line`
* Sets _marker when constructing `HostKeyEntry` in `from_line`
@MajorDallas MajorDallas marked this pull request as ready for review January 1, 2024 20:22
@MajorDallas
Copy link
Author

So I attempted a rebase in a back-up copy and it worked just fine... but there are only two commits in this branch, so I'm not sure it even makes a difference? I know (basically) how rebase is different from merge, but not sure why, in this case, it should be one or the other.

@mrmeszaros
Copy link

Cool, good to know it was not a big deal :)

As far as I know, most projects prefer to have a linear history, because it is much easier to the eye and to argue about.

While one merge commit is not an issue, in the long run merge commits can introduce a lot of parallel branches with a git history that is hard to see through.

Lastly, a merge commit is usually dud - usually it does not convey much information, if not looked after these empty commits can take a big portion of the history.

I hope I was able to provide help with my experiences :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Host-Keys signed by a CA aren't recognized as known hosts
4 participants