
TPM Provider does not persist generated keys across reboot #504

Closed
hug-dev opened this issue Aug 25, 2021 · 9 comments
@hug-dev
Member

hug-dev commented Aug 25, 2021

It was observed, first on Slack, that keys generated in the TPM provider cannot be used after a cold reboot of the system. When trying to use a key generated before the reboot (for example, to export its public part), the following error is seen:

Aug 20 12:07:35 parsec[646]: ERROR:esys:../tpm2-tss-2.4.3/src/tss2-esys/api/Esys_ContextLoad.c:93:Esys_ContextLoad() Esys Finish ErrorCode (0x000001df)
Aug 20 12:07:35 parsec[646]: [ERROR tss_esapi::context::tpm_commands::context_management] Error in loading context: integrity check failed (associated with parameter number 1)

After investigation in the Slack thread linked above, in the TPM Dev chat (logs) and in this post, the root cause was found.

Although the Storage Primary Seed stays the same across a TPM Reset (which happens when there is a cold system reboot), the key contexts created from the Storage Primary Key are invalidated.

This is, for example, stated in section 30.4 of Part 1 (Architecture) of the specs:

All saved object contexts are invalidated by TPM Reset.

Also in section 30.1:

Saved contexts for all objects and sessions are invalidated on a TPM Reset.

As said in the TPM Dev chat, a better solution for saving persistent keys is to store the result of the tpm2_create operation, namely the TPM2B_PRIVATE and TPM2B_PUBLIC structures. When the key needs to be used again, those structures can be loaded into the TPM with a tpm2_load. See here for what we currently do when creating a key.

This is a bug, as it means that anyone with keys stored in the TPM provider would lose them after a reboot.
This is also a stability concern, as changing the way we represent TPM keys in memory would invalidate the old format.

This issue is made to discuss solutions and start a fix!

@hug-dev hug-dev added bug Something isn't working security Issues related to the security and privacy of the service platforms Compatibility with different secure services or hardware platforms medium Effort label stability Issues related to the stability of the service labels Aug 25, 2021
@hug-dev hug-dev added this to All issues in Parsec via automation Aug 25, 2021
@hug-dev
Member Author

hug-dev commented Aug 25, 2021

From @paulhowardarm:

Assuming that we can fix the TPM provider to persist the public/private key areas instead of the context, I can think of a couple of ways that we could integrate such a change into the service without breaking backwards-compatibility and without requiring a standalone migration script.

  1. We can extend the KIM so that each entry carries a version number. The on-disk KIM would store these in a parallel file tree of some kind, and the forthcoming SQL KIM would just store it as a column in the schema. Any entry for which version data is absent is implicitly assumed to be version 1. Versions would be simple incrementing integers, and the expectation would be that they only change rarely. We would add an API to the KIM to allow the version number of an entry to be queried before the actual KeyInfo is read. The client of the KIM (namely the provider) can then use the version number to choose between alternate serializable types when it calls KeyInfoManagerClient::get_key_id(). The provider can also, either lazily or eagerly, upgrade KIM entries and write them back in a newer, preferred format, calling some new KIM API to set the version number for future reference. A system like this in the KIM would also allow us to cope with future situations where we need to adjust how keys are serialized. (A rough sketch of what a versioned entry could look like follows the example code below.)
  2. We could take advantage of the fact that bincode::deserialize() will raise std::io::ErrorKind::UnexpectedEof if the target structure contains additional fields beyond what is available in the stream. The KIM could re-badge this as some meaningful error like MissingKeyFields or something. The KIM client (the provider) could define a new structure that embeds the old structure as its first member, but adds the additional data fields that it needs. It can attempt to read the new structure. If it gets the MissingKeyFields error, it can fall back on reading the old structure, but then upgrade it to the new one. This is hard to describe in words so see attached code for the idea. This is arguably a bit hacky and not as re-usable as (1), but probably a lot less work.
Example idea
use serde::{Deserialize, Serialize};
use bincode::{Error, ErrorKind};

// Hack these typedefs to avoid actually pulling in any TPM stuff.
type TPMI_DH_CONTEXT = u32;
type TPMI_RH_HIERARCHY = u32;

#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct TpmsContext {
    sequence: u64,
    saved_handle: TPMI_DH_CONTEXT,
    hierarchy: TPMI_RH_HIERARCHY,
    context_blob: Vec<u8>,
}

// This is what the TPM provider stores in the KIM today.
#[derive(Debug, Serialize, Deserialize)]
pub struct PasswordContext {
    pub context: TpmsContext,
    /// This value is confidential and needs to be zeroized by its new owner.
    pub auth_value: Vec<u8>,
}

// Dummy struct to represent needing to store the key in a new way.
#[derive(Debug, Serialize, Deserialize)]
pub struct ExtraBits {
    private_key_area: Vec<u8>, // These would probably be some structured TPM type rather than
    public_key_area: Vec<u8>,  // just byte arrays.
}

// This is what we could store instead. We probably would call it something
// else, but "UpgradedPasswordContext" is good to demonstrate the idea.
#[derive(Debug, Serialize, Deserialize)]
pub struct UpgradedPasswordContext {
    original: PasswordContext,
    extra_bits: ExtraBits,
}

fn main() {
    // Fake up a context to serialize
    let tpmctx = TpmsContext {
        sequence: 42,
        saved_handle: 0xDEADBEEF,
        hierarchy: 0xBAADF00D,
        context_blob: vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    };

    // Make the original/existing form of context. This is what the TPM provider is serializing
    // in the KIM as of today.
    let pwdctx = PasswordContext {
        context: tpmctx,
        auth_value: vec![101, 102, 103, 104, 105],
    };

    // This represents serializing to disk (in the original PasswordContext form)
    let ondisk: Vec<u8> = bincode::serialize(&pwdctx).unwrap();

    // Try to read back as UpgradedPasswordContext. This is guaranteed to give us a
    // UnexpectedEof error, because UpgradedPasswordContext is a PasswordContext plus
    // additional fields.
    let upgraded = match bincode::deserialize::<UpgradedPasswordContext>(&ondisk) {
        Ok(_) => panic!("Did not expect this to succeed."),
        Err(e) => {
            match *e {
                ErrorKind::Io(io_error) => {
                    if io_error.kind() == std::io::ErrorKind::UnexpectedEof {
                        // In this case, we expect the UnexpectedEof!
                        // So we need to deserialize the original form instead.
                        // This time, just unwrap() because this should be correct.
                        let oldctx: PasswordContext = bincode::deserialize(&ondisk).unwrap();

                        // This is where we would do TPM stuff to (hopefully) load the context
                        // and export the public/private areas. Just emulate the idea here.
                        let newctx = UpgradedPasswordContext {
                            original: oldctx,
                            extra_bits: ExtraBits {
                                private_key_area: vec![201, 202, 203, 204, 205, 206],
                                public_key_area: vec![211, 212, 213, 214, 215, 216],
                            },
                        };
                        let newondisk = bincode::serialize(&newctx).unwrap();

                        newondisk
                    } else {
                        panic!("An IO error occurred other than UnexpectedEof.")
                    }
                }
                _ => panic!("Error other than Io::EndOfFile occurred."),
            }
        }
    };

    // We should now be able to deserialize the upgraded data
    let upgrade_ctx: UpgradedPasswordContext = bincode::deserialize(&upgraded).unwrap();

    // Just dump to the console.
    println!("The upgraded context is {:#?}", &upgrade_ctx);
}
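
Following on from option (1) above, here is a minimal sketch of what a versioned KIM entry could look like. The names (KimEntryVersion, VersionedKeyInfo) and the exact layout are hypothetical, purely to illustrate the idea of a version tag travelling with each entry; the KIM API for querying and bumping the version is deliberately left out.

use serde::{Deserialize, Serialize};

// Hypothetical version tag stored alongside (or ahead of) each KIM entry.
// Entries with no version data on disk are implicitly treated as version 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub struct KimEntryVersion(pub u32);

impl Default for KimEntryVersion {
    fn default() -> Self {
        // Version 1 is the legacy, pre-versioning format.
        KimEntryVersion(1)
    }
}

// Hypothetical wrapper the KIM could hand back: the version is read first,
// and the provider picks the serializable type to decode `key_info` with.
#[derive(Debug, Serialize, Deserialize)]
pub struct VersionedKeyInfo {
    pub version: KimEntryVersion,
    // Opaque, provider-defined blob; the provider deserializes it according
    // to `version` (e.g. PasswordContext for v1, an upgraded form for v2).
    pub key_info: Vec<u8>,
}

fn main() {
    // The provider would branch on the version before decoding the blob, and
    // could lazily re-write the entry with a bumped version after upgrading it.
    let legacy = VersionedKeyInfo {
        version: KimEntryVersion::default(),
        key_info: vec![0u8; 16], // placeholder for a serialized PasswordContext
    };
    println!("entry version: {:?}", legacy.version);
}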

@hug-dev
Member Author

hug-dev commented Aug 25, 2021

I like the version 1 you describe; it is probably something we should have added from the start. This is especially true for the TPM provider, where we don't only serialise an integer but big data structures that can change. This is something that was also suggested by @MattDavis00 in his design.

Concretely, we need to serialise the TPM2B_PUBLIC and the TPM2B_PRIVATE structures on disk. We can choose to serialise either the Rust or the C versions of those structures. The Rust version of TPM2B_PUBLIC is not currently available, as it is being done here. For stability purposes, it might be better to use the C version, as it is specified by the TSS specs.

@paulhowardarm
Collaborator

Concretely, we need to serialise the TPM2B_PUBLIC and the TPM2B_PRIVATE structures on disk. We can choose to serialise either the Rust or the C versions of those structures. The Rust version of TPM2B_PUBLIC is not currently available, as it is being done here. For stability purposes, it might be better to use the C version, as it is specified by the TSS specs.

Is it known how we can recover these TPM2B_PUBLIC and TPM2B_PRIVATE structures for existing keys (as opposed to when creating new keys)? We would need to be able to do that for any sort of migration pathway to be feasible. Does the TPM give us the APIs we need for that, assuming that we can load the context successfully?

What about symmetric keys? I don't think the TPM provider supports those as yet, but I imagine it probably will at some point. Is that represented by just a TPM2B_PRIVATE structure, or is there a different structure for those?

@paulhowardarm
Collaborator

We should probably begin by developing just the code changes needed to serialise the correct structures, without being concerned with upgrade or migration just yet. Possibly we should consider doing this on a separate branch so that stability and compatibility are preserved in the mainline. Once we have those changes committed and evaluated in their branch, we can decide what our approach to compatibility should be, and then implement that strategy in the branch before merging the entire fix to main.

@hug-dev
Member Author

hug-dev commented Aug 26, 2021

Is it known how we can recover these TPM2B_PUBLIC and TPM2B_PRIVATE structures for existing keys (as opposed to when creating new keys)?

I am not sure of all possible ways, and @ionut-arm, @Superhepper and @wiktor-k might help too, but from a loaded context handle, and looking at the ESYS spec, I see that you could:

  • call Esys_ReadPublic to get the TPM2B_PUBLIC struct
  • call Esys_ObjectChangeAuth (with the same auth value) to get the TPM2B_PRIVATE one

We seem to have a similar problem with imported keys (#505). If we wanted to fix that at the same time, we would also need to do the same there.

What about symmetric keys?

Esys_Create seems to be the only function to create both asymmetric and symmetric keys, so I would say those same structures are generated as well.

@hug-dev
Member Author

hug-dev commented Aug 26, 2021

To serialise the C structures, bindgen can automatically derive Serde traits on them, so this might be useful.

@Superhepper
Contributor

I guess we are limited to the context methods we have implemented so far. But otherwise I think it could be possible to use Esys_Duplicate in order to back up the key somewhere else.

@ionut-arm
Member

Concretely, we need to serialise the TPM2B_PUBLIC and the TPM2B_PRIVATE structures on disk. We can choose to either serialise the Rust or the C versions of those structure. The Rust version of TPM2B_PUBLIC is not currently available as it is being done here. For stability purposes, it might be better to use the C version as it is specified by the TSS specs.

The private part of the key is trivial to store because it's just a buffer of bytes (and we've already created native types for it). But I don't think we actually need to store the whole TPM2B_PUBLIC, because we already store the key attributes alongside that serialized data, and given those attributes we can reconstruct the rest of the fields (i.e. everything except the public key material) in TPM2B_PUBLIC. I'd suggest just having two buffers of bytes, one for the encrypted private part and one for the public part of the key. Then when loading it for use, we reconstruct the public structure (whether using a native solution or the FFI type). This ensures that what we store doesn't depend on what bindgen produces.
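
A minimal sketch of that storage shape follows. The struct name and fields are illustrative only, not the actual provider code, and it assumes the Parsec key attributes are kept alongside the entry as they are today.

use serde::{Deserialize, Serialize};

// Illustrative only: what the TPM provider could persist per key instead of a
// saved context. Both areas are kept as opaque byte buffers so that what we
// store on disk does not depend on what bindgen produces.
#[derive(Debug, Serialize, Deserialize)]
pub struct StoredKeyMaterial {
    /// TPM2B_PRIVATE contents as returned by the create operation
    /// (protected by the TPM under the storage/primary key).
    pub private_area: Vec<u8>,
    /// Public key material only; the rest of TPM2B_PUBLIC (object type,
    /// name algorithm, object attributes, scheme...) would be rebuilt from
    /// the Parsec key attributes stored alongside this entry.
    pub public_key_material: Vec<u8>,
    /// Auth value for the key; confidential, to be zeroized by its owner.
    pub auth_value: Vec<u8>,
}

fn main() {
    // At load time the provider would rebuild the full public area from the
    // key attributes plus `public_key_material`, then hand both blobs to the
    // TPM (Esys_Load / tpm2_load) to get a usable handle again.
    let entry = StoredKeyMaterial {
        private_area: vec![0u8; 32],       // placeholder bytes
        public_key_material: vec![0u8; 32],
        auth_value: vec![0u8; 16],
    };
    println!("serialized entry: {} bytes", bincode::serialize(&entry).unwrap().len());
}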

@ionut-arm
Member

ionut-arm commented Sep 10, 2021

Ok, writing an update note just to keep track of what I've been working on and what issues I've hit.

Refactoring the TPM stack to work as I've highlighted above wasn't that difficult, and most of the changes fell into the tss-esapi crate as internal changes to the TransientKeyContext. The create_key and import methods now give out a different structure, KeyMaterial, that holds the public and private parts of the keys. All other methods take in this new structure along with the key parameters (an equivalent of Attributes) and load the keys before using them (followed by flushing their contexts).
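
Roughly, the shape being described is sketched below. This is only an illustration of the direction of the refactor; the actual tss-esapi types and method signatures may differ.

// Sketch only: illustrates the TransientKeyContext change described above.
// The real tss-esapi types and method signatures may differ.
pub struct KeyMaterial {
    /// Public area returned by the create/import call.
    pub public: Vec<u8>,
    /// TPM-protected private area returned by the create/import call.
    pub private: Vec<u8>,
}

pub struct KeyParams; // stand-in for the key parameters (akin to Attributes)

pub trait TransientKeyContextSketch {
    type Error;

    /// Creates a key and hands back its public/private parts instead of a
    /// saved context, so the caller can persist them across TPM Resets.
    fn create_key(&mut self, params: KeyParams) -> Result<KeyMaterial, Self::Error>;

    /// Uses a previously created key: loads it from its material, performs
    /// the operation, then flushes the loaded context again.
    fn sign(&mut self, material: &KeyMaterial, params: KeyParams, digest: &[u8])
        -> Result<Vec<u8>, Self::Error>;
}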

Implementing the migration part (i.e. converting our old key context structures to KeyMaterial) was more painful. This is because some of the operations (namely, retrieving the private part of the old keys and loading those key pairs) require the public parameters of both the primary key (which we use as a storage key for client keys) and the client keys themselves to be identical across the migration. Ensuring this is the case (and identifying this as the issue in a number of bugs) was non-trivial.

A few problems persist and are next on my TODO list: some of the newer code in the tss-esapi crate does not compile under the Rust 1.43 compiler, and we don't have any written test to check what happens across a TPM Reset.

I'll be updating this comment as work progresses. I'll also try to do a detailed description of the bugs I hit, as it is likely that in the future something like that could cause problems again.

@ionut-arm ionut-arm self-assigned this Sep 14, 2021
@hug-dev hug-dev closed this as completed Sep 27, 2021
Parsec automation moved this from All issues to Done Sep 27, 2021