Combination of Deserialize and from_str_unchecked() is unsound #43

niklasf · 2021-01-17T21:11:36Z

While writing the safety comments for #40, I noticed that

            // Safety: The field is specified to be ASCII, and the only safe
            // way to construct VendorInfo is from real CPUID data or the
            // Default implementation.
            str::from_utf8_unchecked(slice)

is not actually true. Using Deserialize, types like VendorInfo can be constructed with arbitrary content in safe code, but calling str::from_utf8_unchecked() with invalid UTF-8 is undefined behavior.

Possible fixes are removing Deserialize, or validating at construction or right when constructing the str.

The text was updated successfully, but these errors were encountered:

gz · 2021-01-20T23:54:24Z

Yes, thanks for spotting this!

By validating at construction, you mean to do a custom implementation of Deserialize trait which throws an error in case it's not valid utf-8? That seems like a sensible solution to me.

niklasf · 2021-01-21T15:06:50Z

Yes, that should work. I wonder if it's even worth it, though. It appears that at least none of the published reverse dependencies are using serialize.

Edit: Not a soundness issue, but I'd also expect successfully deserialized data not to lead to panics. There's one assertion outside and some assertions in get_bits().

niklasf · 2021-01-24T20:47:10Z

Thanks for the quick turnaround on v9.0.0. I'll file a separate advisory for this issue (rustsec/advisory-db#671), that can be updated with the solution when available.

Shnatsel · 2021-01-24T22:07:55Z

I suggest using from_utf8_lossy instead of from_utf8_unchecked. This way only the chunks that are valid UTF-8 will be preserved.

This means the original input may be altered, and hence may not be entirely correct, but at least it will not lead to memory safety issues, and does not require changing the API.

gz · 2021-01-26T23:26:00Z

Unfortunately, from_utf8_lossy uses std::borrow::Cow which originates from the alloc crate. Ideally we can remain using cpuid with just core and not require dynamic heap allocation support. Otherwise, this would be a good way to go about it.

Shnatsel · 2021-01-26T23:39:07Z

bstr crate provides a string abstraction that's not required to be UTF-8, but that would add an extra dependency.

It's probably best to expose it as a &[u8] and let the API user deal with it, either via from_utf8_lossy or via bstr.

niklasf · 2021-01-27T00:24:01Z

For a non-breaking change (i.e. keeping -> &str) it would also be possible to cut off the slice at the first non-ASCII byte, similar to the nul byte here:

rust-cpuid/src/lib.rs

Lines 4149 to 4150 in 6993bb5

    
           // Brand terminated at nul byte or end, whichever comes first. 
        
           let slice = slice.split(|&x| x == 0).next().unwrap();

The less critical issue with the panics still remains.

If we're considering breaking changes, just removing Serialize/Deserialize is my top candidate. None of the published reverse dependencies are using it, and I find it hard to imagine a use case. When using CPUID locally, there's no need to persist it across program runs. When sending information to a server, a higher level representation seems more appropriate (e.g. sending the vendor, not the raw CPUID data that encodes it).

Then, if someone actually needs it and requests it, consider reimplementing it for those structs as needed, with careful validation.

gz · 2021-01-27T00:49:05Z

This is the pull request that originally added the feature with some explanation on why it was requested: #18

I agree with @niklasf, a higher level representation would've been more sensible. I can think of a use-case for myself where I would want to persist all CPUID output in order to save the current processor state e.g. before running an experiment. But I do think a better way than having Serialize/Deserialize on invididual data-structures might be to just save cpuid raw registers using a Vec/Map of CpuIdResult bytes and then persisting that...

I'll try to look into how hard it is to just return an error on Serialize/Deserialize if data isn't valid utf8. Otherwise removing Serialize/Deserialize is something I would consider too.

niklasf · 2021-01-27T01:02:28Z

Ah, I should have tried git blame :)

For the validation, a decent approach seems to be to use derive on a private helper struct, rather than implementing everything manually: serde-rs/serde#642 (comment).

gz · 2021-07-04T06:33:27Z

Alright I worked on cpuid today and pushed a more pragmatic fix for this: use from_utf8 combined with unwrap_or("Invalid string") instead of from_utf8_unchecked.
I figured this is fine because if the CPU isn't totally broken this will never happen for regular cpuid usage, and if someone conjures up some bad struct via Deserialize then they will hopefully realize it when they see this error string.

Voxelot · 2021-08-23T21:13:00Z

Would it be possible to get this backported to older versions of raw-cpuid? I'm stuck on version 8.1.2 since this is a transitive dependency in my case and don't have the option to make a major upgrade to 9.x.

gz · 2021-08-24T00:43:51Z

Hi @Voxelot yes, I can backport these changes.
Just out of curiosity, porting to 9.x or 10.x should be very minimal effort as likely nothing or only little has changed -- so maybe just replacing the version in Cargo.toml works already. If not, you can point me to the project in question and I can probably submit a pull request for the update.

gz closed this as completed in 936ab41 Jul 4, 2021

This was referenced Aug 22, 2021

RUSTSEC-2021-0089: Optional Deserialize implementations lacking validation coinmini/heim#4

Open

RUSTSEC-2021-0089: Optional Deserialize implementations lacking validation heim-rs/heim#344

Open

gz mentioned this issue Aug 24, 2021

Would it be possible to get this backported to older versions of raw-cpuid? I'm stuck on version 8.1.2 since this is a transitive dependency in my case and don't have the option to make a major upgrade to 9.x. #67

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Combination of Deserialize and from_str_unchecked() is unsound #43

Combination of Deserialize and from_str_unchecked() is unsound #43

niklasf commented Jan 17, 2021

gz commented Jan 20, 2021

niklasf commented Jan 21, 2021 •

edited

niklasf commented Jan 24, 2021

Shnatsel commented Jan 24, 2021

gz commented Jan 26, 2021

Shnatsel commented Jan 26, 2021 •

edited

niklasf commented Jan 27, 2021 •

edited

gz commented Jan 27, 2021

niklasf commented Jan 27, 2021 •

edited

gz commented Jul 4, 2021

Voxelot commented Aug 23, 2021

gz commented Aug 24, 2021

Combination of Deserialize and from_str_unchecked() is unsound #43

Combination of Deserialize and from_str_unchecked() is unsound #43

Comments

niklasf commented Jan 17, 2021

gz commented Jan 20, 2021

niklasf commented Jan 21, 2021 • edited

niklasf commented Jan 24, 2021

Shnatsel commented Jan 24, 2021

gz commented Jan 26, 2021

Shnatsel commented Jan 26, 2021 • edited

niklasf commented Jan 27, 2021 • edited

gz commented Jan 27, 2021

niklasf commented Jan 27, 2021 • edited

gz commented Jul 4, 2021

Voxelot commented Aug 23, 2021

gz commented Aug 24, 2021

niklasf commented Jan 21, 2021 •

edited

Shnatsel commented Jan 26, 2021 •

edited

niklasf commented Jan 27, 2021 •

edited

niklasf commented Jan 27, 2021 •

edited