-
Notifications
You must be signed in to change notification settings - Fork 472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix UTF-8 unsoundness in string::merge #194
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -716,6 +716,44 @@ macro_rules! length_delimited { | |
pub mod string { | ||
use super::*; | ||
|
||
// String::as_mut_vec is unsafe because it doesn't check that the bytes | ||
// inserted into the resulting vec are valid UTF-8. We check | ||
// after fully extending the buffer in order to ensure this is safe. | ||
// If an unwind happens before the check is performed and the result | ||
// is committed, the string is truncated to the previously valid length. | ||
struct StringGuard<'a> { | ||
string: &'a mut String, | ||
committed_len: usize, | ||
} | ||
|
||
impl<'a> Drop for StringGuard<'a> { | ||
fn drop(&mut self) { | ||
self.string.truncate(self.committed_len); | ||
} | ||
} | ||
|
||
impl<'a> StringGuard<'a> { | ||
fn new(string: &'a mut String) -> Self { | ||
let committed_len = string.len(); | ||
StringGuard { | ||
string, | ||
committed_len, | ||
} | ||
} | ||
|
||
unsafe fn as_mut_vec(&mut self) -> &mut Vec<u8> { | ||
self.string.as_mut_vec() | ||
} | ||
|
||
fn commit(mut self) -> Result<(), DecodeError> { | ||
let new_bytes = &self.string.as_bytes()[self.committed_len..]; | ||
str::from_utf8(new_bytes) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you need to check the whole string, not just the new bytes here, since when the user has mutable access to the internal string, they can modify as well as append. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In a general purpose But again, clearing the string value before the new wire data are merged, and also on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Also in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It's entirely possible that this is done by accident, and should be fixed. Do you have a reference for where the spec / PB docs mention that merging will overwrite strings? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the developer guide on encoding:
|
||
.map_err(|_| DecodeError::new("invalid string value: data is not UTF-8 encoded"))?; | ||
self.committed_len += new_bytes.len(); | ||
Ok(()) | ||
} | ||
} | ||
|
||
pub fn encode<B>(tag: u32, value: &String, buf: &mut B) | ||
where | ||
B: BufMut, | ||
|
@@ -728,14 +766,11 @@ pub mod string { | |
where | ||
B: Buf, | ||
{ | ||
let mut guard = StringGuard::new(value); | ||
unsafe { | ||
// String::as_mut_vec is unsafe because it doesn't check that the bytes | ||
// inserted into it the resulting vec are valid UTF-8. We check | ||
// explicitly in order to ensure this is safe. | ||
super::bytes::merge(wire_type, value.as_mut_vec(), buf)?; | ||
str::from_utf8(value.as_bytes()) | ||
.map_err(|_| DecodeError::new("invalid string value: data is not UTF-8 encoded"))?; | ||
super::bytes::merge(wire_type, guard.as_mut_vec(), buf)?; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think using some clever buffer swaps we can simplify this and remove the need for
That can even be simplified a bit more if we decide we need to overwrite on merge instead of append. WDYT? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @danburkert I think your implementation is good, unless somebody cares about the existing string data being lost on merge failure. |
||
} | ||
guard.commit()?; | ||
Ok(()) | ||
} | ||
|
||
|
@@ -1307,6 +1342,15 @@ mod test { | |
} | ||
} | ||
|
||
#[test] | ||
fn string_merge_failure() { | ||
let mut s = String::new(); | ||
let mut buf = Cursor::new(b"\x80\x80"); | ||
let r = string::merge(WireType::LengthDelimited, &mut s, &mut buf); | ||
r.expect_err("must be an error"); | ||
assert!(s.is_empty()); | ||
} | ||
|
||
#[test] | ||
fn varint() { | ||
fn check(value: u64, encoded: &[u8]) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my (admittedly weak) understanding of the unsafe rules, it's instant UB to create a
str
/String
containing invalid UTF-8. Given that, I think this pattern is still unsafe, since it's appending bytes to aString
, then checking validity. Given my understanding, I think the only safe way to do what we are trying to do here is accumulate into aVec<u8>
, check if it's valid utf-8, and then and only then convert toString
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, it's not UB until the invalid data are accessed as
str
/String
; but, theoretically,.as_bytes()
or even.as_mut_vec()
might be in violation of this.