Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assess Deny of Service risk and potentially limit the buffer size #22

Open
ririsoft opened this issue Jul 20, 2020 · 2 comments
Open

Assess Deny of Service risk and potentially limit the buffer size #22

ririsoft opened this issue Jul 20, 2020 · 2 comments

Comments

@ririsoft
Copy link
Contributor

Initially discussed in #21, stripping of data (such white spaces) at the beginning of input buffer can lead to a Denial Of Service for inputs with a very large amount of data to be stripped before reaching the content to be actually sniffed.

Shall we:

  1. Ignore this issue because we believe there is no such risk.
  2. Let the caller be responsible for this. In such case we should comment this at crate level.
  3. Restrict the size of the buffer globally?
  4. Restrict the size of the buffer for matchers expose to this risk?
@ririsoft ririsoft mentioned this issue Jul 20, 2020
@paolobarbolini
Copy link
Collaborator

paolobarbolini commented Jul 22, 2020

I looked into it and most matchers don't seem to be affected by the size of the input buffer.

Some of them are a bit more sophisticated, like

image

/// Returns whether a buffer is HEIF image data.
pub fn is_heif(buf: &[u8]) -> bool {
if buf.is_empty() {
return false;
}
if !is_isobmff(buf) {
return false;
}
if let Some((major, _minor, compatible)) = get_ftyp(buf) {
if major == b"heic" {
return true;
}
if major == b"mif1" || major == b"msf1" {
for b in compatible {
if b == b"heic" {
return true;
}
}
}
}
false
}
/// Returns whether a buffer is AVIF image data.
pub fn is_avif(buf: &[u8]) -> bool {
if buf.is_empty() {
return false;
}
if !is_isobmff(buf) {
return false;
}
if let Some((major, _minor, compatible)) = get_ftyp(buf) {
if major == b"avif" || major == b"avis" {
return true;
}
for b in compatible {
if b == b"avif" || b == b"avis" {
return true;
}
}
}
false
}
// IsISOBMFF checks whether the given buffer represents ISO Base Media File Format data
fn is_isobmff(buf: &[u8]) -> bool {
if buf.len() < 16 {
return false;
}
if &buf[4..8] != b"ftyp" {
return false;
}
let ftyp_length = u32::from_be_bytes(buf[0..4].try_into().unwrap()) as usize;
buf.len() >= ftyp_length
}
// GetFtyp returns the major brand, minor version and compatible brands of the ISO-BMFF data
fn get_ftyp(buf: &[u8]) -> Option<(&[u8], &[u8], impl Iterator<Item = &[u8]>)> {
if buf.len() < 16 {
return None;
}
let ftyp_length = u32::from_be_bytes(buf[0..4].try_into().unwrap()) as usize;
let major = &buf[8..12];
let minor = &buf[12..16];
let compatible = (16..ftyp_length).step_by(4).map(move |i| &buf[i..i + 4]);
Some((major, minor, compatible))
}

here at line 153 a malicious user could set a very big number (they would still be limited by a u32::MAX) and make us iterate over the entire buffer. If it's necessary we could put a limit to the number of iterations.

doc

infer/src/matchers/doc.rs

Lines 69 to 175 in f4635f8

/// Returns whether a buffer is Microsoft PowerPoint Open XML Presentation (PPTX) data.
pub fn is_pptx(buf: &[u8]) -> bool {
match msooxml(buf) {
Some(typ) => typ == DocType::PPTX,
None => false,
}
}
fn msooxml(buf: &[u8]) -> Option<DocType> {
let signature = [b'P', b'K', 0x03, 0x04];
// start by checking for ZIP local file header signature
if !compare_bytes(buf, &signature, 0) {
return None;
}
let v = check_msooml(buf, 0x1E);
if v.is_some() {
return v;
}
if !compare_bytes(buf, b"[Content_Types].xml", 0x1E)
&& !compare_bytes(buf, b"_rels/.rels", 0x1E)
{
return None;
}
// skip to the second local file header
// since some documents include a 520-byte extra field following the file
// header, we need to scan for the next header
let mut start_offset = (u32::from_le_bytes(buf[18..22].try_into().unwrap()) + 49) as usize;
let idx = search(buf, start_offset, 6000)?;
// now skip to the *third* local file header; again, we need to scan due to a
// 520-byte extra field following the file header
start_offset += idx + 4 + 26;
let idx = search(buf, start_offset, 6000)?;
// and check the subdirectory name to determine which type of OOXML
// file we have. Correct the mimetype with the registered ones:
// http://technet.microsoft.com/en-us/library/cc179224.aspx
start_offset += idx + 4 + 26;
check_msooml(buf, start_offset)?;
// OpenOffice/Libreoffice orders ZIP entry differently, so check the 4th file
start_offset += 26;
let idx = search(buf, start_offset, 6000);
match idx {
Some(idx) => start_offset += idx + 4 + 26,
None => return Some(DocType::OOXLM),
};
let typo = check_msooml(buf, start_offset);
if typo.is_some() {
return typo;
}
Some(DocType::OOXLM)
}
fn compare_bytes(slice: &[u8], sub_slice: &[u8], start_offset: usize) -> bool {
let sl = sub_slice.len();
if start_offset + sl > slice.len() {
return false;
}
for (i, v) in slice.iter().skip(start_offset).take(sl).enumerate() {
let v2 = sub_slice[i];
if *v != v2 {
return false;
}
}
true
}
fn check_msooml(buf: &[u8], offset: usize) -> Option<DocType> {
if compare_bytes(buf, &[b'w', b'o', b'r', b'd', b'/'], offset) {
Some(DocType::DOCX)
} else if compare_bytes(buf, &[b'p', b'p', b't', b'/'], offset) {
Some(DocType::PPTX)
} else if compare_bytes(buf, &[b'x', b'l', b'/'], offset) {
Some(DocType::XLSX)
} else {
None
}
}
fn search(buf: &[u8], start: usize, range: usize) -> Option<usize> {
let length = buf.len();
let mut end = start + range;
let signature: &[_] = &[b'P', b'K', 0x03, 0x04];
if end > length {
end = length;
}
if start >= end {
return None;
}
buf[start..end]
.windows(signature.len())
.position(|window| window == signature)
}

Here it's doing a lot of stuff to try to recognize what type of document it is, but it still feels under control. The range on the search fn is hard coded by us, everything else seems ok.

@ririsoft
Copy link
Contributor Author

Good catch for the image loop size 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants