Assess Deny of Service risk and potentially limit the buffer size #22

ririsoft · 2020-07-20T13:21:36Z

Initially discussed in #21, stripping of data (such white spaces) at the beginning of input buffer can lead to a Denial Of Service for inputs with a very large amount of data to be stripped before reaching the content to be actually sniffed.

Shall we:

Ignore this issue because we believe there is no such risk.
Let the caller be responsible for this. In such case we should comment this at crate level.
Restrict the size of the buffer globally?
Restrict the size of the buffer for matchers expose to this risk?

paolobarbolini · 2020-07-22T21:15:31Z

I looked into it and most matchers don't seem to be affected by the size of the input buffer.

Some of them are a bit more sophisticated, like

image

infer/src/matchers/image.rs

Lines 81 to 160 in f4635f8

    
           /// Returns whether a buffer is HEIF image data. 
        
           pub fn is_heif(buf: &[u8]) -> bool { 
        
               if buf.is_empty() { 
        
                   return false; 
        
               } 
        
               if !is_isobmff(buf) { 
        
                   return false; 
        
               } 
        
               if let Some((major, _minor, compatible)) = get_ftyp(buf) { 
        
                   if major == b"heic" { 
        
                       return true; 
        
                   } 
        
                   if major == b"mif1" || major == b"msf1" { 
        
                       for b in compatible { 
        
                           if b == b"heic" { 
        
                               return true; 
        
                           } 
        
                       } 
        
                   } 
        
               } 
        
               false 
        
           } 
        
           /// Returns whether a buffer is AVIF image data. 
        
           pub fn is_avif(buf: &[u8]) -> bool { 
        
               if buf.is_empty() { 
        
                   return false; 
        
               } 
        
               if !is_isobmff(buf) { 
        
                   return false; 
        
               } 
        
               if let Some((major, _minor, compatible)) = get_ftyp(buf) { 
        
                   if major == b"avif" || major == b"avis" { 
        
                       return true; 
        
                   } 
        
                   for b in compatible { 
        
                       if b == b"avif" || b == b"avis" { 
        
                           return true; 
        
                       } 
        
                   } 
        
               } 
        
               false 
        
           } 
        
           // IsISOBMFF checks whether the given buffer represents ISO Base Media File Format data 
        
           fn is_isobmff(buf: &[u8]) -> bool { 
        
               if buf.len() < 16 { 
        
                   return false; 
        
               } 
        
               if &buf[4..8] != b"ftyp" { 
        
                   return false; 
        
               } 
        
               let ftyp_length = u32::from_be_bytes(buf[0..4].try_into().unwrap()) as usize; 
        
               buf.len() >= ftyp_length 
        
           } 
        
           // GetFtyp returns the major brand, minor version and compatible brands of the ISO-BMFF data 
        
           fn get_ftyp(buf: &[u8]) -> Option<(&[u8], &[u8], impl Iterator<Item = &[u8]>)> { 
        
               if buf.len() < 16 { 
        
                   return None; 
        
               } 
        
               let ftyp_length = u32::from_be_bytes(buf[0..4].try_into().unwrap()) as usize; 
        
               let major = &buf[8..12]; 
        
               let minor = &buf[12..16]; 
        
               let compatible = (16..ftyp_length).step_by(4).map(move |i| &buf[i..i + 4]); 
        
               Some((major, minor, compatible)) 
        
           }

here at line 153 a malicious user could set a very big number (they would still be limited by a u32::MAX) and make us iterate over the entire buffer. If it's necessary we could put a limit to the number of iterations.

doc

infer/src/matchers/doc.rs

Lines 69 to 175 in f4635f8

    
           /// Returns whether a buffer is Microsoft PowerPoint Open XML Presentation (PPTX) data. 
        
           pub fn is_pptx(buf: &[u8]) -> bool { 
        
               match msooxml(buf) { 
        
                   Some(typ) => typ == DocType::PPTX, 
        
                   None => false, 
        
               } 
        
           } 
        
           fn msooxml(buf: &[u8]) -> Option<DocType> { 
        
               let signature = [b'P', b'K', 0x03, 0x04]; 
        
               // start by checking for ZIP local file header signature 
        
               if !compare_bytes(buf, &signature, 0) { 
        
                   return None; 
        
               } 
        
               let v = check_msooml(buf, 0x1E); 
        
               if v.is_some() { 
        
                   return v; 
        
               } 
        
               if !compare_bytes(buf, b"[Content_Types].xml", 0x1E) 
        
                   && !compare_bytes(buf, b"_rels/.rels", 0x1E) 
        
               { 
        
                   return None; 
        
               } 
        
               // skip to the second local file header 
        
               // since some documents include a 520-byte extra field following the file 
        
               // header, we need to scan for the next header 
        
               let mut start_offset = (u32::from_le_bytes(buf[18..22].try_into().unwrap()) + 49) as usize; 
        
               let idx = search(buf, start_offset, 6000)?; 
        
               // now skip to the *third* local file header; again, we need to scan due to a 
        
               // 520-byte extra field following the file header 
        
               start_offset += idx + 4 + 26; 
        
               let idx = search(buf, start_offset, 6000)?; 
        
               // and check the subdirectory name to determine which type of OOXML 
        
               // file we have.  Correct the mimetype with the registered ones: 
        
               // http://technet.microsoft.com/en-us/library/cc179224.aspx 
        
               start_offset += idx + 4 + 26; 
        
               check_msooml(buf, start_offset)?; 
        
               // OpenOffice/Libreoffice orders ZIP entry differently, so check the 4th file 
        
               start_offset += 26; 
        
               let idx = search(buf, start_offset, 6000); 
        
               match idx { 
        
                   Some(idx) => start_offset += idx + 4 + 26, 
        
                   None => return Some(DocType::OOXLM), 
        
               }; 
        
               let typo = check_msooml(buf, start_offset); 
        
               if typo.is_some() { 
        
                   return typo; 
        
               } 
        
               Some(DocType::OOXLM) 
        
           } 
        
           fn compare_bytes(slice: &[u8], sub_slice: &[u8], start_offset: usize) -> bool { 
        
               let sl = sub_slice.len(); 
        
               if start_offset + sl > slice.len() { 
        
                   return false; 
        
               } 
        
               for (i, v) in slice.iter().skip(start_offset).take(sl).enumerate() { 
        
                   let v2 = sub_slice[i]; 
        
                   if *v != v2 { 
        
                       return false; 
        
                   } 
        
               } 
        
               true 
        
           } 
        
           fn check_msooml(buf: &[u8], offset: usize) -> Option<DocType> { 
        
               if compare_bytes(buf, &[b'w', b'o', b'r', b'd', b'/'], offset) { 
        
                   Some(DocType::DOCX) 
        
               } else if compare_bytes(buf, &[b'p', b'p', b't', b'/'], offset) { 
        
                   Some(DocType::PPTX) 
        
               } else if compare_bytes(buf, &[b'x', b'l', b'/'], offset) { 
        
                   Some(DocType::XLSX) 
        
               } else { 
        
                   None 
        
               } 
        
           } 
        
           fn search(buf: &[u8], start: usize, range: usize) -> Option<usize> { 
        
               let length = buf.len(); 
        
               let mut end = start + range; 
        
               let signature: &[_] = &[b'P', b'K', 0x03, 0x04]; 
        
               if end > length { 
        
                   end = length; 
        
               } 
        
               if start >= end { 
        
                   return None; 
        
               } 
        
               buf[start..end] 
        
                   .windows(signature.len()) 
        
                   .position(|window| window == signature) 
        
           }

Here it's doing a lot of stuff to try to recognize what type of document it is, but it still feels under control. The range on the search fn is hard coded by us, everything else seems ok.

ririsoft · 2020-07-23T08:40:41Z

Good catch for the image loop size 👍

ririsoft mentioned this issue Jul 20, 2020

Web types #21

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assess Deny of Service risk and potentially limit the buffer size #22

Assess Deny of Service risk and potentially limit the buffer size #22

ririsoft commented Jul 20, 2020

paolobarbolini commented Jul 22, 2020 •

edited

ririsoft commented Jul 23, 2020

Assess Deny of Service risk and potentially limit the buffer size #22

Assess Deny of Service risk and potentially limit the buffer size #22

Comments

ririsoft commented Jul 20, 2020

paolobarbolini commented Jul 22, 2020 • edited

image

doc

ririsoft commented Jul 23, 2020

paolobarbolini commented Jul 22, 2020 •

edited