RFC: limit libgumbo memory allocations for untrusted HTML5 content #2949

flavorjones · 2023-08-09T14:42:40Z

Summary

libxml2 has long enforced default limits on document size so that an untrusted document cannot trigger an OOM condition and be used as a denial-of-service attack vector. These limits can be removed for trusted documents by setting the HUGE parse option.

libgumbo does not have limits like this, and this issue is being created to discuss the need and possible implementations.

Background

This topic was first raised in #2941 where @stevecheckoway and I discussed the shape of the issue.


dan42 · 2023-08-23T17:46:03Z

It's nice to have "sanity check" limits, but silently truncating content is not good: it is very hard to debug. Please make it raise an error. Ideally there would be multiple safeties, so that we raise on any of:

input string > 10MB
output mem > 15MB
tree depth > 1000

(example arbitrary numbers)
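
To make the "raise, don't truncate" suggestion concrete, here is a minimal pure-Ruby sketch of two of the three checks. It is not Nokogiri or libgumbo code: `ParseLimitExceeded`, `LIMITS`, `check_input_size!` and `check_tree_depth!` are hypothetical names, the numbers are the arbitrary ones from the list above, and nested arrays stand in for a parsed tree. The output-memory limit is left out because measuring it would presumably need hooks inside the parser's allocator.

```ruby
# Hypothetical error class; not part of Nokogiri or libgumbo.
class ParseLimitExceeded < StandardError; end

# Arbitrary example limits, taken from the list above.
LIMITS = {
  max_input_bytes: 10 * 1024 * 1024, # input string > 10MB
  max_tree_depth: 1000               # tree depth > 1000
}.freeze

# Raises instead of silently truncating when the input is too large.
def check_input_size!(html, limits = LIMITS)
  return if html.bytesize <= limits[:max_input_bytes]

  raise ParseLimitExceeded,
        "input is #{html.bytesize} bytes, limit is #{limits[:max_input_bytes]}"
end

# Raises when nesting goes deeper than the limit.
# `root` is a nested Array standing in for a parsed element tree.
# The walk is iterative so the check itself cannot overflow the stack.
def check_tree_depth!(root, limits = LIMITS)
  stack = [[root, 1]]
  until stack.empty?
    node, depth = stack.pop
    if depth > limits[:max_tree_depth]
      raise ParseLimitExceeded,
            "tree depth exceeds limit of #{limits[:max_tree_depth]}"
    end
    node.each { |child| stack.push([child, depth + 1]) if child.is_a?(Array) }
  end
end
```

A caller would rescue `ParseLimitExceeded` and report which limit was hit, which is the debuggability that silent truncation lacks.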

flavorjones added labels on Aug 9, 2023: topic/memory (Segfaults, memory leaks, valgrind testing, etc.), topic/rfc, topic/gumbo (Gumbo HTML5 parser)

flavorjones mentioned this issue Aug 9, 2023

HTML4::DocumentFragment4 truncates text in a <div> tag at about 10mb #2941

Closed
