Speed up HTML fuzzer
htmlDocDumpMemory uses the "HTML" encoding if no other encoding was
specified in the source HTML. This encoding can be extremely slow
because of an inefficiency in htmlEntityValueLookup. Stop encoding
the output for now.
nwellnhof committed Feb 7, 2021
1 parent e6495e4 commit ec808a4
Showing 1 changed file with 11 additions and 4 deletions.
15 changes: 11 additions & 4 deletions fuzz/html.c
@@ -22,7 +22,7 @@ LLVMFuzzerTestOneInput(const char *data, size_t size) {
     static const size_t maxChunkSize = 128;
     htmlDocPtr doc;
     htmlParserCtxtPtr ctxt;
-    xmlChar *out;
+    xmlOutputBufferPtr out;
     const char *docBuffer;
     size_t docSize, consumed, chunkSize;
     int opts, outSize;
@@ -39,9 +39,16 @@ LLVMFuzzerTestOneInput(const char *data, size_t size) {
     /* Pull parser */

     doc = htmlReadMemory(docBuffer, docSize, NULL, NULL, opts);
-    /* Also test the serializer. */
-    htmlDocDumpMemory(doc, &out, &outSize);
-    xmlFree(out);
+
+    /*
+     * Also test the serializer. Call htmlDocContentDumpOutput with our
+     * own buffer to avoid encoding the output. The HTML encoding is
+     * excruciatingly slow (see htmlEntityValueLookup).
+     */
+    out = xmlAllocOutputBuffer(NULL);
+    htmlDocContentDumpOutput(out, doc, NULL);
+    xmlOutputBufferClose(out);
+
     xmlFreeDoc(doc);

     /* Push parser */