trans edited this page Sep 13, 2010 · 2 revisions

Comparing HTMLFilter to Loofah

Facts

  • HTMLFilter is a Ruby port of lib_filter.php (v1.15) by Cal Henderson
  • It is pure Ruby with no dependencies
  • It also includes a CSSFilter class for “sanitizing” stylesheets
  • HTMLFilter is Regexp based

Usage

HTMLFilter’s initializer accepts a set of options that specify how it sanitizes HTML. By default it is very restrictive, apparently because it was designed to sanitize blog comments. These are its default options:

    DEFAULT = {
      'allowed' => {
        'a'   => ['href', 'target'],
        'b'   => [],
        'i'   => [],
        'img' => ['src', 'width', 'height', 'alt']
      },
      'no_close' => ['img', 'br', 'hr'],
      'always_close' => ['a', 'b'],
      'protocol_attributes' => ['src', 'href'],
      'allowed_protocols' => ['http', 'ftp', 'mailto'],
      'remove_blanks' => ['a', 'b'],
      'strip_comments' => true,
      'always_make_tags' => true,
      'allow_numbered_entities' => true,
      'allowed_entities' => ['amp', 'gt', 'lt', 'quot']
    }
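
As a rough illustration of what these options express, here is a minimal sketch of a tag/attribute allowlist check and a protocol check. It is not HTMLFilter's actual code: the helper names `attribute_allowed?` and `protocol_allowed?` are hypothetical, and accepting values without a scheme (such as relative URLs) is an assumption of this sketch.

```ruby
# Illustrative only: a simplified reading of three of the default options,
# NOT HTMLFilter's actual implementation.
OPTIONS = {
  'allowed' => {
    'a'   => ['href', 'target'],
    'b'   => [],
    'i'   => [],
    'img' => ['src', 'width', 'height', 'alt']
  },
  'protocol_attributes' => ['src', 'href'],
  'allowed_protocols'   => ['http', 'ftp', 'mailto']
}.freeze

# Hypothetical helper: true only if the tag is allowlisted and the attribute
# appears in that tag's list.
def attribute_allowed?(tag, attr, options = OPTIONS)
  allowed_attrs = options['allowed'][tag.downcase]
  !allowed_attrs.nil? && allowed_attrs.include?(attr.downcase)
end

# Hypothetical helper: for attributes that carry URLs, true only if the
# scheme is allowlisted. Assumption: values with no scheme are accepted.
def protocol_allowed?(attr, value, options = OPTIONS)
  return true unless options['protocol_attributes'].include?(attr.downcase)

  scheme = value[/\A\s*([a-zA-Z][a-zA-Z0-9+.-]*):/, 1]
  scheme.nil? || options['allowed_protocols'].include?(scheme.downcase)
end

puts attribute_allowed?('a', 'href')                     # true
puts attribute_allowed?('a', 'onclick')                  # false
puts protocol_allowed?('href', 'http://example.com/')    # true
puts protocol_allowed?('href', 'javascript:alert(1)')    # false
```

In this reading, a tag missing from 'allowed' is rejected outright, and only the attributes named in 'protocol_attributes' have their URL scheme checked.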

HTMLFilter has one method, #filter, to which the HTML document or fragment is passed.

    htmlfilter = HTMLFilter.new
    htmlfilter.filter(html)

Benchmarks

In benchmarks, HTMLFilter is about 2.4x to 2.9x slower than Loofah on HTML documents and fragments, and about twice as fast on small text snippets.

  HeadToHeadHTMLFilter
      Large document, 98282 bytes (x100)
                                       total    single    rel
            Loofah::Helpers.sanitize  23.085 (0.230853)     -
                 HTMLFilter sanitize  55.526 (0.555257)  2.41x

      Small fragment, 3178 bytes (x1000)
                                       total    single    rel
            Loofah::Helpers.sanitize   7.066 (0.007066)     -
                 HTMLFilter sanitize  20.479 (0.020479)  2.90x

      Text snippet, 58 bytes (x10000)
                                       total    single    rel
            Loofah::Helpers.sanitize   5.671 (0.000567)     -
                 HTMLFilter sanitize   2.756 (0.000276)  0.49x
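
The "single" and "rel" columns can be reproduced from the totals. The sketch below assumes that "single" is the total time divided by the iteration count and that "rel" is the HTMLFilter total divided by the Loofah total; the constant and method names are made up for this example, and the figures are copied from the output above.

```ruby
# Totals (in seconds) and iteration counts copied from the benchmark output.
RESULTS = {
  'Large document' => { iterations: 100,    loofah: 23.085, htmlfilter: 55.526 },
  'Small fragment' => { iterations: 1_000,  loofah: 7.066,  htmlfilter: 20.479 },
  'Text snippet'   => { iterations: 10_000, loofah: 5.671,  htmlfilter: 2.756 }
}.freeze

# Assumed meaning of "rel": HTMLFilter total divided by Loofah total.
def relative_time(row)
  (row[:htmlfilter] / row[:loofah]).round(2)
end

# Assumed meaning of "single": total time divided by the iteration count.
def single_time(total, iterations)
  total / iterations
end

RESULTS.each do |name, row|
  printf("%-15s rel %.2fx, HTMLFilter single %.6f s\n",
         name,
         relative_time(row),
         single_time(row[:htmlfilter], row[:iterations]))
end
```

A "rel" value above 1 means HTMLFilter took longer than Loofah for that input; a value below 1 means it was faster.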