
Scrub not fully applied on HTML::Document #80 (flavorjones/loofah)

Closed
@chengguangnan

Description


I noticed that some HTML comments are not removed.

Here is an example; my_scrub should remove all the comments:

    Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml
    => "<!DOCTYPE html>\n<!--[if IE 7]><!-- --><html></html>\n"

I checked the code, and I think the problem is here:

https://github.com/flavorjones/loofah/blob/master/lib/loofah/instance_methods.rb#L41

        case self
        when Nokogiri::XML::Document
          scrubber.traverse(root) if root
        when Nokogiri::XML::DocumentFragment
          children.scrub! scrubber
        else
          scrubber.traverse(self)
        end

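So even an HTML::Document (which is a subclass of Nokogiri::XML::Document) goes through `scrubber.traverse(root) if root`, and anything outside the root `<html>` element, such as the DOCTYPE and document-level comments, never goes through the scrubber.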
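A minimal plain-Ruby model of this gap, for illustration only. It does not use Nokogiri or Loofah: `ToyNode`, `ToyDocument`, `build_document`, `visited_from_root` and `visited_from_document` are hypothetical names, and `ToyDocument#root` merely imitates `Nokogiri::XML::Document#root` by returning the first element among the document's top-level children.

```ruby
# Toy model only: these classes are hypothetical stand-ins, not Nokogiri/Loofah.
ToyNode = Struct.new(:type, :name, :children) do
  # Top-down, depth-first walk that yields every node in this subtree.
  def each_node(&block)
    yield self
    children.each { |child| child.each_node(&block) }
  end
end

ToyDocument = Struct.new(:children) do
  # Imitates Document#root: the first element among the top-level children.
  def root
    children.find { |node| node.type == :element }
  end
end

# Rough shape of the parsed input: the DOCTYPE and a comment sit beside
# <html> at the document level, not inside it.
def build_document
  script = ToyNode.new(:element, "script", [])
  body = ToyNode.new(:element, "body", [script])
  html = ToyNode.new(:element, "html", [body])
  ToyDocument.new([
    ToyNode.new(:dtd, "doctype", []),
    ToyNode.new(:comment, "comment", []),
    html
  ])
end

# Like the Nokogiri::XML::Document branch: only root's subtree is visited.
def visited_from_root(doc)
  names = []
  doc.root&.each_node { |node| names << node.name }
  names
end

# Walking every top-level child also reaches the DOCTYPE and the comment.
def visited_from_document(doc)
  names = []
  doc.children.each { |child| child.each_node { |node| names << node.name } }
  names
end

doc = build_document
p visited_from_root(doc)     # => ["html", "body", "script"]
p visited_from_document(doc) # => ["doctype", "comment", "html", "body", "script"]
```

Starting at the root element only reaches the `<html>` subtree; walking the document's own children also reaches the nodes that the `Nokogiri::XML::Document` branch skips.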

Activity


flavorjones commented on Dec 3, 2014

Owner

Hi,

Thank you for reporting your issue. Can you please provide a working example that demonstrates this problem? In your example, you reference my_scrub but have not provided your implementation of that scrubber.

-m


chengguangnan commented on Dec 3, 2014

Author

    require "loofah"

    # Intended to remove the DTD and any script, style, head, and comment nodes.
    class Scrubber < Loofah::Scrubber
      def scrub(node)
        if node.class == Nokogiri::XML::DTD || %w[script style head comment].include?(node.name)
          node.remove
          Loofah::Scrubber::STOP # don't bother with the rest of the subtree
        end
      end
    end

    my_scrub = Scrubber.new

    puts Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml


flavorjones commented on Feb 11, 2018

Owner

Apologies for the atrociously long delay in responding. I understand what you're reporting, and acknowledge that the comments outside of the html tag are not being scrubbed.


flavorjones commented on Apr 5, 2020

Owner

The fix will be in v2.5.0.

Note: in the fix, I've chosen to remove any comments in a Loofah::HTML::Document that exist outside the html tag. I could have applied the scrubber to the document's non-html top-level nodes instead, but unfortunately these nodes (or rather, the document itself) don't meet the contract expected by the scrubber: for example, these comments can't be replaced by an arbitrary node type, nor can a sibling node be added next to them.
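The behaviour described in this note can be sketched with a small plain-Ruby toy model. This is an illustration under stated assumptions, not Loofah's actual v2.5.0 code: `ToyNode`, `ToyDocument` and `toy_scrub_document!` are hypothetical, and the scrubber is modelled as any callable that receives a node.

```ruby
# Toy model only: hypothetical stand-ins, not Loofah's real implementation.
ToyNode = Struct.new(:type, :name, :children) do
  # Top-down, depth-first walk that yields every node in this subtree.
  def each_node(&block)
    yield self
    children.each { |child| child.each_node(&block) }
  end
end

ToyDocument = Struct.new(:children) do
  # Imitates Document#root: the first element among the top-level children.
  def root
    children.find { |node| node.type == :element }
  end
end

# Hypothetical version of the document branch of scrub!: the scrubber still
# sees only the root element's subtree, and comments at the document level
# (outside <html>) are removed directly instead of being passed to it.
def toy_scrub_document!(doc, scrubber)
  doc.root&.each_node { |node| scrubber.call(node) }
  doc.children.reject! { |node| node.type == :comment }
  doc
end

doc = ToyDocument.new([
  ToyNode.new(:dtd, "doctype", []),
  ToyNode.new(:comment, "comment", []),
  ToyNode.new(:element, "html", [ToyNode.new(:element, "body", [])]),
  ToyNode.new(:comment, "comment", [])
])

seen = []
toy_scrub_document!(doc, ->(node) { seen << node.name })
p seen                     # => ["html", "body"]
p doc.children.map(&:name) # => ["doctype", "html"]
```

In this model the scrubber never receives the top-level comments, so it is never asked to replace them or add siblings next to them; the DOCTYPE stays because the note only mentions removing comments.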

added this to the v2.5.0 milestone on Apr 5, 2020

