Closed
I noticed that some HTML comments are not removed.
In the example below, my_scrub should remove all the comments:
Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml
=> "<!DOCTYPE html>\n<!--[if IE 7]><!-- --><html></html>\n"
I checked the code and think the problem is here:
https://github.com/flavorjones/loofah/blob/master/lib/loofah/instance_methods.rb#L41
case self
when Nokogiri::XML::Document
scrubber.traverse(root) if root
when Nokogiri::XML::DocumentFragment
children.scrub! scrubber
else
scrubber.traverse(self)
end
So even an HTML::Document goes through scrubber.traverse(root) if root, which means that anything outside the html root element never passes through the scrubber.
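To make the point above concrete, here is a minimal sketch in plain Ruby. It deliberately does not use Nokogiri or Loofah: ToyNode and visited_comments are hypothetical names invented for illustration, and the tree is a hand-built stand-in for a parsed document. It only shows that a walk starting at the root element cannot reach comments that are siblings of that root.

```ruby
# Toy stand-in for a parsed document tree. This is NOT Nokogiri's node class;
# it only models a node kind, some text, and a list of children.
class ToyNode
  attr_reader :kind, :text, :children

  def initialize(kind, text = nil, children = [])
    @kind = kind
    @text = text
    @children = children
  end
end

# Depth-first walk that records the text of every comment node it reaches.
def visited_comments(node, seen = [])
  seen << node.text if node.kind == :comment
  node.children.each { |child| visited_comments(child, seen) }
  seen
end

# Comments sit both beside and inside the html root element.
html = ToyNode.new(:element, "html", [ToyNode.new(:comment, "inside")])
document = ToyNode.new(:document, nil, [
  ToyNode.new(:comment, "before"),
  html,
  ToyNode.new(:comment, "after")
])

# Starting at the root element (analogous to scrubber.traverse(root)).
root = document.children.find { |child| child.kind == :element }
from_root = visited_comments(root)

# Starting at the document itself reaches the sibling comments too.
from_document = visited_comments(document)

puts from_root.inspect
puts from_document.inspect
```

In this toy model, only the walk that starts at the document node sees the "before" and "after" comments, which matches the behavior described in the report.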
flavorjones commented on Dec 3, 2014
Hi,
Thank you for reporting your issue. Can you please provide a working example that demonstrates this problem? In your example, you reference my_scrub but have not provided your implementation of that scrubber.
-m
chengguangnan commented on Dec 3, 2014
flavorjones commented on Feb 11, 2018
Apologies for the atrociously long delay in responding. I understand what you're reporting, and acknowledge that the comments outside of the html tag are not being scrubbed.
flavorjones commented on Apr 5, 2020
Fix will be in v2.5.0.
Note: in the fix I've chosen to remove any comments from a Loofah::HTML::Document that exist outside the html tag. I could have applied the scrubber to non-html root nodes, but unfortunately these nodes (or rather, the document itself) don't meet the contract expected by the scrubber (for example, these comments can't be replaced by an arbitrary node type, or have a sibling node added).
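The policy described in this note can be sketched in plain Ruby as follows. This is an illustration only and not Loofah's actual implementation: scrub_toy_document and walk are hypothetical helpers, and the document is modelled as an array of hashes rather than real Nokogiri nodes. The assumption it encodes is the one stated above: comments beside the root are dropped outright, and only the root's subtree is handed to the scrubber.

```ruby
# Visit a node and all of its descendants, calling the scrubber on each one.
def walk(node, &scrubber)
  scrubber.call(node)
  node.fetch(:children, []).each { |child| walk(child, &scrubber) }
end

# Toy model of the policy: a "document" is an array of top-level nodes.
def scrub_toy_document(top_level_nodes, &scrubber)
  # Comments outside the root are removed without consulting the scrubber,
  # so the scrubber never has to replace them or add siblings to them.
  kept = top_level_nodes.reject { |node| node[:kind] == :comment }

  # Only element subtrees (here, the html root) are passed to the scrubber.
  kept.each { |node| walk(node, &scrubber) if node[:kind] == :element }
  kept
end

seen = []
doc = [
  { kind: :doctype, text: "html" },
  { kind: :comment, text: "before" },
  { kind: :element, text: "html", children: [{ kind: :comment, text: "inside" }] },
  { kind: :comment, text: "after" }
]

result = scrub_toy_document(doc) { |node| seen << node[:text] }

puts result.map { |node| node[:kind] }.inspect
puts seen.inspect
```

In this sketch the scrubber block only ever sees nodes inside the root, so it keeps the contract it expects, while the top-level comments disappear regardless of what the scrubber would have decided for them.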