Scrub tasks don't clean html tags #151

kirylrb · 2018-06-01T13:02:55Z

Hello everybody,
Below are some simple cases covering all the HTML sanitizing tasks I found in the docs.
Please help me understand why I can't get rid of the HTML tags.
Env: Ruby 2.5.1, Loofah 2.0.3, Rails 5.2

irb(main):032:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:strip).to_s
=> "<p>Test</p><p>Te3st \r</p>"
irb(main):033:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:prune).to_s
=> "<p>Test</p><p>Te3st \r</p>"
irb(main):034:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:escape).to_s
=> "<p>Test</p><p>Te3st \r</p>"
irb(main):035:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:whitewash).to_s
=> "<p>Test</p><p>Te3st \r</p>"


flavorjones · 2018-06-02T19:11:59Z

@kirylpl I believe you want to use to_text and not to_s. Please let me know if there's something we can do to clarify this behavior in the README or other documentation?

kirylrb · 2018-06-04T08:28:44Z

@flavorjones I tried the to_text method but unfortunately still don't get the expected result: the &#13; characters are left in the output.
Could you show how to strip an example like this? It might be a perfect addition to the README.

irb(main):009:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:strip).to_text
=> "\nTest\n\nTe3st &#13;\n"
irb(main):010:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:prune).to_text
=> "\nTest\n\nTe3st &#13;\n"
irb(main):011:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:escape).to_text
=> "\nTest\n\nTe3st &#13;\n"
irb(main):012:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:whitewash).to_text
=> "\nTest\n\nTe3st &#13;\n"

flavorjones · 2018-06-04T13:30:38Z

@kirylpl I'm confused: the output you pasted above has the HTML tags removed. I think you're asking about removing the string &#13;, which is an entity (not a tag). Can you help me understand what output, exactly, you're hoping to see?

kirylrb · 2018-06-04T14:05:16Z

@flavorjones Yes, I want to get rid of this &#13; entity as well as the HTML tags. I expect a kind of combination of the to_text and to_s results:
=> "\nTest\n\nTe3st \r\n"
Or, even better, a newline only where a closing tag like </p> is immediately followed by an opening tag like <p>:
=> "Test\nTe3st \r"

flavorjones · 2018-06-04T14:42:34Z

Sounds like you might want to try writing your own scrubber for this behavior! The README has some examples on how you might approach this task.

kirylrb · 2018-06-06T08:11:00Z

@flavorjones Adding .gsub(/(&#13;|\s)+/, "\n") did it for me.
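
For readers who want to try the post-processing step on its own, here is a minimal sketch in plain Ruby. It assumes the pattern in the one-liner above was meant to match either the literal text &#13; or a run of whitespace. The helper name collapse_to_lines is hypothetical and not part of Loofah, and the sample string is the to_text output quoted earlier in the thread.

```ruby
# Hypothetical helper (not part of Loofah): post-processes a string such as
# the one returned by to_text earlier in this thread.
def collapse_to_lines(text)
  # Replace every run of literal "&#13;" entities and/or whitespace with a
  # single newline, then trim the leading and trailing newlines.
  text.gsub(/(?:&#13;|\s)+/, "\n").strip
end

sample = "\nTest\n\nTe3st &#13;\n" # to_text output quoted in the thread
p collapse_to_lines(sample)        # => "Test\nTe3st"
```

Note that this pattern treats every whitespace run as a separator, so ordinary spaces between words also become newlines ("a b" turns into "a\nb"). If that is not wanted, the pattern would need to be narrowed, for example to newlines and the entity only.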

kirylrb changed the title ~~Scrub don't clean html tags.~~ Scrub tasks don't clean html tags Jun 1, 2018

flavorjones closed this as completed Jun 2, 2018

flavorjones added the user-help label Jun 2, 2018
