Scrub tasks don't clean html tags #151

Closed
kirylrb opened this issue Jun 1, 2018 · 6 comments

@kirylrb

kirylrb commented Jun 1, 2018

Hello everybody,
Below are some simple cases using every HTML sanitizing task I found in the docs.
Please help me understand why I can't get rid of the HTML tags.
Env: Ruby 2.5.1, Loofah 2.0.3, Rails 5.2

irb(main):032:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:strip).to_s
=> "<p>Test</p><p>Te3st \r</p>"
irb(main):033:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:prune).to_s
=> "<p>Test</p><p>Te3st \r</p>"
irb(main):034:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:escape).to_s
=> "<p>Test</p><p>Te3st \r</p>"
irb(main):035:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:whitewash).to_s
=> "<p>Test</p><p>Te3st \r</p>"

kirylrb changed the title from "Scrub don't clean html tags." to "Scrub tasks don't clean html tags" on Jun 1, 2018

@flavorjones
Owner

@kirylpl I believe you want to use to_text rather than to_s. Is there something we could do to clarify this behavior in the README or other documentation?

@kirylrb
Author

kirylrb commented Jun 4, 2018

@flavorjones I tried the to_text method, but unfortunately I still don't get the expected result: the &#13; characters are left in the output.
Could you show how to strip an example like this? It might be a perfect addition to the README.

irb(main):009:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:strip).to_text
=> "\nTest\n\nTe3st &#13;\n"
irb(main):010:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:prune).to_text
=> "\nTest\n\nTe3st &#13;\n"
irb(main):011:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:escape).to_text
=> "\nTest\n\nTe3st &#13;\n"
irb(main):012:0> Loofah.fragment('<p>Test</p><p>Te3st &#13;</p>').scrub!(:whitewash).to_text
=> "\nTest\n\nTe3st &#13;\n"

@flavorjones
Owner

@kirylpl I'm confused: the output you pasted above already has the HTML tags removed. I think you're asking about removing the string &#13;, which is an entity (not a tag). Can you help me understand exactly what output you're hoping to see?
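To make the tag/entity distinction concrete, here is a small plain-Ruby check on the to_text output from the session above. It is only a sketch that inspects the string; it does not call Loofah.

```ruby
# The string Loofah's to_text returned in the irb session above.
fragment_text = "\nTest\n\nTe3st &#13;\n"

# No <...> tags are left in it...
p fragment_text.match?(/<[^>]+>/)  # => false
# ...but one decimal numeric character reference remains.
p fragment_text.scan(/&#\d+;/)     # => ["&#13;"]
# Code point 13 is a carriage return, the "\r" that to_s showed.
p 13.chr                           # => "\r"
```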

@kirylrb
Author

kirylrb commented Jun 4, 2018

@flavorjones Yes, I want to get rid of this &#13; entity as well as the HTML tags. I'm expecting something like a combination of the to_text and to_s results:
=> "\nTest\n\nTe3st \r\n"
Or, even better, a single newline only where a closing tag like </p> sits next to an opening tag like <p>:
=> "Test\nTe3st \r"
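One way to get from the to_text output to that second result is to post-process the string. The sketch below is plain Ruby, not a Loofah feature; tidy_text is a hypothetical helper name, and it assumes the only entities left in the text are decimal references like &#13;.

```ruby
# Hypothetical helper: post-processes the string returned by to_text.
# Assumption: only decimal references such as "&#13;" need decoding.
def tidy_text(text)
  # Decode decimal references such as "&#13;" into real characters.
  decoded = text.gsub(/&#(\d+);/) { Regexp.last_match(1).to_i.chr(Encoding::UTF_8) }
  # Collapse the blank line between paragraphs into a single newline.
  collapsed = decoded.gsub(/\n{2,}/, "\n")
  # Trim leading and trailing newlines only, so the "\r" survives
  # (String#strip would remove it as well).
  collapsed.sub(/\A\n+/, "").sub(/\n+\z/, "")
end

p tidy_text("\nTest\n\nTe3st &#13;\n") # => "Test\nTe3st \r"
```

Decoding only decimal references keeps the sketch short; named references such as &amp; would need a fuller decoder.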

@flavorjones
Owner

Sounds like you might want to try writing your own scrubber for this behavior! The README has some examples of how you might approach this task.

@kirylrb
Author

kirylrb commented Jun 6, 2018

@flavorjones Adding .gsub(/(&#13;|\s)+/, "\n") solved it for me.
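For reference, here is what that gsub does to the to_text output from earlier in the thread. This is plain Ruby; the trailing .strip is an optional extra step, not part of the fix described above.

```ruby
# String returned by Loofah's to_text in the earlier irb session.
to_text_output = "\nTest\n\nTe3st &#13;\n"

# Replace every run of whitespace and/or literal "&#13;" with one newline.
cleaned = to_text_output.gsub(/(&#13;|\s)+/, "\n")
p cleaned        # => "\nTest\nTe3st\n"
# Optional: drop the leading and trailing newlines.
p cleaned.strip  # => "Test\nTe3st"
```

Because \s also matches ordinary spaces, a line such as "Te3st two" would be split into "Te3st\ntwo"; narrowing the pattern (for example to /(&#13;|\n)+/) keeps spaces inside a line.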
