Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to use to_text instead of to_html to FullSanitizer #154

Open
Earlopain opened this issue May 8, 2023 · 13 comments · May be fixed by #157
Open

Add an option to use to_text instead of to_html to FullSanitizer #154

Earlopain opened this issue May 8, 2023 · 13 comments · May be fixed by #157

Comments

@Earlopain
Copy link

Rails has the strip_tags method, which is great. Howerever for input like <p>a</p><p>b</p> the output is ab which doesn't follow what is displayed in the browser.

Loofah has the function to_text which handles newlines like this. I would like to introduce an option like preserve_whitespace (or similar) to the FullSanitizer to do just that.

Is that something you would consider adding?

@flavorjones
Copy link
Member

@Earlopain this seems like a good idea to me, and I'd consider merging something like this. Do you want to try to implement this in a pull request? I'd be happy to support and mentor you if that's something that would be helpful!

@Earlopain
Copy link
Author

Earlopain commented May 11, 2023

@flavorjones I'd like to give it a shot. Thank you for the mentoring offer, I really appreciate that. I'll see how far I get and if I get stuck anywhere I'll get back to you on that.

I should have time for this at the end of the week by the latest.

@flavorjones
Copy link
Member

OK, I'm going to step back and ask whether your use case could be addressed by just using Loofah's built-in scrubber:

      test "ActionView::Helpers::SanitizeHelper#sanitize with a Loofah builtin scrubber" do
        input = %(<div>hello</div><p>world</p>)

        assert_equal(%(\nhello\n\nworld\n), @subject.sanitize(input, scrubber: :newline_block_elements))
      end

That test passes today!

The symbol :newline_block_elements is a shortcut for Loofah::Scrubbers::NewlineBlockElements which is what #to_text uses under the hood.

I'm also curious why you're not just using Loofah directly?

@Earlopain
Copy link
Author

Thanks for the reply. I'm mostly looking to improve the functionality in the rails strip_tags method, which I guess would come one step after this. I came here since rails delegates that functionality to this gem. Thinking about it, I could try to get it into rails itself but I feel that wouldn't be the correct place since the sanitization logic is all here.

I occasionally find myself wanting to scrub html, loosing whitespace was never what I wanted. I know I can use Loofah (though I wasn't aware of the scrubber syntax) but the native rails method just doesn't work for me. I think it would be great if rails could support this behaviour directly.

@flavorjones
Copy link
Member

flavorjones commented May 17, 2023

I occasionally find myself wanting to scrub html, loosing whitespace was never what I wanted

This is the use case I'm curious to know more about. What kind of view is it, what kind of user agent is rendering the output?

I think understanding the use case will be good to make the case upstream to change or add features in SanitizeHelper.

@Earlopain
Copy link
Author

Of course, I'm hoping I'm not getting to long-winded here:

Consider a page where users can write text in the form of markdown, like in Github Issues. There is a bunch of decorative text like the issue title or who wrote the issue, but the only relevant part is what the user provided, like this for example:

# Cool title

Very descriptive text

In html terms this would mean something like this:

<h1 id="cool-title">Cool title</h1><p>Very descriptive text</p>

Usually search engines are pretty good about what text they include in their search results but sometimes the text from outside gets included. Something like User X opened this issue last week Add an option to use.... The solution is to provide <meta description="XYZ"> to nudge search engines to what's important but since the source text is markdown I can't embed that directly. I also can't use strip_text since that results in Cool titleVery descriptive text which is missing a space.

Another use case is for web scraping where you consume and store arbitrary HTML from websites. I'd like to display that to users without embeding untrusted input while at least preserving the general structure of the text without making it unreadable.

I realize that these things are probably very specific to me, I Just thought a few others might find a quick solution useful as well. I'm perfectly fine with just using Loofah myself.

@flavorjones
Copy link
Member

@Earlopain Thank you for this context! Very helpful, I get it now.

Let me think about this a bit. I think you're making a good case that the existing behavior of strip_tags is just ... not useful. I'm thinking about how to approach a conversation with the Rails committers about that.

@Earlopain
Copy link
Author

Hi there @flavorjones, did you have a chance to give this some more thought?

@flavorjones
Copy link
Member

@Earlopain Thanks for the ping. This is still on my to-do list. So many Loofah and RHS changes have gone into Rails 7.1 that I've been kicking this can down the road until after 7.1 ships (which should be pretty soon). I've also learned a bit more about how changes like this can be framed/proposed, and I have an idea of how I want to drive that conversation.

Thanks for your patience.

@Earlopain
Copy link
Author

Waiting for after Rails 7.1 is totally fair. Thanks for letting me know

@ilyazub
Copy link

ilyazub commented Jan 17, 2024

@flavorjones Thanks for sharing the scrubber: :newline_block_elements option in #154 (comment). This is super helpful!

@flavorjones
Copy link
Member

Tagging this on the Rails Conf 2024 hack day project board. If someone wants to think/talk about this that would be great!

@jhottenstein
Copy link

I'm looking at writing some tests/PR in actionview for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants