Sanitization of character entities are replacing for blank spaces #96

Open
heltonrlustosa opened this issue Nov 21, 2019 · 4 comments

@heltonrlustosa

heltonrlustosa commented Nov 21, 2019

Hey.
We are using the bluemonday library in a new project, and in some cases I need to save strings that contain character entities (&nbsp, &lt, &gt...). However, after sanitizing some examples, we realised that the output no longer contains the non-breaking space entity, for example.

Code example:

package main

import (
	"fmt"
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.UGCPolicy()

	p.AllowStyling()
	p.AllowAttrs("style").Globally()
	p.AllowStandardAttributes()

	result := p.Sanitize(
		`<p>I am normal</p>&nbsp<p style="color:red;">After space</p>&nbsp<p style="font-size:50px;">I am big</p>`,
	)

	fmt.Println(result)
	// Output:
	// <p>I am normal</p> <p style="color:red;">After space</p> <p style="font-size:50px;">I am big</p>
}

Did I forget to add a policy?

Thank you.

@buro9
Member

buro9 commented Nov 22, 2019

I do not understand the scenario.

Are you saying that a blank space between paragraphs should be converted to a non-breaking space character?

And I do not see in the example anything that demonstrates an issue with &lt; and &gt;.

If all you seek to do is fully escape a string for presentation as HTML then does https://golang.org/pkg/html/#EscapeString not do this?
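For reference, a minimal sketch of what html.EscapeString from the standard library does. The helper name escapeForDisplay is hypothetical and only wraps the standard call; the input string is made up for illustration.

```go
package main

import (
	"fmt"
	"html"
)

// escapeForDisplay is a hypothetical helper name used for illustration.
// It escapes only the five characters <, >, &, ' and " so the string can
// be shown as text inside an HTML page.
func escapeForDisplay(s string) string {
	return html.EscapeString(s)
}

func main() {
	fmt.Println(escapeForDisplay(`<p style="color:red;">a & b</p>`))
	// Output:
	// &lt;p style=&#34;color:red;&#34;&gt;a &amp; b&lt;/p&gt;
}
```

Note that this escapes every tag, so it suits displaying markup as text rather than keeping a safe subset of HTML, which is what a sanitizer is for.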

@heltonrlustosa
Author

Sorry, I sent you an incorrect example. My problem is with an escaped string that contains "&nbsp"; in that case, Sanitize removes it.

I will edit the description and add a correct example.

Thank you.

@buro9
Member

buro9 commented Nov 25, 2019

I think... that it's fine, but that the console and text tools display it in a misleading way.

Nothing in my code explicitly touches a &nbsp; and I see the net/html package decodes &nbsp; to \u00a0 (the Unicode non-breaking space) and writes it out as that literal character.

I've looked at the output of the example you've provided, and initially it looks like they are converted to whitespace. But look closer: put the output into a good text editor and look at the whitespace (or select all whitespace that matches the space in "I am") and you'll see that the &nbsp; isn't actually a regular space character. If you inspect it, you'll see it is a Unicode non-breaking space.

So bluemonday is doing precisely what the net/html package believes is the best way to do this.

@Delicious-Bacon

@heltonrlustosa you should use Go's %q verb with fmt.Printf if you wish to see &nbsp; and other "hidden" characters (runes).

fmt.Printf("%q\n", result)
// "<p>I am normal</p>\u00a0<p style=\"color:red;\">After space</p>\u00a0<p style=\"font-size:50px;\">I am big</p>"

\u00a0 == &nbsp;, therefore, bluemonday works as intended, and your implementation does what you wanted it to do.

Read more in the fmt package documentation.
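If the stored string really must contain the textual entity rather than the literal character, one possible approach is to post-process the sanitized output. This is only a sketch, not something bluemonday provides: restoreNBSP is a hypothetical helper, and the sanitized string below is hard-coded for illustration rather than produced by calling Sanitize.

```go
package main

import (
	"fmt"
	"strings"
)

// restoreNBSP is a hypothetical helper that rewrites every literal U+00A0
// in already-sanitized output back to the textual entity "&nbsp;".
func restoreNBSP(sanitized string) string {
	return strings.ReplaceAll(sanitized, "\u00a0", "&nbsp;")
}

func main() {
	// Stand-in for the sanitizer's result: the spaces between the
	// paragraphs are U+00A0, not ASCII spaces.
	sanitized := "<p>I am normal</p>\u00a0<p>After space</p>"

	fmt.Println(restoreNBSP(sanitized))
	// Output:
	// <p>I am normal</p>&nbsp;<p>After space</p>
}
```

Both forms render identically in a browser, so this step only matters when something downstream expects the entity text itself.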

You should close this issue.
