Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change in behavior caused by Nokogiri 1.13.5 #130

Closed
CarlosCD opened this issue May 20, 2022 · 2 comments
Closed

Change in behavior caused by Nokogiri 1.13.5 #130

CarlosCD opened this issue May 20, 2022 · 2 comments

Comments

@CarlosCD
Copy link

I was trying to strip tags, and anything that could be used to build an HTML tags from data entered in a Rails application, and had a few tests related to this... It seems that changes in Nokogiri 1.13.5 have modified the Rails::Html::FullSanitizer behavior (probably a change in libxml2 to 1.9.13 - see Loofah issue below).

The difference seems to be whether some blank characters are used inside <>, close to the brackets. This changes the behavior, from removing the inside of <...> to escape the '<' and '>' characters, keeping what is inside.

Before Nokogiri 1.13.5 (this is 1.13.4):

s1 =  'Hello <world!>'
ActionView::Base.full_sanitizer.sanitize(s1)
# => "Hello "

s2 = 'Hello <... world!>'
ActionView::Base.full_sanitizer.sanitize(s2)
# => "Hello " 

s3 =  'Kitty is <-NOT-> bad!'
ActionView::Base.full_sanitizer.sanitize(s3)
# => "Kitty is  bad!" 

Nokogiri 1.13.5:

s1 =  'Hello <world!>'
ActionView::Base.full_sanitizer.sanitize(s1)
# => "Hello "

s2 = 'Hello <... world!>'
ActionView::Base.full_sanitizer.sanitize(s2)
# => "Hello &lt;... world!&gt;"

s3 =  'Kitty is <-NOT-> bad!'
ActionView::Base.full_sanitizer.sanitize(s3)
# => "Kitty is &lt;-NOT-&gt; bad!"

This issue in Loofah is probably the same: flavorjones/loofah#230 (closed).

I am not sure if this is a problem for some folks. In our case we wanted to remove any HTML that later could be used to carefully build a XSS problem, so it is not a big deal, but surprising.

@flavorjones
Copy link
Member

Hi, @CarlosCD, and thanks for opening this issue.

You're right, the update to libxml 2.9.14 did change this behavior, and because the whole stack of Nokogiri → Loofah → Rails::Html::Sanitizer wraps libxml2, there's nothing we can easily do to change this behavior.

I updated the RHS tests in #129, if you want to take a look and validate your mental model. The corresponding test changes in loofah are at flavorjones/loofah#235.

It's also worth noting that this change to how libxml2 fixes broken markup is actually part of a broader effort to remove denial-of-service attack vectors in libxml2. The loofah issue you link to above is related to an earlier attempt to do this in lbxml 2.9.13 which we essentially reverted in Nokogiri's vendored libxml2 for a time. The method used in libxml 2.9.14 is better and changes how broken markup is parsed in a less-surprising way. Please know that untrusted markup is still being sanitized, it's just that broken markup is being parsed in a slightly different way, and so you may get slightly different output in these edge cases.

Anyway, I hope this explanation was helpful. I'm going to close this issue, but please feel free to ask followup questions!

@CarlosCD
Copy link
Author

Thanks @flavorjones, it is indeed very helpful. Thank you so much for writing back and pointing to those tests, and for your work in this project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants