You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I was trying to strip tags, and anything that could be used to build an HTML tags from data entered in a Rails application, and had a few tests related to this... It seems that changes in Nokogiri 1.13.5 have modified the Rails::Html::FullSanitizer behavior (probably a change in libxml2 to 1.9.13 - see Loofah issue below).
The difference seems to be whether some blank characters are used inside <>, close to the brackets. This changes the behavior, from removing the inside of <...> to escape the '<' and '>' characters, keeping what is inside.
Before Nokogiri 1.13.5 (this is 1.13.4):
s1='Hello <world!>'ActionView::Base.full_sanitizer.sanitize(s1)# => "Hello "s2='Hello <... world!>'ActionView::Base.full_sanitizer.sanitize(s2)# => "Hello " s3='Kitty is <-NOT-> bad!'ActionView::Base.full_sanitizer.sanitize(s3)# => "Kitty is bad!"
Nokogiri 1.13.5:
s1='Hello <world!>'ActionView::Base.full_sanitizer.sanitize(s1)# => "Hello "s2='Hello <... world!>'ActionView::Base.full_sanitizer.sanitize(s2)# => "Hello <... world!>"s3='Kitty is <-NOT-> bad!'ActionView::Base.full_sanitizer.sanitize(s3)# => "Kitty is <-NOT-> bad!"
I am not sure if this is a problem for some folks. In our case we wanted to remove any HTML that later could be used to carefully build a XSS problem, so it is not a big deal, but surprising.
The text was updated successfully, but these errors were encountered:
You're right, the update to libxml 2.9.14 did change this behavior, and because the whole stack of Nokogiri → Loofah → Rails::Html::Sanitizer wraps libxml2, there's nothing we can easily do to change this behavior.
I updated the RHS tests in #129, if you want to take a look and validate your mental model. The corresponding test changes in loofah are at flavorjones/loofah#235.
It's also worth noting that this change to how libxml2 fixes broken markup is actually part of a broader effort to remove denial-of-service attack vectors in libxml2. The loofah issue you link to above is related to an earlier attempt to do this in lbxml 2.9.13 which we essentially reverted in Nokogiri's vendored libxml2 for a time. The method used in libxml 2.9.14 is better and changes how broken markup is parsed in a less-surprising way. Please know that untrusted markup is still being sanitized, it's just that broken markup is being parsed in a slightly different way, and so you may get slightly different output in these edge cases.
Anyway, I hope this explanation was helpful. I'm going to close this issue, but please feel free to ask followup questions!
I was trying to strip tags, and anything that could be used to build an HTML tags from data entered in a Rails application, and had a few tests related to this... It seems that changes in Nokogiri 1.13.5 have modified the Rails::Html::FullSanitizer behavior (probably a change in libxml2 to 1.9.13 - see Loofah issue below).
The difference seems to be whether some blank characters are used inside <>, close to the brackets. This changes the behavior, from removing the inside of <...> to escape the '<' and '>' characters, keeping what is inside.
Before Nokogiri 1.13.5 (this is 1.13.4):
Nokogiri 1.13.5:
This issue in Loofah is probably the same: flavorjones/loofah#230 (closed).
I am not sure if this is a problem for some folks. In our case we wanted to remove any HTML that later could be used to carefully build a XSS problem, so it is not a big deal, but surprising.
The text was updated successfully, but these errors were encountered: