Great component, loving it, but...
While using HtmlSanitizer, I noticed some poor, non-linear performance figures. I understand my data is not representative of typical HTML pages, but when data comes from external systems that may not have been guarded or sanitized, it might not be small. My fear is that this poor performance could expose the component to a denial-of-service attack. I understand this could all be AngleSharp's problem, but I wondered whether this component could mitigate or prevent these issues.
To test, I created HTML containing only `<br/>` tags and injected one or two bad `<br>` tags at positions 196 and 1736. In the extreme case, I scattered 100 bad tags randomly.
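For anyone wanting to reproduce this, here is a minimal Python sketch of how such input could be generated. `make_test_html` and `make_random_test_html` are hypothetical helpers. The sketch also assumes that "position" means the index of the tag in the sequence rather than a character offset.

```python
import random


def make_test_html(total_tags, bad_positions):
    """Build HTML made only of self-closing <br/> tags, with a plain <br>
    substituted at each index listed in bad_positions.

    Assumption: a "position" is the tag's index, not a character offset.
    """
    bad = set(bad_positions)
    return "".join("<br>" if i in bad else "<br/>" for i in range(total_tags))


def make_random_test_html(total_tags, bad_count, seed=0):
    """Same as make_test_html, but with bad_count bad tags at random indices."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return make_test_html(total_tags, rng.sample(range(total_tags), bad_count))


# 160,000 tags of 5 bytes each gives roughly the 800 KB case.
html = make_test_html(160_000, [196, 1736])
```

Each `<br/>` is 5 bytes, so scaling `total_tags` by 10 reproduces the 8 MB and 80 MB sizes.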
Here are the figures for performance.
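
| HTML size | Number of bad BRs | Time taken | Added memory |
|-----------|-------------------|------------|--------------|
| 800 KB    | 1                 | 6 s        | 20 MB        |
| 800 KB    | 2                 | 7.3 s      | 20 MB        |
| 8 MB      | 1                 | 612 s      | 145 MB       |
| 8 MB      | 2                 | 650 s      | 145 MB       |
| 8 MB      | 100               | 633 s      | 210 MB       |
| 80 MB     | 1                 | ?          | 1.6 GB       |
| 100 MB    | 1                 | ?          | 2 GB         |
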
? = I didn't wait to find out. It took longer than I was willing to wait.
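Going from 800 KB to 8 MB multiplies the input size by 10 and the time by roughly 100, which suggests quadratic growth. The following Python snippet checks that arithmetic. It assumes time grows as size^k and fits k from two rows of the table. `scaling_exponent` is a hypothetical helper, not part of HtmlSanitizer.

```python
import math


def scaling_exponent(size_a, time_a, size_b, time_b):
    """Estimate k in time ~ size**k from two (size, time) measurements."""
    return math.log(time_b / time_a) / math.log(size_b / size_a)


# Sizes in MB and times in seconds, taken from the figures above.
one_bad = scaling_exponent(0.8, 6.0, 8.0, 612.0)
two_bad = scaling_exponent(0.8, 7.3, 8.0, 650.0)
print(f"1 bad BR:  k = {one_bad:.2f}")  # k = 2.01
print(f"2 bad BRs: k = {two_bad:.2f}")  # k = 1.95
```

Both estimates are close to 2. If that trend held, the 80 MB case would take about 100 times longer than the 8 MB case, on the order of 60,000 s. This is an extrapolation, not a measurement.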
Is this what you'd expect? All we do that's special is whitelist 60 tags, allow "face" attributes and disallow "src" attributes.
Breaking into the debugger usually shows that all the work is being done in `AngleSharp.Dom.Node.RemoveChild`, but I haven't run PerfView to find out more.
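One possible explanation for the roughly quadratic curve is offered here only as an assumption, since AngleSharp's internals are not examined in this issue. If a node's children are stored in a contiguous list, every `RemoveChild` call has to shift all later siblings. Removing many children one at a time would then cost O(n²), while rebuilding the child list in a single pass would cost O(n). The Python sketch below models that data structure, not AngleSharp itself. Both function names are hypothetical.

```python
def remove_one_at_a_time(children, is_bad):
    """Delete matching items in place, one deletion per item.

    Each deletion from a contiguous list shifts every later element left
    by one; the returned count is the total number of such shifts.
    """
    kids = list(children)
    moves = 0
    i = 0
    while i < len(kids):
        if is_bad(kids[i]):
            moves += len(kids) - i - 1  # elements after index i slide left
            del kids[i]
        else:
            i += 1
    return kids, moves


def rebuild_in_one_pass(children, is_bad):
    """Keep only the good items by copying each of them once into a new list."""
    kept = [c for c in children if not is_bad(c)]
    return kept, len(kept)  # one copy per kept item
```

When all n children are removed one at a time, the shift count is n(n-1)/2. Doubling n therefore roughly quadruples the work, while the one-pass rebuild touches each child only once.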
If there are size, speed, or memory limits for this component, could they be published?
For now, my own mitigation plan is this: once any embedded `img` data tags are stripped, if the HTML is still over 1 MB, I won't bother sanitizing it and may reject it.
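Here is a minimal Python sketch of that mitigation. It assumes that "1mb" means 1 MiB of UTF-8 text and that "embedded img data tags" means `<img>` elements whose `src` is a `data:` URI. `precheck_html` and `MAX_HTML_BYTES` are hypothetical names. The regex is a cheap pre-filter that would run before the real sanitizer; it is not an HTML parser.

```python
import re
from typing import Optional

# Assumption: "over 1mb" means more than 1 MiB of UTF-8 encoded text.
MAX_HTML_BYTES = 1024 * 1024

# Cheap pre-filter for <img ...> tags whose src is an inline data: URI.
# Not an HTML parser: malformed or unusual markup may slip past it.
DATA_IMG_TAG = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*["']?data:[^>]*>""", re.IGNORECASE)


def precheck_html(html: str) -> Optional[str]:
    """Strip inline data: images, then return None if the rest is still too big."""
    stripped = DATA_IMG_TAG.sub("", html)
    if len(stripped.encode("utf-8")) > MAX_HTML_BYTES:
        return None  # caller rejects the input instead of sanitizing it
    return stripped  # small enough to hand to the sanitizer
```

One caveat: Python's `re` backtracks, so deliberately malformed input (for example, many `<img` openings with no closing `>`) can make even this pre-filter slow. A plain length check on the raw input before the regex would bound that cost.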