Fixed #33236 -- Fixed assertHTMLEqual() error message for escaped HTML. #15033

Merged
merged 1 commit into from
Oct 29, 2021

pratyushmittal
Sponsor Contributor

Ticket Link: https://code.djangoproject.com/ticket/33236

The diff shown in the assertHTMLEqual error message renders escaped HTML text as unescaped text.

This makes it hard to write tests for XSS vulnerabilities in our tags and filters. Although the assertions themselves work correctly, the error messages don't show the actual differences.

Steps to reproduce

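from django.test import TestCase


class UtilsTestCase(TestCase):
    def test_assertion(self):
        # Text content that only looks like a tag once unescaped.
        escaped = "<p>&lt;foo&gt;</p>"
        # A real <foo> element nested inside <p>.
        raw = "<p><foo></p>"
        self.assertHTMLEqual(escaped, raw)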

Expected Output

AssertionError: <p>
&lt;foo&gt;
</p> != <p>
<foo>
</p>
  <p>
- &lt;foo&gt;
+ <foo>
  </p>

Actual Output

AssertionError: <p>
<foo>
</p> != <p>
<foo>
</p>
  <p>
  <foo>
  </p>

Fix

The assertHTMLEqual message is misleading because it displays escaped HTML entities in their unescaped form.

The bug is probably caused by the __str__ method of the Element class treating all of its children the same way. Each child is either a subtree or a string. For strings, Python's HTMLParser unescapes the contents, so their string representation probably needs to escape them again.
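The unescaping described above can be reproduced with the standard library alone. This is a minimal sketch, not Django's parser: `TextCollector` is a hypothetical helper, and it assumes the parser runs with `convert_charrefs=True` (the `HTMLParser` default).

```python
import html
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper that records the text nodes the parser reports."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # With convert_charrefs=True, character references arrive
        # here already converted to plain characters.
        self.chunks.append(data)


parser = TextCollector()
parser.feed("<p>&lt;foo&gt;</p>")
parser.close()

text = "".join(parser.chunks)   # text as the parser hands it over
re_escaped = html.escape(text)  # escaped again for display
```

Here `text` looks like a tag even though it is only text content, which is why a string child has to be escaped again before it is printed in a diff.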

I have added a patch for this in this pull request.

@github-actions

Hello @pratyushmittal! Thank you for your contribution 💪

As it's your first contribution be sure to check out the patch review checklist.

If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!

If you have any design or process questions then you can ask in the Django forum.

Welcome aboard ⛵️!

@pratyushmittal pratyushmittal marked this pull request as draft October 29, 2021 01:13
@pratyushmittal pratyushmittal marked this pull request as ready for review October 29, 2021 01:14
Member

@felixxm felixxm left a comment

@pratyushmittal Thanks for this patch 👍

@felixxm felixxm changed the title Fix for showing the correct diff in assertHTMLEqual [Ticket 33236] Fixed #33236 -- Fixed assertHTMLEqual() error message for escaped HTML. Oct 29, 2021
@pratyushmittal
Sponsor Contributor Author

@pratyushmittal Thanks for this patch 👍

Thanks a lot @felixxm for the review and making the code simple :).

I have made these changes in d43b2a7.

@felixxm
Member

felixxm commented Oct 29, 2021

@pratyushmittal Some cases are not covered, e.g.

>>> html1 = "<br>"
>>> html2 = "&lt;br&gt;"
>>> self.assertHTMLEqual(html1, html2)
....
AssertionError: <br> != <br>

@pratyushmittal
Sponsor Contributor Author

Nice catch @felixxm. I am working on it.

@pratyushmittal
Sponsor Contributor Author

@felixxm I have fixed the issue in d1b80a6.

The output showed an unescaped <br> because the RootElement class has its own __str__ method, which used the same old approach to render its children.
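One way to picture the problem is a root class that overrides `__str__` and therefore needs the same escaping as the element class. The sketch below is a simplified illustration, not the code in django/test/html.py: `Node`, `Root`, and `render_children` are hypothetical names, and attributes are omitted.

```python
import html


def render_children(children):
    # String children are text nodes and are escaped for display;
    # anything else is a nested node that renders itself.
    return "".join([
        html.escape(child) if isinstance(child, str) else str(child)
        for child in children
    ])


class Node:
    """Hypothetical, simplified stand-in for a parsed element."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __str__(self):
        if not self.children:
            return f"<{self.name}>"
        return f"<{self.name}>\n{render_children(self.children)}\n</{self.name}>"


class Root(Node):
    """Hypothetical root container with its own __str__."""

    def __init__(self, children=()):
        super().__init__(None, children)

    def __str__(self):
        # Without the shared helper, text children here would be
        # printed unescaped even after Node.__str__ was fixed.
        return render_children(self.children)


tag_doc = Root([Node("br")])  # parsed from "<br>"
text_doc = Root(["<br>"])     # parsed from "&lt;br&gt;"
```

With the shared helper, `str(tag_doc)` and `str(text_doc)` differ, so a failure message can show which side contained escaped text.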

@felixxm
Member

felixxm commented Oct 29, 2021

@pratyushmittal Thanks 👍 Welcome aboard ⛵

I pushed minor edits and wrapped the arguments to the join() calls in list comprehensions. A list comprehension is preferable here because str.join() converts its argument to a list internally anyway, so passing a list up front performs better.
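The two forms of the join() call produce the same string; only the argument type differs. A small sketch, with `children` as made-up sample data:

```python
import html

children = ["<b>", "text", "&"]  # sample data, not taken from the patch

# List comprehension: join() receives a ready-made list.
from_list = "".join([html.escape(child) for child in children])

# Generator expression: join() has to build a sequence from it first.
from_generator = "".join(html.escape(child) for child in children)
```

Any speed difference is likely to be small for short child lists, so the choice here is mostly a matter of convention.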

@pratyushmittal
Sponsor Contributor Author

Ah! Awesome. Thanks a lot @felixxm for the reviews.

@kezabelle
Contributor

kezabelle commented Oct 29, 2021

Throwing this out there, as the new tests don't look to cover it, does the problem exist for things like &#62; or &#x3E;, and does the change account for that?

@felixxm
Member

felixxm commented Oct 29, 2021

Throwing this out there, as the new tests don't look to cover it, does the problem exist for things like &#62;, and does the change account for that?

The decimal/hexadecimal character references are always converted to a named character reference, so this patch fixes error messages for them as well. I added an extra assertion.
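The round trip for numeric references can be checked with the standard library's `html` module. This sketch only illustrates the unescape and escape steps; it does not exercise Django's parser.

```python
import html

# Decimal and hexadecimal references decode to the same character.
from_decimal = html.unescape("&#62;")
from_hex = html.unescape("&#x3E;")

# Escaping that character again yields the named reference.
shown_in_message = html.escape(from_decimal)
```

Both numeric forms therefore end up displayed as the named reference once the text is escaped again.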

@felixxm felixxm merged commit f38458f into django:main Oct 29, 2021