Fix parsing "meta" tag with encoding attribute #432

willkg · 2019-01-08T15:35:30Z

When parsing a <meta encoding=""> tag, the parser calls charEncoding
and changeEncoding on the input stream, but the InputStreamWithMemory
wrapper didn't have those methods. This fixes that.

This also creates a new test set for BleachHTMLParser functionality.

Fixes #431
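For readers unfamiliar with the wrapper pattern, here is a minimal, self-contained sketch of the idea: a wrapper that remembers the characters it has handed out must also forward the encoding-related members to the stream it wraps. The names FakeInnerStream, StreamWithMemory, and get_tag are hypothetical and exist only for illustration; this is not the actual Bleach or html5lib code, and the real change may differ in detail.

```python
class FakeInnerStream:
    """Hypothetical stand-in for the parser's real input stream."""

    def __init__(self, text):
        self._chars = list(text)
        self._pos = 0
        # Assumed shape: (encoding name, confidence) pair.
        self.charEncoding = ("utf-8", "tentative")

    def char(self):
        # Return the next character, or None at end of input.
        if self._pos >= len(self._chars):
            return None
        c = self._chars[self._pos]
        self._pos += 1
        return c

    def changeEncoding(self, new_encoding):
        self.charEncoding = (new_encoding, "certain")


class StreamWithMemory:
    """Hypothetical wrapper that records every character it returns."""

    def __init__(self, inner_stream):
        self._inner_stream = inner_stream
        self._buffer = []

    @property
    def charEncoding(self):
        # Forward reads to the wrapped stream so callers see its state.
        return self._inner_stream.charEncoding

    def changeEncoding(self, new_encoding):
        # Forward the call; without this, callers hit AttributeError.
        return self._inner_stream.changeEncoding(new_encoding)

    def char(self):
        c = self._inner_stream.char()
        if c is not None:
            self._buffer.append(c)
        return c

    def get_tag(self):
        # Everything consumed so far, as a single string.
        return "".join(self._buffer)
```

With this sketch, code that only knows about the wrapper can read charEncoding and call changeEncoding without reaching into the wrapped stream directly.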

willkg · 2019-01-08T15:36:05Z

CHANGES

@@ -1,7 +1,7 @@
 Bleach changes
 ==============

-Version 3.0.3 (In development)
+Version 3.1.0 (In development)


We're adding backwards-compatible API changes, so this needs to be a MINOR increment rather than a PATCH increment.

willkg · 2019-01-08T15:36:16Z

CHANGES

@@ -25,6 +25,12 @@ None
 * Fix cases where attribute names could have invalid characters in them.
  (#419)

+* Fix problems with ``LinkifyFilter`` not being able to match links
+  across ``&amp;``. (#422)


I should have added this in the last PR.

willkg · 2019-01-08T15:36:56Z

tests/test_html5lib_shim.py

@@ -80,3 +80,65 @@ def test_serializer(data, expected):
    serialized = serializer.render(walker(dom))

    assert serialized == expected
+
+
+@pytest.mark.parametrize('parser_args, data, expected', [


This tests the specific issue here plus other interesting BleachHTMLParser behavior.

willkg · 2019-01-08T15:37:44Z

tests/test_html5lib_shim.py

+        {},
+        '<a href=\'http://example.com\'\'>',
+        '<a href="http://example.com"></a>'
+    )


We had a LinkifyFilter test that tested this. I nixed that test and moved the relevant parts here where they're not encumbered by LinkifyFilter things.

willkg · 2019-01-08T15:38:06Z

tests/test_linkify.py

-        '<a href="http://example.com/" rel="nofollow"></a>'
-    )
-
-


The important part of this got moved to the BleachHTMLParser tests.

g-k

r+ lgtm

willkg · 2019-01-08T17:00:22Z

Thank you!

willkg requested a review from g-k January 8, 2019 15:38

willkg mentioned this pull request Jan 8, 2019

Shim for wrapped stream, InputStreamWithMemory, charEncoding: #428

Closed

g-k approved these changes Jan 8, 2019

willkg merged commit cabd665 into mozilla:master Jan 8, 2019

willkg deleted the 431-charencoding branch January 8, 2019 17:00
