Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LazyString's __html__ is an incomplete implementation of MarkupSafe #224

Open
jace opened this issue Mar 28, 2023 · 0 comments
Open

LazyString's __html__ is an incomplete implementation of MarkupSafe #224

jace opened this issue Mar 28, 2023 · 0 comments

Comments

@jace
Copy link

jace commented Mar 28, 2023

MarkupSafe's Markup class, as used in Flask, is very careful about mixing escaped and unescaped strings. For instance, Markup(...).format(...) will ensure all format variables are escaped before being interpolated into the string, and the resulting string is fully escaped.

LazyString does not do this, and the presence of a __html__ method (as previously raised in #121) creates a situation where there is no way to tell whether a string is properly escaped or not:

>>> from flask_babel import lazy_gettext
>>> from markupsafe import Markup
>>> l = lazy_gettext("This is a <em>string</em> with a {var}")
>>> l
l'This is a <em>string</em> with a {var}'

>>> Markup(l)
Markup('This is a <em>string</em> with a {var}')

>>> Markup(l.format(var="variable & more"))
Markup('This is a <em>string</em> with a variable & more')

>>> Markup(l).format(var="variable & more")
Markup('This is a <em>string</em> with a variable &amp; more')

When a lazy string has format variables, it must be wrapped in Markup() before calling .format() to make it continue to behave as a HTML string. However, this is dangerous to do in a function that receives the string as parameter. Markup wrapping must happen at source, but that is also not possible in a lazy context as it causes a string evaluation.

Here is a test case showing how gettext, lazy_gettext and Markup all behave differently. As a result, neither translator nor programmer has any indication on whether any given string is plain text or HTML, and every string will need a full integration test to confirm markup and escaping are handled appropriately across translations.

Possible mitigations:

  • LazyString needs two implementations with and without __html__, for use by the Jinja extension and lazy_gettext functions in HTML and non-HTML contexts respectively. This solves for the programmer but not the translator, who still gets no context on whether it is safe to use common punctuation characters like &, < and >. This may be solved by separating HTML and non-HTML strings into different domains.

  • LazyString must derive from Markup, or reproduce its functionality to ensure appropriate escaping in all scenarios, and gettext must wrap strings in Markup before returning. This solves for both parties in a consistent way, but commits to all i18n strings being HTML, requiring unescaping before use in non-HTML contexts (JSON APIs, Markdown, etc).

from flask import Flask, Markup
from flask_babel import Babel, gettext, lazy_gettext

import pytest


@pytest.fixture(scope='session')
def app():
    return Flask(__name__)


@pytest.fixture(scope='session')
def babel(app):
    return Babel(app)


@pytest.fixture()
def ctx(app, babel):
    with app.test_request_context() as context:
        yield context


raw_string = "This is a <em>string</em> with a {var}"

get_texts = [
    pytest.param(lambda: gettext(raw_string), id='str'),
    pytest.param(lambda: lazy_gettext(raw_string), id='lazy'),
    pytest.param(lambda: Markup(raw_string), id='markup'),
]


@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_type(get_text):
    text = get_text().format(var="variable & more")
    assert isinstance(text, str)


@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_value(get_text):
    text = get_text().format(var="variable & more")
    assert text == "This is a <em>string</em> with a variable &amp; more"


@pytest.mark.usefixtures('ctx')
@pytest.mark.parametrize('get_text', get_texts)
def test_gettext_html(get_text):
    text = get_text().format(var="variable & more")
    assert '__html__' in text

Output (with errors interpolated):

FAILED lazystr_test.py::test_gettext_value[str] - AssertionError: assert equals failed
E         'This is a <em>string</em> with a variable & more'      'This is a <em>string</em> with a variable &amp; more'
FAILED lazystr_test.py::test_gettext_value[lazy] - AssertionError: assert equals failed
E         'This is a <em>string</em> with a variable & more'      'This is a <em>string</em> with a variable &amp; more'
FAILED lazystr_test.py::test_gettext_html[str] - AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
E       AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[lazy] - AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
E       AssertionError: assert '__html__' in 'This is a <em>string</em> with a variable & more'
FAILED lazystr_test.py::test_gettext_html[markup] - AssertionError: assert '__html__' in Markup('This is a <em>string</em> with a variable &amp; more')
E       AssertionError: assert '__html__' in Markup('This is a <em>string</em> with a variable &amp; more')
=== 5 failed, 4 passed ===
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant