Paragraph sanitization (e.g. img.alt) is too restrictive, disallows punctuation #158

Open
palant opened this issue Nov 23, 2022 · 0 comments

palant commented Nov 23, 2022

This regexp is used to validate the alt text of images. It disallows common punctuation, which causes problems when alt text is copied from news articles or source code listings, for example. As a result, the alt attribute is dropped, which makes the image inaccessible to visually impaired people. The author of the text is unlikely to even notice, because the rendered result looks fine visually.

Subset of common symbols (some used in non-English languages) currently forbidden by this regular expression: "„“”‘’«»#$§%‰&*+±–—:;=?‽¡¿@{}|~…°®™.

I’m not sure I understand the purpose of restricting to a specific character set here, as opposed to properly escaping special characters (which I believe bluemonday does automatically). Is the concern that the contents of the alt or title attribute might be taken as the HTML source of some pop-up? Wouldn’t it make more sense to blacklist only angle brackets then?
