Proposed new settings to leave HTML entities untouched and to clean the contents of attribute values #1832

Closed

kworam-rally commented Aug 31, 2022

Some proposed modifications to jsoup:

  1. Introduce Entities.EscapeMode.none which causes Entities.escape() to leave HTML entities untouched.
  2. Introduce Cleaner.CleanerSettings with cleanAttributeValues() and baseUri(). If cleanAttributeValues() is true, the Cleaner cleans HTML embedded in attribute values using baseUri().

kworam-rally changed the title from "S222317 customize jsoup" to "Proposed new settings to leave HTML entities untouched and to clean the contents of attribute values" Aug 31, 2022
jhy (Owner) commented Apr 24, 2023

I will never implement a None entities escape mode in jsoup. It is a loaded foot-gun, and a sign that the library is being used incorrectly. If you want plain-text output, use one of the .text() methods. If none of those fits your use case, I'd be happy to hear more about it, and we can explore ways to improve them.
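
One way to read the foot-gun concern is sketched below in plain Java. This is an illustration only, not jsoup's code: the Mode enum and the escape method are hypothetical stand-ins for an escape-mode switch, and the sketch assumes the text being serialized may come from an untrusted source.

```java
public class EscapeModeSketch {
    // Hypothetical stand-in for an escape-mode switch; NOT jsoup's Entities.EscapeMode.
    enum Mode { BASE, NONE }

    // Escapes the characters that are significant in HTML text,
    // or returns the input unchanged when mode is NONE.
    static String escape(String text, Mode mode) {
        if (mode == Mode.NONE) {
            return text; // nothing is escaped: the text is emitted as markup
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String userText = "<script>alert(1)</script>";
        // Escaped: a browser renders this as literal text.
        System.out.println(escape(userText, Mode.BASE));
        // Unescaped: a browser would parse this as a script element.
        System.out.println(escape(userText, Mode.NONE));
    }
}
```

With escaping on, the sample string is written out as inert text. With the hypothetical NONE mode, the same string reaches the output as live markup, which is the kind of accidental injection an always-escaping serializer prevents.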

jhy closed this Apr 24, 2023