Allow wildcards in whitelist attributes #499

Open
foo4u opened this issue Dec 2, 2014 · 11 comments · May be fixed by #1871

Comments

@foo4u

foo4u commented Dec 2, 2014

HTML5 allows custom data attributes (data-foo, data-foo-bar, etc.) to attach information to elements. These are relatively harmless and should only contain text.

Currently, each data- attribute needs to be specified explicitly on a whitelist so that it's not removed by Jsoup.clean(). Can we add support for either:

  1. Wildcard attributes, e.g. Whitelist.relaxed().addAttributes("a", "data-*")
    or
  2. A new function, like Whitelist.relaxed().allowDataAttributes("a")
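
To make option 1 concrete, here is a minimal sketch in plain Java of how a trailing `*` could be read as a prefix wildcard. The names `WildcardAttributeAllowlist`, `addAttributes` and `isAllowed` are hypothetical and chosen only for illustration; this is not jsoup's Whitelist API and not the code of any PR, just one possible reading of the request.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of jsoup: models a per-tag attribute
// allowlist in which an entry ending in '*' acts as a prefix wildcard.
final class WildcardAttributeAllowlist {
    private final Map<String, Set<String>> exact = new HashMap<>();
    private final Map<String, Set<String>> prefixes = new HashMap<>();

    WildcardAttributeAllowlist addAttributes(String tag, String... attributes) {
        String tagKey = tag.toLowerCase(Locale.ROOT);
        for (String attribute : attributes) {
            String name = attribute.toLowerCase(Locale.ROOT);
            if (name.endsWith("*")) {
                // "data-*" is stored as the prefix "data-"
                String prefix = name.substring(0, name.length() - 1);
                if (prefix.isEmpty()) {
                    // A bare "*" would allow every attribute, so reject it
                    throw new IllegalArgumentException("empty wildcard prefix");
                }
                prefixes.computeIfAbsent(tagKey, k -> new HashSet<>()).add(prefix);
            } else {
                exact.computeIfAbsent(tagKey, k -> new HashSet<>()).add(name);
            }
        }
        return this;
    }

    boolean isAllowed(String tag, String attribute) {
        String tagKey = tag.toLowerCase(Locale.ROOT);
        String name = attribute.toLowerCase(Locale.ROOT);
        if (exact.getOrDefault(tagKey, Collections.emptySet()).contains(name)) {
            return true;
        }
        for (String prefix : prefixes.getOrDefault(tagKey, Collections.emptySet())) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, `new WildcardAttributeAllowlist().addAttributes("a", "href", "data-*")` would accept `data-foo-bar` on `a` while still rejecting `onclick`, and would not extend the wildcard to other tags.
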
@foo4u
Author

foo4u commented Dec 2, 2014

Also wanted to add that I'm willing to contribute code to support this. Before doing so, I just want to make sure this change is acceptable and determine the best way to support it (options 1 or 2 above, or something entirely different). Thanks!

@jhy
Owner

jhy commented Apr 3, 2015

Hi @foo4u. Sorry for the late reply. I like option 2, because I can't think of another case where wildcards would be helpful. It would be great if you wrote that.

@remisbaima

I guess it would be more flexible if you implemented option 1 with regex patterns instead of only wildcards. E.g.: Whitelist.relaxed().addAttributes("a", "data-.*")
Think of code such as https://angularjs.org, which uses attributes starting with "ng-". Also, almost every second ;-) a new JS framework appears, and these might require new attribute prefixes. Option 1 with regex support would be more future-proof.

@jhy
Owner

jhy commented Apr 3, 2015

OK, handling examples like that makes sense. I'd be OK with either a prefix or a regex matcher. The prefix match seems simple and unlikely to let anyone shoot themselves in the foot.
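
The trade-off between the two matchers can be sketched in plain Java. This is an illustrative comparison only: `AttributeMatchers` and its methods are hypothetical names, not jsoup code. It shows one way a regex matcher can surprise its user (an unanchored `find()`), which a plain prefix check avoids.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of matching strategies, NOT jsoup code.
final class AttributeMatchers {
    // Prefix match: the attribute name must start with the given prefix.
    static boolean prefixMatch(String prefix, String name) {
        return name.startsWith(prefix);
    }

    // Regex match anchored to the whole name (Matcher.matches()).
    static boolean regexFullMatch(String regex, String name) {
        return Pattern.compile(regex).matcher(name).matches();
    }

    // Regex match anywhere in the name (Matcher.find()); easy to misuse,
    // because the pattern may hit in the middle of an attribute name.
    static boolean regexFind(String regex, String name) {
        return Pattern.compile(regex).matcher(name).find();
    }
}
```

Here `prefixMatch("data-", "x-data-id")` and `regexFullMatch("data-.*", "x-data-id")` both reject the name, while `regexFind("data-.*", "x-data-id")` accepts it, since the pattern is found partway through the string.
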

@foo4u
Author

foo4u commented Apr 20, 2015

Ok, will try to get a PR for prefix matching sent in a few weeks.

@jhy
Owner

jhy commented Nov 15, 2017

(Closing out old, dormant bugs. If you are still impacted by this, please reopen & vote.)

@jhy jhy closed this as completed Nov 15, 2017
@swapab

swapab commented Feb 11, 2020

I am trying out jsoup to validate HTML pages. It works great so far.
It would be awesome if wildcards were possible with jsoup.

@promiselaoliu

promiselaoliu commented Dec 23, 2022

"Ok, will try to get a PR for prefix matching sent in a few weeks."

Hi @foo4u, did you ever prepare a PR?

@promiselaoliu

I have prepared a PR: #1871

@jhy
Owner

jhy commented Jan 24, 2023

(Reopening, as mentioned in the earlier close; there is renewed interest here.)

@jhy jhy reopened this Jan 24, 2023
@jhy jhy linked a pull request Jan 24, 2023 that will close this issue
@irandamay

Is there any update on this? It looks like the changes requested in the PR were made back in February.

I have been watching this for some months now because we need to keep aria-* attributes from being stripped out.
