Force JSoup to ignore custom HTML tags #2101
Comments
The reason for this behavior is that your HTML is invalid. Every tag must have an end tag, aside from a few exceptions. Also, custom element names must contain a hyphen. I am not aware of any setting that ignores custom tags, but there are two other options:
Output:
Output:
Thank you for the response, and I apologize for my late one. I was aware that JSoup was identifying it as a tag and trying to treat it as such. Thankfully, I was able to work around it in my program and circumvent the entire thing. Unfortunately, that ended up massively overcomplicating my code.
The problem with this API, in my opinion, is that there is no way to turn off the autocorrection in the parse function. I'm not requesting that the API ignore custom tags entirely, but there should be a way to have JSoup parse the strings without calling whatever function is inserting text into my Strings without my permission. It's made worse by the fact that I have no control whatsoever. Even a callback when it edits the string would be nice, mostly so I could override it and have it not touch the string.
If this project is ever updated, I suggest the feature work something like this: If this is set to false, then the code that inserts the end tag automatically will simply not be called, and the text will not be parsed by the system. Ideally, it would also include an optional callback that catches the "error" and feeds it into the function.
If I could be directed to the code in this API that handles this autocorrecting functionality, perhaps I could look into adding the support to help out, or at least make the change locally. Thank you again for your time!
I'm working with some code that parses HTML. This API has worked great so far for digging out the data I need and reading it easily, but I've run into an issue where JSoup inserts HTML into the text where it's not wanted or needed. This is supposedly a feature, but unfortunately it's completely ruining my implementation.
Here is the text:
<u>Oh, hello! You must be the person I've been waiting on all morning. </u><strong><u>You wouldn't happen to be <player> would you?</u></strong>
It's pretty normal outside of the custom tag that's been added. The program is meant to parse that itself and change it into a name. I had presumed that if JSoup did not recognize a tag, it would leave it alone. But instead, it's mangling the text into this:
It seemingly even adds a line break for some reason, and then also randomly adds another player tag onto the end, which confuses the system even more.
Is there a way to toggle this functionality off entirely, and have JSoup stick to tags it specifically recognizes? I'd like to solve this, since if I can't keep our custom tags formatted like HTML tags, then I'll have to write a whole other system to parse a different format with something like [[]], which would be a bit redundant.
Also, just to clarify, all I want is for JSoup to ignore my custom tag entirely. I essentially want it to stay in its own lane and only parse pure HTML, and then ignore anything that isn't.
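One way to get close to this without any parser setting might be to escape the tags the program owns before the string ever reaches JSoup, so that the parser sees them as plain text rather than as unclosed elements. The sketch below is a hypothetical helper and not part of JSoup: the class name `CustomTagEscaper`, the method `escapeUnknownTags`, and the allow-list of known tag names are all assumptions made for illustration. The regular expression only handles simple tags; it does not cope with comments or with attribute values that contain `<` or `>`.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomTagEscaper {
    // Matches a simple opening or closing tag: group 1 is the optional "/",
    // group 2 the tag name, group 3 anything else before the closing ">".
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9-]*)([^<>]*)>");

    // Hypothetical helper: leaves tags named in knownTags untouched and
    // rewrites every other tag so its angle brackets become &lt; and &gt;.
    static String escapeUnknownTags(String html, Set<String> knownTags) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String replacement = knownTags.contains(name)
                    ? m.group()
                    : "&lt;" + m.group(1) + m.group(2) + m.group(3) + "&gt;";
            // quoteReplacement keeps "$" and "\" in the tag from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "<u>Oh, hello! You must be the person I've been waiting on all morning. </u>"
                + "<strong><u>You wouldn't happen to be <player> would you?</u></strong>";
        // Only <u> and <strong> are treated as real HTML here (an assumed allow-list).
        System.out.println(escapeUnknownTags(text, Set.of("u", "strong")));
    }
}
```

The escaped string can then be handed to the parser as usual. Because `&lt;player&gt;` is an ordinary character reference, the custom tag should come back as literal text when the element's text is read, at which point the program can substitute the name itself.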
Thank you for your time.