
&nbsp; is not removed from the InnerText #473

Open
FarshanAhamed opened this issue Apr 21, 2022 · 3 comments

Comments

@FarshanAhamed

1. Description

Here I'm trying to strip HTML tags and attributes from a text. Most of the tags are removed, but &nbsp; stays in the text.

3. Fiddle or Project

https://dotnetfiddle.net/haBumr

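```csharp
using HtmlAgilityPack;

// Static class wrapper (name arbitrary) so the extension method compiles.
public static class StringExtensions
{
    public static string StripHtmlTags(this string input)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(input ?? "");
        return doc.DocumentNode.InnerText; // entities such as &nbsp; are not decoded
    }
}
```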

Input text:
<p>This is a test string.&nbsp;</p>
Output text:
This is a test string.&nbsp;

Is there any way I can get the text as I see it in a browser?

  • HAP version: 1.11.42
  • .NET version (.NET Core 2.2, .NET Core 3.1, etc.)
@elgonzo
Contributor

elgonzo commented Apr 21, 2022

See the "Decode and strip HTML" example over here: https://html-agility-pack.net/online-examples

However, contrary to that example code, I would strongly suggest doing the entity decoding after getting the inner text, not before loading the HTML data into HtmlAgilityPack.
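A minimal sketch of that suggestion, applied to the `StripHtmlTags` extension from the issue: load the HTML as before, take `InnerText`, and only then decode the entities. It assumes the HAP 1.11.42 behaviour reported here (`InnerText` returns entities such as `&nbsp;` undecoded) and uses `System.Net.WebUtility.HtmlDecode` from the .NET base library; HAP's own `HtmlEntity.DeEntitize` is an alternative. The class name `HtmlTextExtensions` is hypothetical.

```csharp
using System.Net;          // WebUtility.HtmlDecode
using HtmlAgilityPack;     // HtmlDocument (NuGet package HtmlAgilityPack)

// Hypothetical helper class; only the decode step differs from the issue's code.
public static class HtmlTextExtensions
{
    public static string StripHtmlTags(this string input)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(input ?? "");

        // 1. Let HAP parse the markup and collect the text content.
        string text = doc.DocumentNode.InnerText;

        // 2. Decode entities afterwards, so escaped text such as "&lt;b&gt;"
        //    becomes a literal "<b>" instead of being parsed as a tag.
        return WebUtility.HtmlDecode(text);
    }
}
```

With the issue's input, this should return `"This is a test string.\u00A0"`: `&nbsp;` decodes to the non-breaking space U+00A0, which renders like a space in a browser but is not `' '`, so a `.Replace('\u00A0', ' ')` may be wanted. Decoding before `LoadHtml` instead would turn `&lt;b&gt;x&lt;/b&gt;` into real `<b>` markup, and `InnerText` would then return only `x`.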

@FarshanAhamed
Author

Great. I had figured out using HTML decoding earlier. But I thought there might be a way for InnerText to decode the HTML entities if I provided some flag while loading the HTML. Thank you for your help.

@snowchenlei

How do I decode the text when loading with LoadFromWebAsync?
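The same decode-after-`InnerText` step should apply however the document was loaded, since `HtmlWeb.LoadFromWebAsync` also returns an `HtmlDocument`. A minimal sketch, not a maintainer answer; the names `HtmlTextExtractor`, `GetDecodedText` and `GetDecodedTextFromWebAsync` are hypothetical, and the decoding again relies on `System.Net.WebUtility.HtmlDecode`.

```csharp
using System.Net;               // WebUtility.HtmlDecode
using System.Threading.Tasks;   // Task<T>
using HtmlAgilityPack;          // HtmlDocument, HtmlWeb

public static class HtmlTextExtractor
{
    // Decode after extracting InnerText, whatever the document's source was.
    public static string GetDecodedText(HtmlDocument doc) =>
        WebUtility.HtmlDecode(doc.DocumentNode.InnerText);

    // Download and parse the page, then reuse the same decode step.
    public static async Task<string> GetDecodedTextFromWebAsync(string url)
    {
        var web = new HtmlWeb();
        HtmlDocument doc = await web.LoadFromWebAsync(url);
        return GetDecodedText(doc);
    }
}
```

Keeping the decode step in `GetDecodedText` means documents from `LoadHtml`, `Load` and `LoadFromWebAsync` all go through the same code path.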

Labels
None yet
Development

No branches or pull requests

3 participants