&nbsp; and other escape sequences are saved incorrectly when using XHTML mode #466

Closed
Mertsch opened this issue Feb 8, 2022 · 6 comments
@Mertsch

Mertsch commented Feb 8, 2022

1. Description

[TestMethod]
public void OptionOutputAsXmlBugTest()
{
    string html = @"Start|&nbsp;|&lt;|&gt;|&amp;|&euro;|&pound;|&quot;|&apos;|End";
    HtmlDocument htmlDocument = new HtmlDocument
    {
        OptionOutputAsXml = true,
    };
    htmlDocument.LoadHtml(html);
    StringWriter stringWriter = new StringWriter(new StringBuilder(html.Length + 1000), CultureInfo.InvariantCulture);
    htmlDocument.Save(stringWriter);
    Assert.AreEqual("<?xml version=\"1.0\" encoding=\"utf-8\"?>Start|&nbsp;|&lt;|&gt;|&amp;|&euro;|&pound;|&quot;|&apos;|End", stringWriter.ToString());
    //           Actual: <?xml version="1.0" encoding="utf-8"?>Start|&amp;nbsp;|&lt;|&gt;|&amp;|&amp;euro;|&amp;pound;|&quot;|&amp;apos;|End
}

As you can see, &nbsp; is saved as &amp;nbsp;. The same goes for some other HTML escape sequences (&euro;, &pound;, &apos;), but not all of them: &lt;, &gt;, &amp;, and &quot; come through unchanged 🤪

  • HAP version: 1.11.42
  • NET version: .NET 6.0.1
@JonathanMagnan JonathanMagnan self-assigned this Feb 8, 2022
@JonathanMagnan
Member

Hello @Mertsch ,

This is expected, since this is how an &nbsp; is escaped in XML: https://www.freeformatter.com/xml-escape.html

Some changes are indeed possible, as discussed in #456, but in purely XML terms that is the right behavior.
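
To make the XML side of this concrete: XML predefines only five entities (&lt;, &gt;, &amp;, &quot;, &apos;), so a bare &nbsp; is an undeclared entity unless a DTD declares it. The following is a minimal sketch using only System.Xml.Linq, not HtmlAgilityPack. The helper name ParsesAsXml is made up for illustration, and the snippet assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: true when the fragment parses as XML with no DTD involved.
static bool ParsesAsXml(string fragment)
{
    try
    {
        XElement.Parse(fragment);
        return true;
    }
    catch (XmlException)
    {
        // Raised for malformed XML, including references to undeclared entities.
        return false;
    }
}

// Named HTML entity that XML does not predefine.
Console.WriteLine(ParsesAsXml("<p>a&nbsp;b</p>"));
// Numeric character reference for the same character (U+00A0).
Console.WriteLine(ParsesAsXml("<p>a&#160;b</p>"));
// Escaped ampersand: well-formed, but the text content is the literal string "&nbsp;".
Console.WriteLine(ParsesAsXml("<p>a&amp;nbsp;b</p>"));
Console.WriteLine(XElement.Parse("<p>a&amp;nbsp;b</p>").Value);
```

Under these assumptions the first call reports False and the next two report True, and the last line prints a&nbsp;b. In other words, the escaped form is well-formed XML but round-trips as literal text rather than as a non-breaking space.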

Best Regards,

Jon


@Mertsch
Author

Mertsch commented Feb 9, 2022

Hello @JonathanMagnan Thank you very much for your explanation and time.

I understand now that HTML & characters need to be escaped for XML. But, as your link suggests, shouldn't the output be
Start|&amp;nbsp;|&amp;lt;|&amp;gt;|&amp;amp;|&amp;euro;|&amp;pound;|&amp;quot;|&amp;apos;|End
by &amp;ing every & in the text?!

The linked issue #456 I do not fully understand. It seems there is the "backwards compatible" flag which specifically keeps &nbsp, but if it's about XML escaping ... why only some &s?
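
The difference between the two outputs in question can be sketched with plain string handling. EscapeEveryAmpersand is the uniform escaping asked about here. EscapeUnknownAmpersands is a hypothetical rule, a guess that happens to reproduce the reported output and not code taken from HAP, which leaves an & alone when it already starts &amp;, &lt;, &gt;, or &quot;. Both names are made up for illustration, and the snippet assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// The input string from the test in the initial post.
const string input = "Start|&nbsp;|&lt;|&gt;|&amp;|&euro;|&pound;|&quot;|&apos;|End";

// Uniform escaping: every ampersand becomes &amp;, whatever follows it.
static string EscapeEveryAmpersand(string text) => text.Replace("&", "&amp;");

// Selective escaping (assumption, not HAP's code): the negative lookahead
// skips ampersands that already begin one of four recognized entities.
static string EscapeUnknownAmpersands(string text) =>
    Regex.Replace(text, "&(?!(?:amp|lt|gt|quot);)", "&amp;");

Console.WriteLine(EscapeEveryAmpersand(input));
Console.WriteLine(EscapeUnknownAmpersands(input));
```

The first line printed is the fully escaped string proposed in this comment. The second matches the "Actual" value from the initial post (without the XML declaration), which would explain why only some & characters are touched if a rule of this shape is in play.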

@JonathanMagnan
Member

Hello @Mertsch ,

My bad, I just saw the part about the &nbsp of your initial post.

That OptionOutputAsXml is currently very confusing. I will look at it more deeply.

Best Regards,

Jon

@ghost

ghost commented Dec 1, 2023

I do not have this problem if I set HtmlDocument.BackwardCompatibility = false.

@Mertsch
Author

Mertsch commented Dec 1, 2023

I have chosen to go with https://github.com/AngleSharp/AngleSharp and this issue is no longer relevant to me. If you want to close it, feel free to do so.

@JonathanMagnan
Member

Hello @Mertsch ,

In that case, we will close this issue. AngleSharp is a great library, so I certainly understand your choice.

Best Regards,

Jon
