
Namespace declarations (and namespaced attributes) are not serialized correctly #470

Open
BhaaLseN opened this issue Mar 10, 2022 · 0 comments

1. Description

I'm using HAP to load HTML files and transform them into valid XML, because I have existing APIs that take an XDocument and process it.
Some of these files contain XML namespace declarations. Often this is because a lazy XSLT didn't omit them from its output; sometimes it is because elements carry extra information in foreign namespaces.

I used 1.4.6 as a DLL for a long time and only recently realized that I could (and should) switch to the NuGet package, which was already at 1.11.42. After a quick series of tests, I noticed that some files failed, and I bisected the problem back to 1.6.3 as the first broken version; 1.6.2 still works as I expect.

Note: I care less about the exact format and more about the result being valid XML. The APIs I hand the XDocument to don't necessarily care about the namespaces or their attributes, since they look at other aspects (such as certain HTML or XML tags, or simply the text content of elements that don't match a blacklist).
Not being able to obtain valid XML drops the success rate from 100% to 0%, while a mangled element or attribute name (because it is in a namespace) only brings it down from 100% to about 98%, which is still "good enough" for what I'm doing there.

HAP fit the bill with very little code, so I dropped it right in.
The code sample might not be optimal, but it is what I ended up with because it worked. If I simply need to toggle a few switches, I'd be fine with that too.

2. Exception

System.ArgumentException: Invalid name character in 'xmlns:test'. The ':' character, hexadecimal value 0x3A, cannot be included in a name.
   at System.Xml.XmlWellFormedWriter.CheckNCName(String ncname)
   at System.Xml.XmlWellFormedWriter.WriteStartAttribute(String prefix, String localName, String namespaceName)
   at HtmlAgilityPack.HtmlNode.WriteAttributes(XmlWriter writer, HtmlNode node)
   at HtmlAgilityPack.HtmlNode.WriteTo(XmlWriter writer)
   at HtmlAgilityPack.HtmlNode.WriteTo(XmlWriter writer)
   at HtmlAgilityPack.HtmlDocument.Save(XmlWriter writer)
   at Program.Main()
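For reference, the exception comes from `XmlWriter` itself: `WriteStartAttribute(prefix, localName, namespaceName)` validates the local name as an NCName, which may not contain ':'. The message quoting the full string 'xmlns:test' suggests that the qualified name reaches the writer as a single name instead of being split into prefix and local name. The sketch below uses only `System.Xml` (no HAP) to show the split that `XmlWriter` accepts for a namespace declaration and for a prefixed attribute. It illustrates the writer's contract under that assumption and is not HAP's actual implementation.

```csharp
using System;
using System.Text;
using System.Xml;

// Write <html xmlns:test="urn:test:ns"><p test:this="namespace" /></html>
// through XmlWriter, splitting every qualified name into prefix + local name.
var sb = new StringBuilder();
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
using (var writer = XmlWriter.Create(sb, settings))
{
    writer.WriteStartElement("html");

    // Namespace declaration: prefix "xmlns", local name "test".
    // The namespace argument may be null for the reserved "xmlns" prefix.
    writer.WriteAttributeString("xmlns", "test", null, "urn:test:ns");

    writer.WriteStartElement("p");
    // Prefixed attribute: prefix "test", local name "this", and the URI bound to "test".
    writer.WriteAttributeString("test", "this", "urn:test:ns", "namespace");
    writer.WriteEndElement();

    writer.WriteEndElement();
}
string xml = sb.ToString();
Console.WriteLine(xml);

// Passing the whole qualified name as the local name reproduces the reported error.
bool threw = false;
var bad = XmlWriter.Create(new StringBuilder());
try
{
    bad.WriteStartElement("html");
    bad.WriteStartAttribute("xmlns:test"); // ':' is not allowed in a local name
}
catch (ArgumentException)
{
    threw = true;
}
Console.WriteLine(threw);
```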

3. Fiddle or Project

https://dotnetfiddle.net/Nd8vqF

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(@"<html xmlns:test=""urn:test:ns"">
    <body>
        <p test:this=""namespace"">don't mind this<br>line break</p>
    </body>
</html>");

htmlDoc.OptionOutputAsXml = true;

using var ms = new MemoryStream();
using (var writer = XmlWriter.Create(ms, new XmlWriterSettings { OmitXmlDeclaration = true, ConformanceLevel = ConformanceLevel.Fragment }))
{
    htmlDoc.Save(writer); // this throws
}

// read the stream back only after the writer has been disposed (and flushed)
ms.Position = 0;
var doc = XDocument.Load(ms);
// feed doc into something that expects an XDocument as input:
Console.WriteLine(doc.ToString());

4. Any further technical details

Add any relevant details that can help us, such as:

  • HAP version: every version since 1.6.3 fails; it worked up to and including 1.6.2
  • .NET version: net48, and also net6.0

I think the change in #95 might be related to this.
I also found the related #168, where I strongly believe the code sample is wrong: xmlns:MyNamespace="value" should be xmlns:value="MyNamespace". Otherwise the result is an unused prefix MyNamespace (bound to the URI value) along with the undeclared prefix value, and that sample might have introduced further issues down the road.
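A small sketch of the point about #168, using only `System.Xml.Linq`: a declaration `xmlns:P="U"` binds the prefix `P` to the namespace URI `U`, so `xmlns:MyNamespace="value"` declares the prefix `MyNamespace`, and a name like `value:x` stays undeclared. The element and attribute names (`root`, `a`, `x`) are made up for the demonstration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// xmlns:MyNamespace="value" binds the prefix "MyNamespace" to the URI "value".
var declared = XElement.Parse(@"<root xmlns:MyNamespace=""value"" />");
var uri = declared.GetNamespaceOfPrefix("MyNamespace")?.NamespaceName;
Console.WriteLine(uri); // value

// Using "value" as a prefix therefore fails: that prefix was never declared.
bool undeclaredPrefix = false;
try
{
    XElement.Parse(@"<root xmlns:MyNamespace=""value""><a value:x=""1"" /></root>");
}
catch (XmlException)
{
    undeclaredPrefix = true;
}
Console.WriteLine(undeclaredPrefix); // True

// The swapped form declares "value" as the prefix, so value:x resolves.
var swapped = XElement.Parse(@"<root xmlns:value=""MyNamespace""><a value:x=""1"" /></root>");
XNamespace ns = "MyNamespace";
var x = swapped.Element("a")?.Attribute(ns + "x")?.Value;
Console.WriteLine(x); // 1
```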

JonathanMagnan self-assigned this Mar 11, 2022