
Permit XML definition of white space #39

Open
Arithmeticus opened this issue Mar 20, 2023 · 10 comments

Comments

@Arithmeticus

I am using XMLUnit 2.9.2 in a .NET Framework 4.7.2 application.

The method NormalizeWhitespace() is keyed to the Unicode property Zs, which covers more than a dozen characters. The XML specification, however, defines white space as only four characters (#x9, #xA, #xD and #x20), and that definition underlies the standard technologies in the XML stack (e.g., XPath).
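The gap can be made concrete. XML's `S` production admits only #x20, #x9, #xD and #xA. .NET's `String.Trim()` removes every character for which `char.IsWhiteSpace` returns true. That set includes the Unicode Zs space separators, such as U+00A0 (no-break space), as well as a few other characters. The sketch below lists the characters .NET treats as white space but XML does not. The class and method names are illustrative and not part of XMLUnit.

```csharp
using System.Linq;

static class XmlWhitespaceDemo
{
    // The XML 1.0 "S" production: S ::= (#x20 | #x9 | #xD | #xA)+
    public static bool IsXmlWhitespace(char c) =>
        c == '\u0020' || c == '\u0009' || c == '\u000D' || c == '\u000A';

    // Every UTF-16 code unit that char.IsWhiteSpace accepts (the test
    // String.Trim() uses) but that XML does not count as white space.
    public static char[] NonXmlWhitespace() =>
        Enumerable.Range(0, 0x10000)
            .Select(i => (char)i)
            .Where(c => char.IsWhiteSpace(c) && !IsXmlWhitespace(c))
            .ToArray();
}
```

For example, `"\u00A0abc\u00A0".Trim()` yields `"abc"`. A trim restricted to XML white space would leave both no-break spaces in place.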

I can see the utility of the current XMLUnit approach to whitespace, but I’m in a position where I need to use the more restrictive official XML definition. Could XMLUnit 2 and 3 be enhanced to let users choose between the two definitions of whitespace?

@bodewig
Member

bodewig commented Mar 22, 2023

If you are using .NET Framework, then the XMLUnit.NET tracker might have been the better place. :-) The issue itself most probably applies to the Java version as well, so all is fine. I just wanted to make sure you know there is a dedicated tracker for the .NET version.

I assume the Trim in NormalizeWhitespace is what is causing problems for your use-case. Of course it would be possible to add NormalizeXMLWhitespace and StripXmlWhiteSpace methods to Nodes that would "trim" more selectively. We could also add corresponding (I)Source implementations and even add new flags to InputBuilder. I wouldn't be opposed to that.

In the meantime, you can write such a NormalizeXMLWhitespace in your own code and apply it to your DOM-based ISource outside of XMLUnit if you want to test the concept first.
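Such a helper might look like the minimal sketch below, which assumes an `XmlDocument`-based workflow. The names `XmlWhitespaceNormalizer`, `NormalizeXmlWhitespace` and `NormalizeTextNodes` are hypothetical and not XMLUnit API.

The helper does three things:
- trims only the four XML whitespace characters from text nodes;
- collapses interior runs of them to a single space;
- removes text nodes that end up empty.

Characters such as U+00A0 are left untouched.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of XMLUnit: normalizes text nodes using only
// the XML whitespace characters #x20, #x9, #xD and #xA.
static class XmlWhitespaceNormalizer
{
    static bool IsXmlWhitespace(char c) =>
        c == ' ' || c == '\t' || c == '\r' || c == '\n';

    // Trims XML whitespace at both ends and collapses each inner run to one space.
    public static string NormalizeXmlWhitespace(string s)
    {
        var sb = new StringBuilder(s.Length);
        bool pendingSpace = false;
        foreach (char c in s)
        {
            if (IsXmlWhitespace(c))
            {
                // Remember the gap only after some content (drops leading runs).
                pendingSpace = sb.Length > 0;
            }
            else
            {
                if (pendingSpace) sb.Append(' ');
                pendingSpace = false;
                sb.Append(c);
            }
        }
        // A trailing run is never flushed, so it is trimmed.
        return sb.ToString();
    }

    // Recursively normalizes text-like children in place and removes the
    // ones that become empty.
    public static void NormalizeTextNodes(XmlNode parent)
    {
        // Snapshot the children because some may be removed while iterating.
        var children = new List<XmlNode>();
        foreach (XmlNode child in parent.ChildNodes) children.Add(child);

        foreach (XmlNode child in children)
        {
            switch (child.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    string normalized = NormalizeXmlWhitespace(child.Value);
                    if (normalized.Length == 0) parent.RemoveChild(child);
                    else child.Value = normalized;
                    break;
                default:
                    NormalizeTextNodes(child);
                    break;
            }
        }
    }
}
```

The normalized document can then be handed to XMLUnit as a DOM-based input, as suggested above.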

@bodewig
Member

bodewig commented Mar 29, 2023

@Arithmeticus I believe 6e334a4 provides what you are asking for. It would be good if you could give it a try.

@bodewig
Member

bodewig commented Mar 29, 2023

Actually, Java's trim works quite differently from the C# one: it only removes control characters and the space character, so this issue only applies to XMLUnit.NET. That's why I'm going to transfer it.
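The difference can be illustrated in C# by emulating Java's `String.trim()`. Its Javadoc defines the characters it strips as those with code points up to U+0020. XML 1.0 forbids the C0 control characters other than #x9, #xA and #xD in document content. In practice, then, that rule only ever removes XML whitespace from parsed text, whereas .NET's `String.Trim()` also removes characters such as U+00A0. The class name below is illustrative.

```csharp
// Illustrative emulation of java.lang.String.trim(): removes every leading and
// trailing char whose code point is <= U+0020 (the C0 controls plus space).
static class TrimComparison
{
    public static string JavaStyleTrim(string s)
    {
        int start = 0, end = s.Length;
        while (start < end && s[start] <= '\u0020') start++;
        while (end > start && s[end - 1] <= '\u0020') end--;
        return s.Substring(start, end - start);
    }
}
```

`TrimComparison.JavaStyleTrim("\u00A0x\u00A0")` leaves the no-break spaces in place, while `"\u00A0x\u00A0".Trim()` returns `"x"`.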

@bodewig bodewig transferred this issue from xmlunit/xmlunit Mar 29, 2023
@bodewig bodewig added this to the 2.10.0 milestone Mar 29, 2023
@bodewig
Member

bodewig commented Mar 29, 2023

this is now #39

@bodewig bodewig closed this as completed Mar 29, 2023
@bodewig
Member

bodewig commented Mar 29, 2023

Didn't mean to close this one; I thought it was the one over in the Java project, sorry.

@bodewig bodewig reopened this Mar 29, 2023
@Arithmeticus
Author

Thanks. I had posted in the Java project because it looked like that's where the party was. I'll give your suggestions a try.

@Arithmeticus
Author

A preliminary note before more substantive comments: I managed to get the new XmlWhitespaceStrippedSource class to work, but it was challenging because of gaps in the documentation. Note this dead link: http://www.xmlunit.org/api/net/master/Org.XmlUnit.Builder/Input.html (linked from https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit#inputbuilder).

I see https://www.xmlunit.org/api/net/2.9.2/Org.XmlUnit.Input/index.html but the documentation there seems incomplete.

In the end I got it to work, though it took more time than I would have liked. For anyone else trying to get their code to work, here's my approach:

```csharp
using Org.XmlUnit.Builder;
using Org.XmlUnit.Diff;
using Org.XmlUnit.Input;

string pathA = "D:\\bugs\\28724\\space_test_a.xml"; // test document
string pathB = "D:\\bugs\\28724\\space_test_b.xml"; // benchmark (control) document
// Wrap each source so text nodes are trimmed of XML whitespace (#x9, #xA, #xD, #x20) only
ISource benchmark = new XmlWhitespaceStrippedSource(Input.FromFile(pathB).Build());
ISource test = new XmlWhitespaceStrippedSource(Input.FromFile(pathA).Build());
// DiffBuilder.Compare takes the control (expected) document; WithTest takes the actual one
Diff diff = DiffBuilder.Compare(benchmark).WithTest(test).Build();
```

The new class works as expected, even though it turned out not to be what I was hoping for. See my next comment.

@Arithmeticus
Author

Before submitting the ticket I had avoided the modified flavors of -Source classes, primarily because the documentation was confusing and I wasn't certain what would get loaded. So I was looking at post-build methods such as NormalizeWhitespace(), which doesn't support the strict XML definition of whitespace. I think an XmlNormalizeWhitespace() counterpart method, analogous to the adjustment you've made here, would be helpful.

That said, in figuring out the new class, I learned to appreciate the input -Source variants. If you're happy with the approach taken with XmlWhitespaceStrippedSource, then WhitespaceNormalizedSource would also need an XmlWhitespaceNormalizedSource counterpart. (That's the option I really wanted.)

For documentation purposes, it would be good to tell users that what -StrippedSource does is a proper subset of what -NormalizedSource does. To some, "stripped" may sound harsher than "normalized" and suggest a more drastic operation.
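The subset relationship is easiest to see on a single text value. Stripping only trims the ends. Normalizing trims the ends and also collapses interior runs to one space. As a result, normalizing a stripped value gives the same result as normalizing the original.

The helper names below are illustrative and not XMLUnit API. They model only the string transformation restricted to XML whitespace; the DOM-level sources additionally have to deal with text nodes that become empty.

```csharp
using System.Text.RegularExpressions;

// Illustrative string-level versions of the two operations, restricted to the
// XML whitespace characters #x20, #x9, #xD and #xA.
static class StripVsNormalize
{
    static readonly char[] XmlWhitespace = { ' ', '\t', '\r', '\n' };

    // "Stripped": remove XML whitespace from both ends only.
    public static string Strip(string s) => s.Trim(XmlWhitespace);

    // "Normalized": strip, then collapse each inner run of XML whitespace to one space.
    public static string Normalize(string s) =>
        Regex.Replace(Strip(s), @"[ \t\r\n]+", " ");
}
```

For `"  a \t\n b  "`, `Strip` keeps the inner `" \t\n "` run, while `Normalize` reduces it to a single space.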

@bodewig
Member

bodewig commented Apr 10, 2023

Many thanks for the feedback, @Arithmeticus. I consider the user's guide the primary documentation, so maybe https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit would have given better guidance, or not.

If you feel the documentation is not adequate, please help us improve it. Sometimes it is difficult to know what is lacking from the docs when you are too familiar with the code.

I'll add XmlWhitespaceNormalizedSource and additional methods to the builder if this helps.

@bodewig
Member

bodewig commented Apr 10, 2023

OK, the builder has been adjusted and I've tried to clarify the docs a bit more. Also I've added more extensive content to the user guide.
