Handling of XML null/nil values based on namespace prefix existence #1406

Open
machinateur opened this issue Apr 7, 2022 · 0 comments

Q                  A
Bug report?        no
Feature request?   yes
BC Break report?   no
RFC?               yes/no

I've come across a problem regarding the handling of null/nil values. This is related to XML de-/serialization functionality.

With the current implementation, an XML node is generally considered null when it is absent or when it carries the xsi:nil="true" attribute.

Let me first describe the situation I encountered and the problem(s) it raises, and then possible solutions.

Steps required to reproduce the problem

Take an xml document similar to the following one:

<?xml version="1.0" encoding="utf-8"?>
<calendar>
  <entry>
    <date>2022-04-07</date>
    <time>22:00:00</time>
  </entry>
  <entry>
    <date>2022-04-01</date>
  </entry>
  <entry>
    <date>2022-04-07</date>
    <!-- empty element, no xsi:nil="true" attribute -->
    <time/>
  </entry>
</calendar>

I'll omit the serializer configuration and PHP objects, as this is only an example. The important thing is that both the date and time properties of a calendar entry are to be deserialized as DateTimeImmutable or DateTime. The formats are Y-m-d and H:i:s respectively. The time property is nullable, whereas the date property is not.

The result and what it means

Before investigating, I would have expected this XML to produce a valid object structure in which the time of the last calendar entry is simply not set. Instead, this document, or anything like it, results in an exception: Invalid datetime "", expected the format "H:i:s", [...].

I found out that the element lacks the xsi:nil="true" attribute required for it to be considered null.

Again, it is generally assumed that a node is null only when it is absent or when said attribute is set. That is because XML has no real null as we have it in PHP.

Fine, some would say, just add that attribute. That would be an easy solution indeed. Sadly, I have no direct control over the XML structure or its serialization, i.e. I can't set the attribute. OK, I'm not picky, so a custom handler will do! And indeed it did the job in the end.

But this got me thinking: why would the serializer look for the xsi:nil attribute when no xsi prefix is bound to any namespace? An XML document is invalid (not namespace-well-formed) if xsi:* is used without the prefix being declared. This can be verified with the w3schools XML validator using the following XML:
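
To illustrate where the failure comes from, here is a minimal sketch in plain PHP. It deliberately does not use the serializer or its DateHandler; it only assumes that an empty element ends up as an empty string that is then parsed with the H:i:s format. The variable names are mine.

```php
<?php
// Illustration only: plain SimpleXML + DateTimeImmutable, not the serializer's code.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<calendar>
  <entry>
    <date>2022-04-07</date>
    <time>22:00:00</time>
  </entry>
  <entry>
    <date>2022-04-07</date>
    <time/>
  </entry>
</calendar>
XML;

$calendar = simplexml_load_string($xml);

$results = [];
foreach ($calendar->entry as $entry) {
    // An empty element such as <time/> casts to the empty string.
    $raw = (string) $entry->time;
    // createFromFormat() returns false when the input does not match the format.
    $parsed = DateTimeImmutable::createFromFormat('H:i:s', $raw);
    $results[] = $parsed === false ? 'invalid' : $parsed->format('H:i:s');
}
// $results is ['22:00:00', 'invalid']
```

The empty string cannot satisfy H:i:s, so any handler that parses first and checks for null only via xsi:nil has to fail on the second entry.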
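
For reference, the core of such a workaround can be sketched as follows. This is only the parsing logic a custom handler could delegate to, written in plain PHP; the function name parseNullableDateTime and its signature are hypothetical and not part of the serializer's API.

```php
<?php
/**
 * Hypothetical helper: treats an empty (or whitespace-only) element as null
 * instead of trying to parse it. Not part of any library.
 */
function parseNullableDateTime(SimpleXMLElement $node, string $format): ?DateTimeImmutable
{
    $raw = trim((string) $node);
    if ($raw === '') {
        return null; // empty element is treated as "no value"
    }

    // The leading "!" resets all fields that the format does not mention.
    $parsed = DateTimeImmutable::createFromFormat('!' . $format, $raw);
    if ($parsed === false) {
        throw new InvalidArgumentException(
            sprintf('Invalid datetime "%s", expected the format "%s".', $raw, $format)
        );
    }

    return $parsed;
}

$entry = simplexml_load_string('<entry><date>2022-04-07</date><time/></entry>');
$date = parseNullableDateTime($entry->date, 'Y-m-d'); // DateTimeImmutable
$time = parseNullableDateTime($entry->time, 'H:i:s'); // null
```

Non-empty but malformed values still raise an exception, so only the empty-element case changes.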

<?xml version="1.0" encoding="utf-8"?>
<calendar>
  <entry>
    <date>2022-04-07</date>
    <time>22:00:00</time>
  </entry>
  <entry>
    <date>2022-04-01</date>
  </entry>
  <entry>
    <date>2022-04-07</date>
    <time xsi:nil="true"/>
  </entry>
</calendar>

Validator output: Namespace prefix xsi for nil on time is not defined

Again, this is hypothetical.

It's obviously required to declare that namespace prefix somewhere in the document. But I would argue that having no namespaces defined at all is not an uncommon use case. So the (de facto) requirement to declare that namespace and use the attribute, just so an empty node can be considered null, is rather unintuitive. At the very least, the xsi:nil check could be skipped when the document has no xsi prefix anyway.
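
The same complaint can probably be reproduced locally with PHP's libxml bindings instead of an online validator. The sketch below is an assumption about how one might check it, not something the serializer does; the exact error text depends on the installed libxml version.

```php
<?php
// The xsi prefix is used here but never declared.
$xml = '<calendar><entry><date>2022-04-07</date><time xsi:nil="true"/></entry></calendar>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
libxml_clear_errors();

$doc = simplexml_load_string($xml);

$messages = array_map(
    static fn (LibXMLError $error): string => trim($error->message),
    libxml_get_errors()
);

libxml_clear_errors();
libxml_use_internal_errors(false);

// $messages should contain a namespace error mentioning the xsi prefix.
foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```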

What's your opinion on this?

I've seen how this is done for serialization. There, the namespace is only added if a null element has been visited.

  • if ($this->nullWasVisited) {
        $this->document->documentElement->setAttributeNS(
            'http://www.w3.org/2000/xmlns/',
            'xmlns:xsi',
            'http://www.w3.org/2001/XMLSchema-instance'
        );
    }

I could think of something similar for null handling during deserialization: depending on whether the namespace prefix is present in the document, either check xsi:nil or apply some other logic. There are several ways to let the developer influence this behavior.

To be completely clear: I'm not proposing to change the behavior and replace it with something that would violate the spec. I'm simply saying there are use cases where the current handling can cause issues (as it did for me), and in my opinion such use cases could benefit from this functionality.

This is especially the case for the datetime types, as seen in the example above. An exception will certainly be thrown for empty elements of that type; that is in the nature of the DateHandler.

Certainly, the owner of the XML could fix their serializer or whatever they use to produce the documents, but in a real-world scenario I find this unlikely, especially when such a change could have other implications, for example for other interfacing software that is already accustomed to the shortcomings of such a system.
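
A rough sketch of what I have in mind, in plain PHP on SimpleXMLElement nodes. The function names documentDeclaresXsi and isNodeNull are hypothetical, and the fallback rule (no child elements and no text means null) is just one possible choice that would need discussion.

```php
<?php
const XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance';

/** Hypothetical: does any element in the document declare the XSI namespace? */
function documentDeclaresXsi(SimpleXMLElement $root): bool
{
    $namespaces = $root->getDocNamespaces(true) ?: [];

    return in_array(XSI_NS, $namespaces, true);
}

/** Hypothetical: decide null-ness depending on whether XSI is declared at all. */
function isNodeNull(SimpleXMLElement $node, bool $xsiDeclared): bool
{
    if ($xsiDeclared) {
        // Spec-conforming path: only xsi:nil marks an element as null.
        $nil = (string) ($node->attributes(XSI_NS)['nil'] ?? '');

        return $nil === 'true' || $nil === '1';
    }

    // Assumed fallback: an element without children and without text is null.
    return $node->count() === 0 && trim((string) $node) === '';
}

$plain = simplexml_load_string('<calendar><entry><time/></entry></calendar>');
$withXsi = simplexml_load_string(
    '<calendar xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    . '<entry><time xsi:nil="true"/></entry><entry><time/></entry></calendar>'
);

$plainIsNull = isNodeNull($plain->entry[0]->time, documentDeclaresXsi($plain));        // true
$nilIsNull = isNodeNull($withXsi->entry[0]->time, documentDeclaresXsi($withXsi));      // true
$emptyIsNull = isNodeNull($withXsi->entry[1]->time, documentDeclaresXsi($withXsi));    // false
```

Documents that declare the XSI namespace keep the current behavior; only documents without it would take the fallback path.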

What would be affected

  • $xsiAttributes = $value->attributes('http://www.w3.org/2001/XMLSchema-instance');
    if (
        isset($xsiAttributes['nil'])
        && ('true' === (string) $xsiAttributes['nil'] || '1' === (string) $xsiAttributes['nil'])
    ) {
        return true;
    }

  • private function isDataXmlNull($data): bool
    {
        $attributes = $data->attributes('xsi', true);
        return isset($attributes['nil'][0]) && 'true' === (string) $attributes['nil'][0];
    }
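
As a side note, the two checks above are not equivalent: the first looks the attribute up by namespace URI and accepts both "true" and "1", while the second looks it up by the literal xsi prefix and accepts only "true". The sketch below wraps both in standalone functions to show the difference; the names isNilByUri and isNilByPrefix are mine, and I'm assuming SimpleXMLElement nodes, as the attributes() calls suggest.

```php
<?php
const XSI_URI = 'http://www.w3.org/2001/XMLSchema-instance';

// Mirrors the first snippet: lookup by namespace URI.
function isNilByUri(SimpleXMLElement $value): bool
{
    $xsiAttributes = $value->attributes(XSI_URI);

    return isset($xsiAttributes['nil'])
        && ('true' === (string) $xsiAttributes['nil'] || '1' === (string) $xsiAttributes['nil']);
}

// Mirrors the second snippet: lookup by the literal prefix "xsi".
function isNilByPrefix(SimpleXMLElement $data): bool
{
    $attributes = $data->attributes('xsi', true);

    return isset($attributes['nil'][0]) && 'true' === (string) $attributes['nil'][0];
}

// The XSI namespace is bound to the prefix "i" instead of "xsi".
$doc = simplexml_load_string(
    '<calendar xmlns:i="http://www.w3.org/2001/XMLSchema-instance">'
    . '<time i:nil="true"/></calendar>'
);

$byUri = isNilByUri($doc->time);       // true
$byPrefix = isNilByPrefix($doc->time); // false
```

Whatever gets decided for the namespace-less case, it might be worth aligning these two code paths at the same time.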

Some reading

I've gone through these references prior to deciding to open a new issue here.

I'm willing to open a PR for this if the feature is considered beneficial, although some implementation details would have to be discussed beforehand.

Best Regards
Marcel
