This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

No CDATA block in content block of atom feed #82

Open
av3 opened this issue Jun 22, 2018 · 8 comments

Comments

@av3

av3 commented Jun 22, 2018

Hello,

I wanted to provide feeds (via the Writer of Zend Feed) with the full content of an article (including some HTML5 markup) and decided to prefer Atom over RSS. But the writer behaves differently for Atom, which causes some trouble for me.

My code:

    $entry = $feed->createEntry();
    $entry->setContent($news->getText());

Output for RSS:

    <item>
      <content:encoded><![CDATA[<p>My content ...</p>]]></content:encoded>

Output for Atom:

  <entry xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
      <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml">
        <xhtml:p>My content ...</xhtml:p>
      </xhtml:div>
    </content>
  </entry>

And if I add an image in HTML5 style (<img src="myimage.jpg">) instead of XHTML style (<img src="myimage.jpg" />), I get a warning:

DOMDocument::loadXML(): Opening and ending tag mismatch: img line 1 and p in Entity, line: 1
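The warning names DOMDocument::loadXML(), i.e. the content is being parsed as XML, and an HTML5 void tag without "/>" is not well-formed XML. A minimal sketch in plain PHP DOM that reproduces the difference; the helper `isWellFormedXml` and the wrapping `<div>` are hypothetical illustrations, not zend-feed code:

```php
<?php
// Hypothetical helper (not part of zend-feed): reports whether a markup
// fragment parses as XML, which type="xhtml" content requires.
function isWellFormedXml(string $fragment): bool
{
    $dom = new DOMDocument();
    // Collect libxml errors internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    // Wrap in a single root element so fragments with several elements load
    $ok = $dom->loadXML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(isWellFormedXml('<p><img src="myimage.jpg"></p>'));   // bool(false)
var_dump(isWellFormedXml('<p><img src="myimage.jpg" /></p>')); // bool(true)
```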

In the atom example in the documentation there is the output:

        <content type="html">
            <![CDATA[I am not writing the article.
                     The example is long enough as is ;).]]>
        </content>

In _setDescription (Entry\Rss and Entry\Atom) I found $dom->createCDATASection. But in Atom that is only the summary, while in RSS it is the content.

In Entry\Atom, _setContent is what renders the content block, which I wanted to use for the full content rather than just a summary. And in _setContent I found $element->setAttribute('type', 'xhtml').

I doubt that the Atom output of the documentation example is even possible with Zend Feed, or am I wrong? It would be great if the Atom feed also used a CDATA block instead of XHTML for the content.
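For comparison, a rough sketch of the output being asked for, built with plain PHP DOM rather than zend-feed's writer API (the function name `buildAtomHtmlContent` is hypothetical). RFC 4287 allows `type="html"` content as long as the markup is escaped, and a CDATA section is one way to do that. Only `<entry>` and `<content>` are built, so this is not a complete, valid Atom entry:

```php
<?php
// Sketch with plain PHP DOM, not zend-feed's API: an Atom <content> element
// of type="html" whose markup is wrapped in a CDATA section.
const ATOM_NS = 'http://www.w3.org/2005/Atom';

function buildAtomHtmlContent(string $html): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->formatOutput = true;

    // A valid entry would also need <id>, <title> and <updated>
    $entry = $dom->createElementNS(ATOM_NS, 'entry');
    $dom->appendChild($entry);

    $content = $dom->createElementNS(ATOM_NS, 'content');
    $content->setAttribute('type', 'html');
    // CDATA carries the HTML verbatim, so an HTML5 <img> without "/>"
    // is not interpreted as XML markup
    $content->appendChild($dom->createCDATASection($html));
    $entry->appendChild($content);

    return $dom->saveXML();
}

$xml = buildAtomHtmlContent('<p><img src="myimage.jpg"> My content ...</p>');
echo $xml;
```

The HTML is never parsed, so neither Tidy nor a well-formedness check is involved; the trade-off is that consumers receive the markup as text to be handled as HTML.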

@froschdesign
Member

froschdesign commented Jul 6, 2018

@av3
If you install the PHP extension "Tidy", zend-feed will convert your HTML to XHTML.

Example:

$tidy = new \tidy;
$tidy->parseString(
    '<p><img src="foo.jpg"></p>',
    [
        'output-xhtml'   => true,
        'show-body-only' => true,
        'quote-nbsp'     => false,
    ]
);
$tidy->cleanRepair();

var_dump((string) $tidy); // <p><img src="foo.jpg" /></p>

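// In zend-feed's Atom entry renderer: Tidy is used only if the extension is loaded
if (class_exists('tidy', false)) {
    $tidy = new \tidy;
    $config = [
        'output-xhtml'   => true,
        'show-body-only' => true,
        'quote-nbsp'     => false,
    ];
    // Tidy expects encoding names without hyphens, e.g. "utf8"
    $encoding = str_replace('-', '', $this->getEncoding());
    $tidy->parseString($content, $config, $encoding);
    $tidy->cleanRepair();
    $xhtml = (string) $tidy;
} else {
    // Without Tidy the content is used as-is and must already be well-formed
    $xhtml = $content;
}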

@av3
Author

av3 commented Jul 6, 2018

Thanks for your reply, @froschdesign. With Tidy it's working, even if it's not very beautiful:

<content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
  <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:img src="myimage" />
    <xhtml:p>My content</xhtml:p>
  </xhtml:div>
</content>

But is this really necessary? Wouldn't it be better to use $dom->createCDATASection? Then we wouldn't need tidy to create the content section. Or is there a specific reason why _setDescription (of Rss and Atom) creates a CDATA section and _setContent does not?

But this would also mean that the atom output example of the documentation is wrong, right?

If I don't want the XHTML output: would it be possible to write my own Writer extension in which I override the _setContent method? Are there any examples of how to register custom writers? In the documentation there is just a "TODO" for that chapter.

@froschdesign
Member

froschdesign commented Jul 6, 2018

With Tidy it's working, even if it's not very beautiful:

Why isn't it beautiful? The generated code works and is correct.

Then we wouldn't need tidy to create the content section.

The content of atom:content should be suitable for handling as HTML or XHTML - depending on the specified type. Tidy helps us here to meet the specifications.

But this would also mean that the atom output example of the documentation is wrong, right?

Right!

In the documentation there is just a "TODO" for that chapter.

Oh, this is a mistake. Thanks for the hint!

@froschdesign
Member

froschdesign commented Jul 6, 2018

Please have a look at content:encoded: http://www.rssboard.org/rss-profile#namespace-elements-content-encoded

zend-feed also provides an extension for this element: Zend\Feed\Writer\Extension\Content\Renderer\Entry

Writer extensions are used the same way as described for the reader: https://docs.zendframework.com/zend-feed/reader/#extending-feed-and-entry-apis

@av3
Author

av3 commented Jul 10, 2018

Why isn't it beautiful? The generated code works and is correct.

Yes, by now I know that the XHTML output is correct. It just looks unusual to me, and I thought it might be better to provide the content unmodified (faster and smaller), but that isn't important for a feed.

Please have a look at content:encoded: http://www.rssboard.org/rss-profile#namespace-elements-content-encoded

There it says:

The content MUST be suitable for presentation as HTML and be encoded as character data in the same manner as the description element.

and in description it says:

HTML markup MUST be encoded as character data either by employing the HTML entities &lt; ("<") and &gt; (">") or a CDATA section.
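To make the two permitted encodings concrete, a small sketch with plain PHP DOM (not zend-feed's renderer): a text node is serialized with entity escaping, a CDATA section keeps the markup verbatim, and both parse back to the same character data:

```php
<?php
// Sketch: the two ways the RSS Profile allows HTML markup to be carried
// as character data inside <description>.
$html = '<p>My content ...</p>';

$dom = new DOMDocument('1.0', 'UTF-8');
$item = $dom->createElement('item');
$dom->appendChild($item);

// 1) Text node: DOM escapes the markup as entities ("<" becomes "&lt;")
$escaped = $dom->createElement('description');
$escaped->appendChild($dom->createTextNode($html));
$item->appendChild($escaped);

// 2) CDATA section: markup is written verbatim inside <![CDATA[ ... ]]>
$cdata = $dom->createElement('description');
$cdata->appendChild($dom->createCDATASection($html));
$item->appendChild($cdata);

$xml = $dom->saveXML($item);
echo $xml, PHP_EOL;
```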

There is no word about XHTML content, although I know that it's also a valid solution for Atom feeds. But if the content is to be encoded "in the same manner as the description element", and the _setDescription method of Renderer\Entry\Atom is correct in using createCDATASection, then the same approach should also be suitable for _setContent. I'm just wondering, because to me it's not consistent.

zend-feed also provides an extension for this element

But this works only for RSS feeds, not for Atom.

The usage of the writer extensions are the same like described for the reader: https://docs.zendframework.com/zend-feed/reader/#extending-feed-and-entry-apis

Thank you, but unfortunately I wasn't successful with this. Writing my own Renderer\Entry with a _setContent called in the constructor would cause a second content:encoded block in my Atom feed if Tidy is enabled.

In addition to this, I tried to write my own extension to optimize my feed for Feedly. I started with my own Writer\Feed class and added methods for an accentColor and for registering the namespace. But there is no Zend\Feed\Writer\Extension\AbstractFeed, and extending my class from Zend\Feed\Writer\AbstractFeed causes an error:

…/vendor/zendframework/zend-feed/src/Writer/StandaloneExtensionManager.php:40:

Maximum function nesting level of '256' reached, aborting!

Next attempt: not extending AbstractFeed at all caused another error:

…/vendor/zendframework/zend-feed/src/Writer/AbstractFeed.php:845:

call_user_func_array() expects parameter 1 to be a valid callback, class 'Webfeed\Writer\Feed' does not have a method 'getItunesAuthors'

I don't know why it's checking for a method of the iTunes extension in my own extension. But okay, this is another problem. Maybe you (or someone else) could provide a "JungleBooks" extension example for the Writer in the documentation.

@froschdesign
Member

Sorry, the topic was Atom and not RSS. My mistake. 🤦‍♂️

No word about xhtml content, but I know that it's also a valid solution for Atom feeds.

See the specification: https://tools.ietf.org/html/rfc4287#page-14

Maybe you (or someone else) could provide an "JungleBooks" extension example for the Writer in the documentation.

Maybe tomorrow. I will definitely give feedback.

@froschdesign
Member

froschdesign commented Aug 7, 2018

@av3
An example for registering a writer extension can be found at #86

@weierophinney
Member

This repository has been closed and moved to laminas/laminas-feed; a new issue has been opened at laminas/laminas-feed#7.
