Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve intrasite links in summaries. #3280

Merged
merged 2 commits into from
Jan 29, 2024

Conversation

@avaris
Copy link
Member

avaris commented Jan 22, 2024

Also allows Articles not to have a _summary attribute.

Why? If metadata["summary"] exists, what's the benefit of potentially not having _summary?

PS: I really dislike the way .summary/._summary or .content/._content is handled in general, especially with plugin interactions. It is clunky, way too open to gotchas and generally makes things more tricky than it is supposed to be. And most of it is for backwards compatibility. I want to rework that at some point, but making it "clean" would need to break a bunch plugins related to that. Maybe for 5.0... /on-topic-off-pr-rant

@MinchinWeb
Copy link
Contributor Author

MinchinWeb commented Jan 22, 2024

Also allows Articles not to have a _summary attribute.

Why? If metadata["summary"] exists, what's the benefit of potentially not having _summary?

Backward compatibility, and it stayed because it was late last night and I didn't want to have to (i.e. too tired to) work through the implications of removing it. And, for now, the docs of this warning:

Avoid content_object_init signal if you intend to read summary or content properties of the content object. That combination can result in unresolved links when Linking to internal content (see pelican-plugins bug #314). Use _summary and _content properties instead, or, alternatively, run your plugin at a later stage (e.g. all_generators_finalized).

But if you're okay with it, I would be happy remove any reference to it, update the documentation, make sure my plugin is up to date, and try and stick a big enough warning in the release notes.


PS: I really dislike the way .summary/._summary or .content/._content is handled in general, especially with plugin interactions. It is clunky, way too open to gotchas and generally makes things more tricky than it is supposed to be. And most of it is for backwards compatibility. I want to rework that at some point, but making it "clean" would need to break a bunch plugins related to that. Maybe for 5.0... /on-topic-off-pr-rant

I've been bitten by this too! If seems rather esoteric and non-obvious, and really only something that's discovered through trial and error. The related part of that is that if you call .content, it caches the result, which means that you (or any other plugin) can't actually update it... If you were to change how it all works, I would be happy to work through updating any of my plugins that touch it!

@MinchinWeb MinchinWeb changed the title Resolve inter-site links in summaries. Resolve intrasite links in summaries. Jan 22, 2024
@avaris
Copy link
Member

avaris commented Jan 22, 2024

Backward compatibility, and it stayed because it was late last night and I didn't want to have to (i.e. too tired to) work through the implications of removing it.

But you did remove it. In refresh_metadata_intersite_links, previously ._summary was always set as long as metadata['summary'] was there. Now you are only setting ._summary if and only if there was a ._summary before. I'm asking why you removed it? :). Or rather you opened up to the possibility to have this odd edge case where you can have metadata['summary'] but not ._summary

Comment on lines 523 to 533
if "summary" in self.settings["FORMATTED_FIELDS"]:
if hasattr(self, "_summary"):
self.metadata["summary"] = self._summary

if "summary" in self.metadata:
self.metadata["summary"] = self._update_content(
self.metadata["summary"], self.get_siteurl()
)

if hasattr(self, "_summary") and "summary" in self.metadata:
self._summary = self.metadata["summary"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be precise, I'm talking about the last if condition here, specifically hasattr(self, "_summary"). If you compare to the previous version, this if should not be here and their contents should be combined to ensure that ._summary and metadata['summary'] are synced:

            if "summary" in self.metadata:
                self.metadata["summary"] = self._update_content(
                    self.metadata["summary"], self.get_siteurl()
                )
                self._summary = self.metadata["summary"]

@MinchinWeb
Copy link
Contributor Author

previously ._summary was always set as long as metadata['summary'] was there.

erh, not quite. The line before was this:

self._summary = self._update_content(self._summary, self.get_siteurl())

so if you got into the function without ._summary, it would just error out with an AttributeError, and metadata['summary'] wouldn't get set because you had error-ed out (and probably crashed Pelican in the process). Previously, there was not guard to confirm that there was a ._summary attribute.

An implicit assumption has become explicit.


Is the (preferred) solution to remove the last if condition and just always set ._summary?

@avaris
Copy link
Member

avaris commented Jan 23, 2024

Sorry, bad choice of words on my part. Previous version had its issues. What I was trying to say is that if you update metadata['summary'], update ._summary as well. Having different conditions to both can lead to those being out of sync. i.e. If the original content did not have summary (no summary metadata), but afterwards some plugin set metadata['summary'] after the end of this block, metadata['summary'] will be correct but there will not be any ._summary attribute.

Is the (preferred) solution to remove the last if condition and just always set ._summary?

Yes, As I wrote above, just update both together:

            if "summary" in self.metadata:
                self.metadata["summary"] = self._update_content(
                    self.metadata["summary"], self.get_siteurl()
                )
                self._summary = self.metadata["summary"]

@MinchinWeb
Copy link
Contributor Author

update metadata['summary'], update ._summary as well.

Done!

Thank you for your patience as I get this figured out, and thanks for all your work with Pelican!

Copy link
Member

@avaris avaris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@justinmayer justinmayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks to @MinchinWeb for the fix and for the excellent attention to detail, made all the more robust thanks to the review by @avaris 🤗

@justinmayer justinmayer merged commit 960aee5 into getpelican:master Jan 29, 2024
15 checks passed
@MinchinWeb MinchinWeb deleted the summary-links branch January 29, 2024 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Internal URLs not parsed in summaries {filename} from URL leaks from summary of the very first post
3 participants