Hide shadow articles in Sitemap + Canonical Link Tag #7288

Open
arjanfrans opened this issue Feb 19, 2024 · 6 comments
Labels
Bug Error or unexpected behavior of already existing functionality To Discuss The core team has to decide if this will be implemented


@arjanfrans

| Q | A |
| --- | --- |
| Sulu Version | 2.5.7 |
| PHP Version | 8.1 |
| DB Version | Postgres 10.14 |
| Browser Version | Chrome 121.0.6167.184 |

Actual Behavior

We have blog posts that are not translated, but we do have two locales ("de" and "en"). We still want the "de" blog post to be accessible on the "en" website, so we configured the "en" version as a shadow article.

The "en" version is considered the same as "de" and should NOT show up as a canonical or alternate URL; instead, "de" should be the canonical.


I was able to solve this by overriding sulu/vendor/sulu/sulu/src/Sulu/Bundle/WebsiteBundle/Resources/views/Extension/seo.html.twig.

```twig
{%- block urls -%}
    {# when only one language do not show alternate #}
    {# if a url is dynamic with query strings, do not add alternate #}
    {# if a url has a shadow page with the same content, do not add alternate #}
    {%- if localizations|length > 1 -%}
        {%- for localization in localizations -%}
            {%- if app.request.query.count == 0 -%}
                {%- if shadowBaseLocale != localization.locale and (localization.locale not in shadowLocales and metaLocale not in shadowLocales) -%}
                    {%- if (localization.alternate is not defined or localization.alternate) and shadowLocales|length == 0 -%}
                        <link rel="alternate" href="{{ localization.url }}"
                              hreflang="{{ localization.locale|replace({'_': '-'}) }}">
                    {%- endif -%}
                {%- endif -%}
            {%- endif -%}
        {%- endfor -%}
    {%- endif -%}
{%- endblock -%}
```

The "en" Shadow Article should also not show up in the sitemap.xml. I was not able to easily solve this without overriding a bunch of services.

"Deleting" the "en" version is not a solution, since it would then no longer be accessible on the "en" version of the website.

Expected Behavior

Do not list shadow articles as canonical or alternate URLs; use the original language as both the canonical and the alternate.

If I have an article with "de" and "en" versions, and the "en" version is a shadow page, I expect the following link tags.

On https://www.fusonic.net/de/blog/machine-learning-vorhersage:

```html
<link rel="alternate" href="https://www.fusonic.net/de/blog/machine-learning-vorhersage" hreflang="de">
<link rel="canonical" href="https://www.fusonic.net/de/blog/machine-learning-vorhersage">
```

On https://www.fusonic.net/en/blog/machine-learning-vorhersage:

```html
<link rel="canonical" href="https://www.fusonic.net/de/blog/machine-learning-vorhersage">
```

When "en" actually has a translated version, the current behavior is correct.
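To make the expected rules concrete, here is a minimal Python sketch (not Sulu code) of the tag selection described in this issue, following the statement above that "de" should be the canonical for both pages: a shadow page emits only a canonical pointing at its base locale, while an original page lists hreflang alternates for non-shadow locales only, plus a self-canonical. The `Localization` class, its `shadow_base` field and the `link_tags` function are hypothetical stand-ins for Sulu's localization data, and the sketch assumes a shadow never points at another shadow.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Localization:
    locale: str
    url: str
    # Hypothetical field: the locale this page shadows, or None for an original.
    shadow_base: Optional[str] = None


def link_tags(current: str, localizations: Dict[str, Localization]) -> List[str]:
    """Build the <link> tags for the page rendered in locale `current`."""
    page = localizations[current]
    if page.shadow_base is not None:
        # Shadow page: point the canonical at the original, emit no alternates.
        base = localizations[page.shadow_base]
        return [f'<link rel="canonical" href="{base.url}">']
    tags = []
    for loc in localizations.values():
        # Only originals (non-shadows) are advertised as hreflang alternates.
        if loc.shadow_base is None:
            hreflang = loc.locale.replace("_", "-")  # same as the Twig replace filter
            tags.append(f'<link rel="alternate" href="{loc.url}" hreflang="{hreflang}">')
    # An original page is its own canonical.
    tags.append(f'<link rel="canonical" href="{page.url}">')
    return tags


BASE = "https://www.fusonic.net"
localizations = {
    "de": Localization("de", f"{BASE}/de/blog/machine-learning-vorhersage"),
    "en": Localization("en", f"{BASE}/en/blog/machine-learning-vorhersage", shadow_base="de"),
}

for locale in ("de", "en"):
    print(locale, link_tags(locale, localizations))
```

If "en" later gets a real translation, dropping its `shadow_base` is enough for it to reappear as an alternate with its own canonical.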

Steps to Reproduce

Create an Article in two languages, and configure one as a shadow article.

Possible Solutions

As mentioned, I was able to solve the canonical meta tag myself by overriding the template.

Regarding the article in the sitemap: since Sulu already has an option to hide an article from the sitemap, this should be easy to solve. If an article translation is a shadow article, 'Hide in Sitemap' could be toggled automatically.
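The proposed rule, where a shadow translation behaves as if 'Hide in Sitemap' were switched on, could look roughly like this in Python. This is only a sketch and not Sulu's sitemap provider; `ArticleTranslation`, `hide_in_sitemap`, `shadow_base` and `sitemap_urls` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArticleTranslation:
    locale: str
    url: str
    hide_in_sitemap: bool = False
    # Hypothetical field: base locale when this translation is a shadow, else None.
    shadow_base: Optional[str] = None


def sitemap_urls(translations: List[ArticleTranslation]) -> List[str]:
    """Return sitemap URLs, treating every shadow as if 'Hide in Sitemap' were on."""
    urls = []
    for t in translations:
        # Hidden if explicitly toggled OR if the translation is a shadow.
        hidden = t.hide_in_sitemap or t.shadow_base is not None
        if not hidden:
            urls.append(t.url)
    return urls


translations = [
    ArticleTranslation("de", "https://example.com/de/blogpost-1"),
    ArticleTranslation("en", "https://example.com/en/blogpost-1", shadow_base="de"),
    ArticleTranslation("de", "https://example.com/de/blogpost-2", hide_in_sitemap=True),
]
print(sitemap_urls(translations))
```

One possible design choice shown here is to derive the hidden state when the sitemap is generated instead of persisting the toggle, so turning the shadow off again brings the URL back without another manual change.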

@arjanfrans arjanfrans added the Bug Error or unexpected behavior of already existing functionality label Feb 19, 2024
@arjanfrans
Author

Regarding the comment about query strings in the modified code: this may be specific to our use case, but in general it is bad for SEO to include the query string. I modified another part of the seo.html.twig template to also strip the query string from the canonical:

```twig
{%- block canonical -%}
    <link rel="canonical" href="{{ seoCanonical|default(app.request.uri)|split('?')[0] }}">
{%- endblock -%}
```

This is also easy to solve in the sitemap template, but it would be nice to be able to configure it.
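Expressed in Python, the `split('?')[0]` filter amounts to rebuilding the URL without its query string. The sketch below uses `urllib.parse` and adds a hypothetical `allowed_params` argument to illustrate the kind of configuration asked for; it is not an existing Sulu option. Unlike the Twig filter, it also drops any fragment, which browsers do not send in request URIs anyway.

```python
from typing import Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(uri: str, allowed_params: Tuple[str, ...] = ()) -> str:
    """Strip the query string from `uri`, keeping only parameters in `allowed_params`.

    `allowed_params` is a hypothetical knob, not a Sulu setting.
    """
    parts = urlsplit(uri)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in allowed_params
    ]
    # urlunsplit omits the "?" entirely when the encoded query is empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://www.fusonic.net/en/blog?utm_source=x&page=2"))
print(canonical_url("https://www.fusonic.net/en/blog?utm_source=x&page=2", allowed_params=("page",)))
```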

@arjanfrans
Author

arjanfrans commented Mar 27, 2024

Regarding the sitemap issue, I tried patching it myself with something like this:

[Screenshot of the attempted patch]

But somehow it does not get saved. Does the 'seo' part automatically get removed if shadow page is enabled? Where does that happen?

@alexander-schranz
Member

I'm not sure it was a good decision to automatically create a canonical when a shadow is defined. If, for example, en-US is a shadow page of en, I think Google should still index the en-US page for en-US users.

So I think we should maybe remove this line from seo.html.twig:

```twig
{% set seoCanonical = localizations[shadowBaseLocale].url %}
```

@chirimoya what do you think?

@alexander-schranz alexander-schranz added the To Discuss The core team has to decide if this will be implemented label May 3, 2024
@chirimoya
Member

@alexander-schranz good question. If en-US is a shadow of en, I would agree, but what about de falling back to en? I guess we need to do some research and/or ask an SEO expert.

@alexander-schranz
Member

Sadly, I could not yet find an official Google Webmaster example for such cases. Weglot lists an example for example.com (en) / example.com/gb (en-GB) and suggests not using canonicals there: https://www.weglot.com/blog/hreflang-canonical#2-setting-your-global-page-url-as-the-canonical-url. There is another example at https://www.portent.com/blog/seo/implement-hreflang-canonical-tags-correctly.htm targeting en <-> de, but it does not say whether the content is the same.

Some blog posts (for example https://translatepress.com/hreflang-canonical/) say that if there is a canonical there should be no hreflang, but the official Google webmaster docs on canonicals say to also add all other relevant links, such as hreflang: https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls?hl=de 🤔.

Looking specifically at the homepage example: you would normally index both pages, regardless of whether one is a shadow, and not create a canonical. So example.com and example.com/de are both indexed, and via hreflang Google knows which site to show to which user.

Based on comments in this thread (https://support.google.com/webmasters/thread/72181388/duplicate-content-linking-concerns-between-com-and-co-uk-sites?hl=en), Google decides on its own whether an hreflang alternate is indexed as canonical or not, and then shows the correct URL based on the user's locale.

@arjanfrans
Author

arjanfrans commented May 22, 2024

Our main problem is:

If example.com/en/blogpost-1 is a shadow page of example.com/de/blogpost-1, I want it to be accessible on my site in both languages.

But for anyone coming in through a search, I want them to land on the original page. Google should think that only the original page exists.

(An SEO expert also advised us on the canonical and sitemap exactly as I described. I am also having some trouble finding sources for all this 🤔)
