Block search engines from indexing the head version #231

Closed
mdjermanovic opened this issue Jun 16, 2022 · 6 comments

@mdjermanovic
Member

We want the pages under https://eslint.org/docs/latest/ to appear in search results, so we should probably block the pages under https://eslint.org/docs/head/ from indexing.

@eslint-github-bot eslint-github-bot bot added this to Needs Triage in Triage Jun 16, 2022
mdjermanovic added a commit to eslint/eslint that referenced this issue Jun 17, 2022
@mdjermanovic
Member Author

Per https://developers.google.com/search/docs/advanced/robots/intro, robots.txt is not the place to do this. In fact, per https://developers.google.com/search/docs/advanced/crawling/block-indexing, the pages should not be blocked from crawling at all: if a crawler can't fetch a page, it never sees the noindex directive on it.

We could set up an `X-Robots-Tag: noindex` response header and/or add `<meta name="robots" content="noindex">` tags to the head pages.
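A minimal sketch of the two options, assuming the docs build or web server knows each page's URL: the hypothetical helpers below (`is_head_page`, `robots_meta_tag`, `robots_headers`) emit the robots meta tag or the `X-Robots-Tag` header only for pages under `/docs/head/`. The names and the path-prefix check are illustrative, not code from the ESLint repository.

```python
from urllib.parse import urlparse

HEAD_PREFIX = "/docs/head/"  # assumption: unreleased docs live under this path
NOINDEX_META = '<meta name="robots" content="noindex">'


def is_head_page(url: str) -> bool:
    """True if the URL (absolute or path-only) points into the head docs."""
    return urlparse(url).path.startswith(HEAD_PREFIX)


def robots_meta_tag(url: str) -> str:
    """Option 1: a <meta> tag for the page's <head>, or '' for other pages."""
    return NOINDEX_META if is_head_page(url) else ""


def robots_headers(url: str) -> dict[str, str]:
    """Option 2: extra HTTP response headers to send with the page."""
    return {"X-Robots-Tag": "noindex"} if is_head_page(url) else {}


if __name__ == "__main__":
    for url in (
        "https://eslint.org/docs/head/rules/",
        "https://eslint.org/docs/latest/rules/",
    ):
        print(url, repr(robots_meta_tag(url)), robots_headers(url))
```

Either way, the head pages must stay crawlable (not disallowed in robots.txt); otherwise crawlers never fetch them and never see the directive.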

I've prepared eslint/eslint#16016 in case we opt for the meta tag approach.

@nzakas
Member

nzakas commented Jun 18, 2022

I think a better approach might be to set the canonical URL of each page in docs/head to the corresponding page in docs/latest. That way, Google will index the content but consider it another version of the latest docs, which I think is the correct interpretation.
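A minimal sketch of that mapping, assuming head and latest docs share the same relative page paths: the hypothetical `canonical_url` helper rewrites the `/docs/head/` prefix to `/docs/latest/` and leaves every other URL as its own canonical. The helper names and prefixes are assumptions for illustration.

```python
from urllib.parse import urlparse, urlunparse

HEAD_PREFIX = "/docs/head/"      # assumption: docs for unreleased changes
LATEST_PREFIX = "/docs/latest/"  # assumption: docs for the latest release


def canonical_url(url: str) -> str:
    """Map a docs/head URL to the same page under docs/latest; leave others unchanged."""
    parts = urlparse(url)
    if not parts.path.startswith(HEAD_PREFIX):
        return url  # latest (and any other) pages are their own canonical URL
    latest_path = LATEST_PREFIX + parts.path[len(HEAD_PREFIX):]
    # Keep scheme, host, query, and fragment; swap only the path.
    return urlunparse(parts._replace(path=latest_path))


def canonical_link_tag(url: str) -> str:
    """The <link> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://eslint.org/docs/head/rules/no-unused-vars"))
```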

@mdjermanovic
Member Author

I tried googling SEO best practices for docs covering multiple versions of the same product, but couldn't find anything conclusive. I think of pages sharing a canonical URL as pages that provide essentially the same information, whether identical, reduced, or enhanced. Docs for different versions of a software package could provide substantially different information.

Either way, regardless of my theoretical interpretation of canonical URLs, the solution you proposed seems to work well on https://sinonjs.org/, so I'm 👍 for trying that approach.

@mdjermanovic
Member Author

But we'll have situations where some pages exist only in the head version for a certain period of time, for example, when we merge a new rule. In that case, the canonical URL pointing to the latest version will return a 404, which doesn't seem right.
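A minimal sketch of that edge case, assuming hypothetical sets of page paths (relative to each version's root) for the head and latest builds: any page present in head but missing from latest would get a canonical URL that currently returns 404. The function name and sample paths are illustrative only.

```python
def pages_with_missing_canonical(head_pages: set[str], latest_pages: set[str]) -> list[str]:
    """Head-only pages: their docs/latest canonical target does not exist yet."""
    return sorted(head_pages - latest_pages)


if __name__ == "__main__":
    # Hypothetical example: a rule merged after the last release exists only in head.
    head = {"rules/", "rules/no-unused-vars", "rules/some-new-rule"}
    latest = {"rules/", "rules/no-unused-vars"}
    print(pages_with_missing_canonical(head, latest))  # → ['rules/some-new-rule']
```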

@nzakas
Copy link

nzakas commented Jun 20, 2022

While that can happen, that’s an edge case, and one that will be resolved within two weeks each time it occurs. So I think it’s a safe way to start for now. We can always adjust things if we find they aren’t working correctly down the line.

mdjermanovic added a commit to eslint/eslint that referenced this issue Jun 21, 2022
@mdjermanovic
Member Author

As agreed in eslint/eslint#16016, this issue is fixed by adding noindex meta tags.

Triage automation moved this from Needs Triage to Complete Jun 21, 2022
Projects
Archived in project
Triage
Complete

2 participants