Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(link-text): localize link text audit #14927

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

raffaelj
Copy link

@raffaelj raffaelj commented Mar 24, 2023

Summary

The link-text audit produces false positives, because it is not localized. I changed the anchor-elements gatherer and the link-text audit to check for the current language.

Related Issues/PRs

Fixes #14845

@raffaelj raffaelj requested a review from a team as a code owner March 24, 2023 18:49
@raffaelj raffaelj requested review from connorjclark and removed request for a team March 24, 2023 18:49
@google-cla
Copy link

google-cla bot commented Mar 24, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

no functional change, but less if/else soup and more descriptive
function name
if opening html tag is the only place with a lang attribute
@@ -5,7 +5,7 @@
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-->

<html>
<html lang="en">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to update this smoke test to include cases that exercise the new logic (getLangOfInnerText). The expectations for this smoke tests are found in seo-failing.js. If the current smoke expectations were more explicit, it would have this for the link-text audit (currently it just checks for 4 failures, but it'd be good to be more explicit now):

'link-text': {
  score: 0,
  displayValue: '4 links found',
  details: {
    items: [
      {text: 'click this'},
      {text: 'click this'},
      {text: 'CLICK THIS'},
      {text: 'click this'},
    ],
  },
}

Can you devise some cases here to make sure getLangOfInnerText works as expected? If you need to create a new file HTML in fixtures/seo and a new accompanying expectation file in test-definitions/seo-mixed-language.js, that's OK.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the smoke test seo-failing to be more explicit, including the language attribute and a German example. I also added a new smoke test seo-mixed-language with many edge cases (German and English examples).

const langElements = getElementsInDocument('[lang]'); // eslint-disable-line no-undef
const docHasOnlyHtmlLangAttr = (langElements.length === 1) &&
langElements[0].nodeName === 'HTML';
const lang = !docHasOnlyHtmlLangAttr ? null : langElements[0].getAttribute('lang');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be normalized some how. Consider the valid language code en-US.

I don't recall how we could actually normalize to the base language code. In shared/localization/locales.js we form a mapping explicitly. I suppose here we could just split by - and return the first part to use as the loclae.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't normalize here. The gatherer just collects the lang code and passes it to the audit. If it should get normalized, then the last tool in the chain (the audit) should do that.

See also my response below to @adamraine about keeping the loop over language code segments.

@@ -5,7 +5,7 @@
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-->

<html>
<html lang="en">
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little uneasy how this is now required for the audit to function at all.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this, and we think it'd be best to do the current behavior of the audit when the document does not declare a language. I think that's what the PR is currently doing, but some tests in the audit would be good to verify.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure - What do you mean with "current behavior"? The current behavior before this PR or the current behavior of this PR?

My implementation ignores links if the language is not declared. I also added a new test case to core/test/audits/seo/link-text-test.js to reflect that new behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure - What do you mean with "current behavior"? The current behavior before this PR or the current behavior of this PR?

The current behavior before this PR is what we want to fall back to if a lang is not specified.

@raffaelj
Copy link
Author

raffaelj commented May 8, 2023

@connorjclark @adamraine Thanks for the review. The easy ones are fixed. I wrote a smoke test, but I don't have more time today. I'll do some cleanup and push the changes tomorrow.

I'll have a look at the other suggestions this week (or weekend).

@DFWCowboys

This comment was marked as spam.

@@ -10,15 +10,15 @@ import LinkTextAudit from '../../../audits/seo/link-text.js';

describe('SEO: link text audit', () => {
it('fails when link with non descriptive text is found', () => {
const invalidLink = {href: 'https://example.com/otherpage.html', text: 'click here', rel: ''};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a new test or two that has non-english text that is bad would be clearer. All these existing test cases could leave off lang: 'en' since it doesn't modify the thing they are asserting; plus that gives us coverage for the common case of there being no document language.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned it up and added more tests.

@raffaelj
Copy link
Author

I think, I addressed all annotations.

If you really want to change the audit to use only the first segment of a language code (e. g. en from en-US), I could do it. But I still think, it's better to keep my current implementation.

A few questions are still open from #14845 (comment):

  • headline "Fallbacks"
    • fallback to pragma-set-default-language or HTTP header
  • headline "Other thoughts"
    • more exceptions
    • change weight of audit

@raffaelj raffaelj changed the title Localize link text audit core(link-text): localize link text audit May 11, 2023
ckopoctb999

This comment was marked as spam.

@raffaelj
Copy link
Author

raffaelj commented Jun 6, 2023

@connorjclark @adamraine Can I do anything to finish this PR?

Copy link
Member

@adamraine adamraine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @raffaelj, and thanks for being patient

@@ -5,7 +5,7 @@
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-->

<html>
<html lang="en">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure - What do you mean with "current behavior"? The current behavior before this PR or the current behavior of this PR?

The current behavior before this PR is what we want to fall back to if a lang is not specified.

cli/test/smokehouse/test-definitions/seo-mixed-language.js Outdated Show resolved Hide resolved
* @param {string|null} currentLang
* @return {string}
*/
function getLangOfInnerText(node, currentLang = null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function adds a lot of complexity just so we can check the edge case where a child of the anchor defines a new language. We could get most of the benefit if we just check for node.closest('[lang]')?.getAttribute('lang') || '' and set the lang to unknown ('') if any child nodes define a lang. If the language is unknown we just check every language like Connor mentioned.

I think the slightly worse experience for those edge cases is preferable to introducing all of this complexity.

nonDescriptiveLinkTexts[lang].has(searchTerm)) {
return true;
}
lang = lang.substring(0, lang.lastIndexOf('-'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future-proofing sounds fine 👍

@raffaelj
Copy link
Author

Sorry for the delay, and thanks for being patient

Now I have to add another delay. I see 10 changed files from 3-6 months ago and I have to read them all again before I can answer. I'm not sure, if I find some time during the next weeks. If someone else wants to take over to speed things up - just go ahead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Descriptive link text audit doesn't make sense without localization
6 participants