Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider escaping selectors #337

Open
brendankenny opened this issue Mar 29, 2023 · 5 comments
Open

Consider escaping selectors #337

brendankenny opened this issue Mar 29, 2023 · 5 comments

Comments

@brendankenny
Copy link
Member

web.dev has a md:hidden-yes CSS class it likes to use these days, which won't work in a selector when you go back to debug using attribution data (e.g. document.querySelector('button.md:hidden-yes') -> Uncaught DOMException: Failed to execute 'querySelector' on 'Document': 'button.md:hidden-yes' is not a valid selector). Illegal characters have to be escaped to function in a selector (so e.g. 'button.md\\:hidden-yes' or button.md\\3Ahidden-yes).

Ideally these characters would be automatically escaped to ease the debugging process. This could be done on the report side, but not every viewer will remember to do this.

In the browser, you can use CSS.escape(), so if getSelector.ts is only ever used in the browser (I believe so since no jsdom?), el.className.trim().replace(/\s+/g, '.') could become [...el.classList].map(CSS.escape).join('.') and call it a day (though probably worth escaping the id/name as well).

@tunetheweb
Copy link
Member

Alternatively couldn't we just escape it when you run your querySelector?

document.querySelector(CSS.escape('button.md:hidden-yes'))

Just worried about that escaped sequence causing more confusion...

@brendankenny
Copy link
Member Author

You have to escape the individual parts, otherwise it'll escape the #s and .s as well (CSS.escape('button.md:hidden-yes') gives button\\.md\\:hidden-yes).

It's possible the escaped sequence could cause more confusion, looking for something with class md\\:hidden-yes vs what you authored (md:hidden-yes), but really the selectors you get are more like div.cluster.gutter-base>button.icon-button.tooltip.color-core-text.md:hidden-yes>svg. The most natural thing to do with that is select for the whole thing in the page, not look for individual parts of the selector, and if you do go looking for the individual parts, the \\ is still a pretty good indicator that something has been escaped and may have been altered.

Also, this is just to handle the "illegal" characters, mostly characters used to indicate other things in selectors, like : is usually accessing a pseudo class/element, , >, #, etc. Most of the time these characters won't end up in the middle of an id or class name so most will never be affected.

@philipwalton
Copy link
Member

philipwalton commented Apr 18, 2023

Yeah, I remember encountering this exact problem when trying to debug INP issues on web.dev myself, and it stumped me for a bit until I realized the issue with invalid selector. So, at minimum, I think it warrants a mention in the attribution docs because it could potentially be stumping other people as well.

However, I'm hesitant to change the default behavior of the library for a few reasons:

  1. The primary use case of these selectors is to help people visually identify potentially-problematic elements from glancing at a report (grouped based on page URL + selector), and escaped selectors are harder to visually identify (and potentially more confusing) than non-escaped ones. The primary use case was not to make it possible to build automated tooling that would identify these elements on a particular page for site developers. The reason the latter was not a primary use case is (AFAIK) It's not possible to produce a selector that is guaranteed to be unique or match the element in question, so there will always be cases where tooling will fail.

  2. If we were to make this change, I think it would have to be a breaking change because it could lead to a situation where a site owner's analytics data changes without them making any changes to their site. It could also lead to issues with a report's GROUP BY logic because the identifiers would be different before vs. after the update.

My sense is we should do what we can to help people (and tooling authors) avoid this problem by making it clear in the docs, but I'm hesitant change the library due to how it could potentially affect people's analytics data.

@brendankenny
Copy link
Member Author

brendankenny commented Apr 24, 2023

OK, but if it separately stumped you, me, and @mmocny for a bit, clearly searching in the page is a use case and it's going to cause issues when it happens :)

FWIW I didn't think to search in the web-vitals.js docs to see what was going on—I could see the : in the selector and the : in the class name in the DOM, so I thought the fact that it wasn't matching meant I was somehow holding querySelector wrong and went searching based on that, not how the selector was generated—so even if it had been documented here I wouldn't have found it.

Also remember this isn't all selectors class names, this is just class names with weird whitespace or characters like { in them, so in theory it won't be a common occurrence. The HTTP Archive query to figure out how often that occurs was going to be like $80 so I didn't run it, but maybe @tunetheweb has a spare set of CSS selectors around that we could check.

The point about analytics grouping over time is a good one, so I think it's just a question of weighing the downsides.

My biased viewpoint:

  • for the subset of devs tracking their attribution over time, have disallowed characters in their selectors, and have relatively stable page layout so aren't churning their own data: you'd still likely have a horizon of six months to a year where you'd care about relative rankings of selectors in your analytics data. Any breaking change adding escaping would cause pain over a similar timescale
  • if escaping is never added, the (in theory larger) subset of devs looking at attribution with disallowed characters in their selectors will run into this hiccup forever

Breaking changes are bad and we should feel bad, but our advice is to go from RUM attribution -> recreate in the lab, so to me at least it feels like we should just rip off the bandaid (for a library steadily growing in usage, the best time to make a breaking change was before you shipped, the second best time is as soon as possible :)

@brendankenny
Copy link
Member Author

Happy to discuss more. Don't worry about it somehow shipping before you're back @philipwalton :)

We could pair the change with making the CSS selector unique by using :nth-child() when it can't disambiguate between siblings using className/id, similar to how the DevTools Elements panel Copy->Copy selector works, though that doesn't eliminate the problem of a selector that only works when the page is in a particular state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants