Introduce treatAsNonHtml to remove isHTML implementation #1230

Open
wants to merge 1 commit into base: main
Conversation

edwinv
Contributor

edwinv commented Mar 25, 2024

Fixes #608

When Turbo navigates to a new URL, locationIsVisitable is called to check whether the location is visitable. Part of that check is determining whether the requested URL will return HTML. We can never determine with 100% confidence that a request will return HTML: a page with a .html extension might still return another content type, and a page with a .foobar extension might return HTML. A check based on the file extension is therefore imprecise and prone to bugs.

Turbo already handles non-HTML responses. When a response doesn't contain HTML, a contentTypeMismatch is raised, which causes a full page load of the requested URL. So we could remove the isHTML check completely and navigation would still work. The major downside is having two requests: one Turbo fetch and one regular browser request.

A user could prevent these double requests by adding data-turbo="false" to such links. But a sensible default would be to never handle known non-HTML extensions as a Turbo request. This PR introduces the Turbo.treatAsNonHtml setting to enable this.

By default, a range of common non-HTML extensions is included in this setting, such as .png and .jpg. If users have many links to other extensions and don't want to sprinkle their HTML with data-turbo="false" attributes, they can add new extensions to the set:

Turbo.treatAsNonHtml.add(".mp3")

Because these changes are made in the lower-level checks, they work both for regular visits and the newly introduced page refresh morphs.
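Roughly, the lower-level check could look like this (a sketch rather than the exact diff; it reuses the existing getExtension and isPrefixedBy URL helpers, and the defaults shown are only a few illustrative entries):

export const treatAsNonHtml = new Set([".png", ".jpg", ".gif", ".pdf", ".zip"])

export function locationIsVisitable(location, rootLocation) {
  // Only bypass Turbo for extensions we know are not HTML; everything else,
  // including extensionless paths, stays a Turbo visit.
  return isPrefixedBy(location, rootLocation) && !treatAsNonHtml.has(getExtension(location))
}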

Mentioning @packagethief @dhh @jorgemanrubia for some visibility.

@packagethief
Contributor

packagethief commented Mar 25, 2024

I'm wary of using an exclusion list and assuming that most things are HTML by default. I think the list will just keep growing. But I do like the API you've proposed better than isHTML's existing regex. What would you think of using that API with an HTML extension allowlist? Then applications can more easily configure the extensions they want treated as HTML:

Turbo.htmlExtensions.add(".foobar")

The implementation would be as you've proposed, but would check for inclusion in the HTML extensions, rather than exclusion. We'd ship with the defaults already defined by the regex:

export const htmlExtensions = new Set([ ".html", ".htm", ".xhtml", ".php" ])

export function locationIsVisitable(location, rootLocation) {
  return isPrefixedBy(location, rootLocation) && htmlExtensions.has(getExtension(location))
}

EDIT: I guess that won't work when there's no extension – the implementation would need to account for that.
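For example, extensionless paths could be assumed to be HTML (a sketch, assuming getExtension returns an empty string when there is no extension):

export function locationIsVisitable(location, rootLocation) {
  const extension = getExtension(location)
  // No extension: assume HTML. Otherwise the extension must be in the allowlist.
  return isPrefixedBy(location, rootLocation) && (extension === "" || htmlExtensions.has(extension))
}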

@edwinv
Contributor Author

edwinv commented Mar 25, 2024

On the one hand, finding the extension is always an educated guess. In our application we have URLs with a filter containing a date range representation, for example /invoices/filter/period:202401..202403. This is a valid URL and is handled perfectly well by Rails routing, but Turbo thinks the extension is .202403. That part of the URL is dynamic and can never be added to an htmlExtensions array on the frontend, so Turbo decides it is not HTML and will always reload the page. Up until recently this wasn't an issue: while navigating, the user just got a fresh page. With Turbo Streams refreshing/morphing the page, the user now faces a complete reload while we are only trying to update a small part of it, which degrades the user experience.
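To illustrate the problem (assuming an extension check along the lines of what Turbo does today):

const url = new URL("https://example.com/invoices/filter/period:202401..202403")
const extension = (url.pathname.match(/\.[^.\/]*$/) || [""])[0]
// extension is ".202403" — it looks like a file extension, so the URL is treated as non-HTML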

On the other hand, we need to think about fallback situations. If the treatAsNonHtml set doesn't cover every case, the page will still load. It takes two requests, but the user won't notice; the main cost is wasted traffic and a slight delay. Compare this to an incomplete htmlExtensions list: there the user gets a degraded experience due to full page reloads where morphing is expected.

So, I'm still in favor of the proposed set for non-HTML extensions. I'm aware it is not covering all possible extensions, but when it does not cover an extension, the user won't notice and the experience stays the same.

I like the htmlExtensions name, so maybe we should rename treatAsNonHtml to a simpler name like nonHtmlExtensions or neverLoadWithTurbo.

@packagethief
Contributor

> So, I'm still in favor of the proposed set for non-HTML extensions. I'm aware it is not covering all possible extensions, but when it does not cover an extension, the user won't notice and the experience stays the same.

Thanks for the explanation. I hadn't considered dynamic paths like the one you described. I think your reasoning makes sense.

I think we'd need a much larger default exclusion list, however. Just looking at Basecamp and HEY, we have a pretty extensive list of extensions we'll auto-link to: zip, tar, gz, bz2, rar, 7z, dmg, exe, msi, pkg, deb, iso, bmp, mp4, mov, avi, mkv, wmv, heic, heif, mp3, wav, ogg, aac, wma, webm, ogv, mpg, mpeg. While these could return HTML, it's extremely unlikely, and we wouldn't want to start making two requests to resolve them.

@packagethief
Contributor

Thinking about it more, I think there are more downsides to an exclusion list than the inclusion regex we have now. It makes more sense to define the extensions that we're confident are HTML than it does to define the ones that aren't. The former is going to be a much smaller set than the latter for all practical purposes.

The example with a pathname like /period:202401..202403 feels like an uncommon case, and one that could still be supported by augmenting the regex we have now. I'd be in favor of a nicer API that didn't involve replacing the regex though, like a way to define additional extension matchers.
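For instance, the matchers could be regular expressions consulted alongside the current check (illustrative only; additionalHtmlMatchers and locationIsHtml don't exist in Turbo today):

const additionalHtmlMatchers = new Set()
additionalHtmlMatchers.add(/\/period:\d{6}\.\.\d{6}$/)

function locationIsHtml(location) {
  return isHTML(location) ||
    [...additionalHtmlMatchers].some((pattern) => pattern.test(location.pathname))
}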

@edwinv
Contributor Author

edwinv commented Mar 25, 2024

For the inclusion list/regex to work, it needs perfect coverage. My situation is very specific, but there have been plenty of other examples: FQDNs like example.com, Rails ids in paths, .deploy, file extensions, and usernames containing a dot. I don't see how a default list/regex is going to cover all these situations.

If the list/regex is configurable, a user could extend it to cover their needs. But even then it is hard to cover everything: many paths contain dynamic parts that are hard to capture in a list or regex.

But I'm also thinking about the developer experience. A new developer starting with Turbo expects Turbo to work all the time. When some of the requests/refreshes suddenly turn into full page reloads, it is not immediately clear what is happening. This was my firsthand experience over the last few days, when users complained about random refreshes and we couldn't find the cause, because the . only appears in the URL in specific situations.
Having an inclusion list/regex makes things more flexible for the developer, but also opens the door to new bugs. If the regex accidentally matches every extension, Turbo suddenly starts loading full PDF files or large binaries.

I don't see a downside to a long(er) exclusion list. We could probably come up with a reasonable list of, say, the 100 most-used file extensions on the web that don't return an HTML response. If we miss one, the end user won't notice anything other than the double request, and the developer can safely extend the list with a specific extension.

What I was also thinking about is the introduction of the (first) Turbo configuration setting. Instead of going the Turbo.someConfig route, we could also introduce a specific <meta> tag for the configuration. This would be more in line with how other Turbo configuration is done.
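For example (hypothetical; neither this tag name nor such a startup hook exists today), a meta tag could seed the set when Turbo boots:

// <meta name="turbo-non-html-extensions" content=".pdf .zip .mp3">
const meta = document.querySelector('meta[name="turbo-non-html-extensions"]')
if (meta) {
  for (const extension of meta.content.split(/\s+/).filter(Boolean)) {
    treatAsNonHtml.add(extension)
  }
}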

@brunoprietog
Contributor

Is it too complicated to add a slash at the end? In Rails you can easily do it with the trailing_slash option, at least. It would automatically solve your problem.

@edwinv
Contributor Author

edwinv commented Apr 5, 2024

> Is it too complicated to add a slash at the end? In Rails you can easily do it with the trailing_slash option, at least. It would automatically solve your problem.

Yes, this has all kinds of side effects in our application. Furthermore, I believe a library like Turbo shouldn't dictate whether trailing slashes are added or not. Rails might have an easy solution for this, but isn't this library intended to be used with other frameworks too?

@edwinv
Contributor Author

edwinv commented May 13, 2024

How are we going to move forward with this PR? I haven't read any arguments that convince me the exclusion list isn't going to work, other than some minor inconvenience in maintaining the list. From my point of view, that downside doesn't outweigh the unexpected behavior of full page refreshes caused by some random . being interpreted as a file extension. I still consider detecting file types by their extension a code smell.

Successfully merging this pull request may close: Turbo will not navigate to URLs that have a . in the pathname (#608)