Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Routing] Trailing slash redirection prevents routes to be matched #32996

Closed
Jean-Gian opened this issue Aug 6, 2019 · 16 comments
Closed

[Routing] Trailing slash redirection prevents routes to be matched #32996

Jean-Gian opened this issue Aug 6, 2019 · 16 comments
Labels

Comments

@Jean-Gian
Copy link

Symfony version(s) affected: 4.3.3

Description
When using yaml to configure routes, if I have 2 routes that only differs on the trailing slash (/foo and /foo/) only the first one in the yaml file works, the second one always redirect to the first one.
This happens with hard coded routes (/foo) or with wildcards (/{param})
The documentation says this should not happen: https://symfony.com/doc/current/routing.html#routing-trailing-slash-redirection

How to reproduce
The routing yaml file is:

foo_slash: #working
  path:       /foo/
  controller: App\Controller\Slash:foo
foo_no_slash: #not working, redirecting to foo_slash
  path:       /foo
  controller: App\Controller\NoSlash:foo
bar_no_slash: #working
  path:       /bar
  controller: App\Controller\NoSlash:bar
bar_slash: #not working, redirecting to bar_no_slash
  path:       /bar/
  controller: App\Controller\Slash:bar
@nicolas-grekas
Copy link
Member

It looks like the doc needs to be updated!

@Jean-Gian
Copy link
Author

Hi @nicolas-grekas
So is this supposed to be the correct behaviour? And the documentation is not up to date?

@nicolas-grekas
Copy link
Member

Yes, that's correct.

@xabbuh
Copy link
Member

xabbuh commented Aug 6, 2019

closing here then

@Jean-ita please open an issue in the documentation repository if there is something to improve, thanks

@xabbuh xabbuh closed this as completed Aug 6, 2019
@Tobion Tobion reopened this Aug 6, 2019
@Tobion
Copy link
Member

Tobion commented Aug 6, 2019

When was that change introduced? I thought the redirection only happens when no other route matches.

@javiereguiluz
Copy link
Member

I agree with @Tobion. Having /foo and /foo/ worked in the past as separate URLs (see also #30013). I thought the docs were right.

@Jean-Gian
Copy link
Author

I think it's useful to have the possibility to have /foo and /foo/ as separate URLs.
Especially if we think about wildcards: /{param} and /{param}/

@Jean-Gian
Copy link
Author

@Tobion , @javiereguiluz and everyone else: don't you think that this should be at least configurable?
The current behaviour should be on by default, but it should be possible to turn it off

Otherwise we run into bc issues when migrating from older version and also into SEO issue, considering /foo and /foo/ are different pages in terms of SEO

@xavierleune
Copy link
Contributor

Hi,
There is 2 things here:

  • /foo and /foo/ should be valid routes and we should not have a redirect from one to the other if both are defined. This seems to be a bug, I'll start working on a fix for sf 3.4 on that issue.
  • The trailing slash behavior redirect should be optionnal, I agree with @Jean-ita when he says that there is a SEO issue here and kind of a BC break. We can start talking about a way to add an option to disable this feature if you're open to it.

Thanks

@nicolas-grekas
Copy link
Member

nicolas-grekas commented Aug 27, 2019

Two parts in this comment:

1.

I've reopened the code to figure out the behavior I implemented/fixed last year regarding trailing slashes. TL;DR, I think there is no bug despite the recent comments.

The behavior can be described has such: trailing slash redirections have a higher priority than route ordering. There is one exception: when a variable is put at the end of a route, and when this variable has a requirement that makes a trailing slash a MUST or a DON'T, then the redirection happens last when applicable (see part 2. below).

The behavior dates back to the origin of the components, you can verify: the PhpMatcherDumper::compileRoute() method in v2.0.0 (that's from 2011) generates code that triggers the redirection logic with this order of priority.

For static routes, this was not the case and route ordering could apply. That's surely why the documentation states this (obsolete) behavior. But this was inconsistent (why would priority of redirection change when adding a variable?!?), and is now fixed.

The behavior is the result of several expectations. One example is from you @javiereguiluz, and critical for symfony.com, if you remember this issue (#29287): when a route for /blog/ is defined, and when one for /blog/{slug} is defined too, where shoud /blog go? It should redirect to /blog/, at least for BC, since it has always have. Yet, /blog/{slug} compiles to something like /blog/?(.*), so it matches /blog. You see: if we stick to the simple descriptions everyone can have from their pov, we're going to be like in the story with blind people describing an elephant, each from a different part of it (I'm telling this because that's how I felt when I worked on this :))

In my opinion, that elephant is now properly modelled with the behavior we have, all grasped in test cases that link back to issues that explain how this should behave in edge cases too, reported as BC breaks when some changes broke the corresponding use cases. I wish nobody else to spend the time I've burned on this. Note that I'm not saying another model doesn't exist. I'm only saying that the problem is harder than what you think, and I consider it solved personnaly. Could be because I'm lazy, you'll tell, show me the code :)

2.

Back to the topic:

How can one have better control of the trailing slash redirections? First of all, a global flag would be inappropriate to me. Global behavior changes might break expectations of e.g. third party bundles (same reasoning as why php.ini settings are bad for global behavior control.)

As I mentionned earlier, it is possible to force a route to NOT match with (or without) a trailing slash. Here is how (the underscore could be replaced by any random var name):

  • @Route("/hello{_</(?!/)>}", name="hello1")
  • @Route("/hello{_<(?!/)>}", name="hello2")

WIth these two routes, /hello/ will go to route hello1, and /hello will go to route hello2. Problem solved.

I agree this is not very user friendly but you can use it immediately if you need to, it won't have any performance overhead. We could also imagine a way for the router to generate this for us. I don't know precisely how, but this should still give a path forward if anyone wants to invest time in improving this. Could be "just" documented also.

@nicolas-grekas
Copy link
Member

nicolas-grekas commented Aug 27, 2019

Note that there is another way to describe the behavior of the router regarding trailing slashes: routes with and without trailing slashes are considered canonically equivalent.
(If this doesn't fit with your requirements, which is what this issue is about, see section 2. above)

@Tobion
Copy link
Member

Tobion commented Aug 27, 2019

To me the implementation we had at one point (#29287) to only do the redirect when nothing else matches was really simple and really easy to understand and would also be really simple to disable if people wanted. Now it's complex with alot of edge cases to cover old behavior. But the bc breaks are still there, just in a different place, as pointed out in

For static routes, this was not the case and route ordering could apply.

And this explains all the issues that have been opened about the trailing slashes.
As I see it, trailing slash redirection should be a user-friendly feature as a fallback. Now it's baked into the system and standing in the way in unexpected manners (as you can't just define both routes like this issue is about).

But the current behavior is done and changing it would maker other people unhappy again. So let's just try to live with it and document it.

@Tobion Tobion closed this as completed Aug 27, 2019
@xavierleune
Copy link
Contributor

Thanks @nicolas-grekas and @Tobion for your feedbacks. I understand your point about BC here. However I will double check that this behavior is fully consistent across implementations (I had a case lately where behavior was slightly different between dev and prod regarding trailing slashes, I'm not sure if it's fully fixed with #33244).

However I disagree about the global flag. A 3rd party bundle should not rely on this trailing slash redirection. It already cannot rely on the generated path for a route, so it cannot rely on this trailing slash management. And it should not.
Honestly the whole behavior regarding trailing slashes feels like a bad idea to me. The framework should not make any redirect until I want it explicitly. Redirection are very sensitive responses when it comes to SEO (it can ruin it or boost it). I really think a global flag is the appropriate way to handle it, I don't want 3rd party bundle to rely on an "automatic typo fix in request path".
What legitimate expectation of 3rd party bundle could be broken in your opinion by a global flag ?

@nicolas-grekas
Copy link
Member

I don't understand the issue with trailing slashes: the generator always creates URLs that use the canonical version. If you define /foo, nobody will ever reference /foo/, so why all this stress on a URL that nobody will hit?
Then, let's imagine someone still hits /foo/. Symfony can generate a 30x or a 404. 30x is more useful, so it's the most appropriate.
Where do I miss the point?

@xavierleune
Copy link
Contributor

xavierleune commented Aug 28, 2019

The real issue with SEO is that it's slightly different to have a 404 page changed to a 200 (when I create a matching route), than a 301 changing to a 200. Google poorly handle 30X, especially on news or AMP 3rd party cache.
So to summarize the issue, it's really applicable when a 3rd party site creates a "bad" link (with a trailing slash when it should not be here). With the current setup the search engines will discover this url and keep it in memory, because it does exists as a 30X. However it should not exists, it's really a 404. The problem really shows up when you change things in your routing.
Example:

  • Day 1:
    • /foo exists in routing and is a valid page
    • /foo/ is automatically redirected to /foo with a 301 code
  • Day 2:
    • I delete the route /foo, I create instead /foo/ which is a totally different page
    • What I want: send a 410 on /foo (it used to exist but does not exists anymore) and just have a 200 on /foo/
    • What I have: /foo is now redirected to /foo/.
    • For 3rd party search engines /foo/ already exists, it's a redirection to /foo. So they will consider for a time (sometimes a long time) it's still valid. They will check only later if /foo is still a valid page. It can mess things up, making it really hard to debug.

This example is pretty simple, but things are more complicated when you add placeholders. I cannot share much details here, but a similar issue caused a massive SEO bug on a very important day lately. The router was not the issue, but similar assumptions have been made and search engines were basically lost.

"nobody will ever reference /foo/, so why all this stress on a URL that nobody will hit?"

Because it will. Somebody will hit this url one day or another. Probably sooner that you think. We have a lot of 404 every day from url which never existed. If it comes from chrome, you can be sure that googlebot will come and check on it.

In conclusion, you summarize what the framework should not do, there is no good reason to do that: "routes with and without trailing slashes are considered canonically equivalent". Routes with and without trailing slashes are different routes.
I really understand the BC consideration. People can rely on this without knowing it. And I understand why this can feels like helping and not harmful.
As @Tobion said, there is a lot of issues on trailing slashes, and a better documentation can definitely help.
But when you really care about SEO you should be able to totally remove this behavior from your app with a global flag. Because again, there is no real good reason to do that kind of things and a redirection should never be a default choice.

@xavierleune
Copy link
Contributor

Please continue the discussion on #33342

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants