Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(full_url_for): handle config.url with a trailing slash #217

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

curbengh
Copy link
Contributor

@curbengh curbengh commented Jul 1, 2020

a notable example is Github Pages with project page, e.g. https://github.com/OpenTechSchool/social-coding corresponds to https://opentechschool.github.io/social-coding/. Project page always have a trailing slash, accessing https://opentechschool.github.io/social-coding will 301-redirect to https://opentechschool.github.io/social-coding/. So, in this case, user may prefer to use url: https://opentechschool.github.io/social-coding/.

This PR adds support for that.

@curbengh curbengh requested a review from SukkaW July 1, 2020 08:43
@coveralls
Copy link

coveralls commented Jul 1, 2020

Coverage Status

Coverage increased (+0.005%) to 97.092% when pulling 6cd1494 on curbengh:full-url-for into 14fa817 on hexojs:master.

Copy link
Member

@SukkaW SukkaW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user may prefer to use url: https://opentechschool.github.io/social-coding/.

That's not a case. We already documented it clearly: https://hexo.io/docs/configuration

image

Also, Hexo already handle it in load_config.js as a failsafe:

https://github.com/hexojs/hexo/blob/f5eb2df69a4a0f7f0a32ab6abc44aa984a4da8c6/lib/hexo/load_config.js#L31-L32

  config.root = config.root.replace(/\/*$/, '/');
  config.url = config.url.replace(/\/+$/, '');

@stevenjoezhang
Copy link
Member

Yes, and there is a hint in the _config.yml

If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'

https://github.com/hexojs/hexo-starter/blob/259ecc3982e0445075f6c0daaccc24e4f1c7e663/_config.yml#L15

Which means

url: https://example.com/blog/

should be considered as invalid config.

See also hexojs/hexo#1812 (comment)

lib/full_url_for.js Outdated Show resolved Hide resolved
@curbengh
Copy link
Contributor Author

curbengh commented Jul 1, 2020

config.url = config.url.replace(//+$/, '');

that failsafe is not available in Hexo API (e.g. when use in a plugin's unit test).

It is also not ideal in use cases such as rss and sitemap. When a trailing slash (e.g. http://example.com/blog/) is the actual URL (no redirection), those use cases should use that URL; currently due to that failsafe (trailing slash is removed), web crawler may catch a 301 redirection (when crawling through a sitemap).

@SukkaW
Copy link
Member

SukkaW commented Jul 1, 2020

@curbengh

hexo.config is added by load_config, thus the failsafe is always there. Plugins‘ unit test? They should be written in valid config format (because we already transform it into the valid way during load_config). In case you are worrying about mocking test cases, use await hexo.init() && await hexo.load() (in which load_config will be called) will be fine.

Also, current behavior of full_url_for('/') will always resulted in hexo.config.url + '/' (see full_url_for's test cases). There is no need to worry about trailing slash issue.

ctx.config.url = 'https://example.com';
fullUrlFor('index.html').should.eql(ctx.config.url + '/index.html');
fullUrlFor('/').should.eql(ctx.config.url + '/');

Last but not least, all browsers, http request libs as well as crawlers will treat //example.com as //example.com/, and no 301 will be returned:

GET https://blog.skk.moe

> HTTP/1.1 200 OK
GET https://blog.skk.moe.

> HTTP/1.1 200 OK
GET https://blog.skk.moe/

> HTTP/1.1 200 OK

@curbengh
Copy link
Contributor Author

curbengh commented Jul 2, 2020

no 301 will be returned

I was referring to url: http://example.com/blog/ example. load_config.js removes the trailing slash, so requesting http://example.com/blog will be 301-ed to http://example.com/blog/ (in GH Pages case). feed currently does add it back, but for unrelated purpose (it was for config.url + path). This remove/add trailing slash operation seems un-intuitive to me.


## If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'
url: http://yoursite.com
root: /

My understanding of the rational behind http://yoursite.com/child limitation is that Hexo previously used config.url + path. After the availability of full_url_for(), that concat approach is no longer used (AFAIK) and we even discourage theme/plugin devs from using it. So, the limitation is no longer needed, as in it's now trivial to be flexible to existence of trailing slash.

In GH Pages, since http://yoursite.com/child/ is the actual URL, it's natural for newcomers to use that; it's only after they notice ## If your site is put in a subdirectory... that the trailing slash needs to be removed.

Hexo blog can be hosted anywhere, we can't know whether http://yoursite.com/child/ or http://yoursite.com/child is the actual one, a web server/hosting platform may prefer to have a trailing slash and vice versa. Since we can't know, why not let the user decides?

@curbengh
Copy link
Contributor Author

curbengh commented Jul 12, 2020

Fixed invalid protocol (https:/example.com/index.html) raised by @jiangtj


full_url_for('/') now simply returns config.url (note: this is different from previous behavior):

const hexo = {
  config: {
    url: 'http://example.com'
  }
}

console.log(full_url_for.call(hexo, '/'))
// http://example.com

hexo.config.url = 'http://example.com/'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/

hexo.config.url = 'http://example.com/blog'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/blog

hexo.config.url = 'http://example.com/blog/'
console.log(full_url_for.call(hexo, '/'))
// http://example.com/blog/

full_url_for('index.html') appends a trailing slash if config.pretty_urls.trailing_index === false (same as previous behavior):

const hexo = {
  config: {
    url: 'http://example.com'
  }
}

console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/

hexo.config.url = 'http://example.com/'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/

hexo.config.url = 'http://example.com/blog'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/blog/

hexo.config.url = 'http://example.com/blog/'
console.log(full_url_for.call(hexo, 'index.html'))
// http://example.com/blog/

});

it('internal url - subdirectory with trailing slash', () => {
ctx.config.url = 'https://example.com/blog/';
Copy link
Member

@SukkaW SukkaW Jul 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/hexojs/hexo/blob/17ee23fa181bcc2c3b47eba5b345762aef1eac36/test/scripts/hexo/load_config.js#L93-L106

Just as the test case of load_config shows, the url: https://hexo.io/ will result in hexo.config.url.should.eql('https://hexo.io');, while root: foo will result in hexo.config.root.should.eql('foo/');. Thus, the test case just here just won't exist.

Copy link
Contributor Author

@curbengh curbengh Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/hexojs/hexo/blob/17ee23fa181bcc2c3b47eba5b345762aef1eac36/lib/hexo/load_config.js#L34-L35 should be removed if this PR is accepted. Those two lines are considered a workaround and this PR already handles it.

@jiangtj
Copy link
Member

jiangtj commented Jul 13, 2020

In fact, I am confused about these two configurations.

url: http://yoursite.com  => root: /
url: http://yoursite.com/child  => root: /child/

If such a rule must meet, I think that only the configuration of url is enough, root can be generated during the initialization of hexo

Or is there a situation that does not meet this rule?

url: http://yoursite.com/child  ≠> root: /child/

@curbengh
Copy link
Contributor Author

curbengh commented Jul 13, 2020

This PR makes it flexible to any combination of the following configs:

url: http://yoursite.com/child
root: /child

url: http://yoursite.com/child/
root: /child

url: http://yoursite.com/child
root: /child/

url: http://yoursite.com/child/
root: /child/

url: http://yoursite.com/child/
root: child/

url: http://yoursite.com/child/
root: child

url: http://yoursite.com/child
root: child/

url: http://yoursite.com/child
root: child

@jiangtj

I think that only the configuration of url is enough, root can be generated during the initialization of hexo

You are right, it's trivial to derive root from url; we can make root config as optional and generate it if it's empty.

I wonder if there is any use case for:

url: http://yoursite.com/
root: /child/

@stevenjoezhang
Copy link
Member

I'm not sure if this PR is being too lenient on invalid config.

@curbengh
Copy link
Contributor Author

curbengh commented Jul 13, 2020

I'm not sure if this PR is being too lenient on invalid config.

Hexo currently is already pretty lenient about it.

This PR may break {{ url + path }} or {{ root + path }}, however, these approaches are discouraged and we recommend using full_url_for(path) or url_for(path) instead.

The purpose of this PR is reducing 301/302 redirects for anyone using url: http://example.com/child (possibly benefiting SEO) and encourage users to use ``url: http://example.com/child/` instead since this is the true URL most of the time.

The main goal is to make the config slightly more user-friendly.

@SukkaW
Copy link
Member

SukkaW commented Jul 13, 2020

@jiangtj @curbengh @stevenjoezhang

Please consider the Breaking Change we introduced and the backward compatibility of current themes.

If theme is designed to work with the current configuration format, then we shouldn't change the root & url in case things go wrong.

@jiangtj
Copy link
Member

jiangtj commented Jul 13, 2020

You are right, it's trivial to derive root from url; we can make root config as optional and generate it if it's empty.

I wonder if there is any use case for:

url: http://yoursite.com/
root: /child/

@curbengh If there is no special case, I prefer to remove the root directly, and then generate it

url: http://yoursite.com  => {url: http://yoursite.com, root: /}
url: http://yoursite.com/  => {url: http://yoursite.com, root: /}
url: http://yoursite.com/child  => {url: http://yoursite.com/child, root: /child/}
url: http://yoursite.com/child/  => {url: http://yoursite.com/child, root: /child/}

Like hexojs/hexo#4414, handle url and root in load_config, which can also be pretty lenient, and both full_url_for and url + path root + path are valid

Please considering about Breaking Changes and the backward compatibility of current themes then.

@SukkaW If just make full_url_for more flexible, I think this is not BC. Because the documentation and many third-party plugin themes may use url + path or root + path, so there is no change for users. . .

@curbengh
Copy link
Contributor Author

@SukkaW
If theme is designed to work with the current configuration format, then we shouldn't change the root & url in case things go wrong.
@jiangtj
If just make full_url_for more flexible, I think this is not BC.

To clarify, this PR doesn't directly cause a BC; the flexibility I mentioned earlier is more of a side effect, not a primary intention of this PR, though I think it's a benefit. It is this flexibility that could be a BC.

However, I reckon all users with root: /child/ uses url: http://yoursite.com/child as currently enforced by Hexo, so there is no immediate BC, {{ url + path }} still works.

With this PR, Hexo doesn't need (and shouldn't) enforce the syntax, so user can then use:

url: http://yoursite.com/child/
root: child

But that will break {{ url + path }}. Theme has two (easy) options:

  1. full_url_for() (recommended)
  2. User change config to url: http://yoursite.com/child

@jiangtj
If there is no special case, I prefer to remove the root directly, and then generate it
and both full_url_for and url + path root + path are valid

The intention is to keep the trailing slash url: http://yoursite.com/child/, so {{ url + path }} will not work without workaround.

@jiangtj
Copy link
Member

jiangtj commented Jul 30, 2020

I mean when the user configures url: http://yoursite.com/child/, generate a new url without the trailing slash (url: http://yoursite.com/child) and the correct root (root: /child/)

So that {{ url + path }} can also work normally

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants