Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WEBrick directory listings take priority over permalinks #6222

Closed
5 of 18 tasks
MattSturgeon opened this issue Jul 15, 2017 · 13 comments · Fixed by #6231
Closed
5 of 18 tasks

WEBrick directory listings take priority over permalinks #6222

MattSturgeon opened this issue Jul 15, 2017 · 13 comments · Fixed by #6231
Labels
frozen-due-to-age has-pull-request Somebody suggested a solution to fix this issue

Comments

@MattSturgeon
Copy link
Contributor

MattSturgeon commented Jul 15, 2017

Prefixing the default post permalink and setting page permalink to the same value doesn't work.

Seems to only affect the local jekyll serve WEBrick server. GitHub Pages, for example, isn't affected.


  • I believe this to be a bug, not a question about using Jekyll.
  • I updated to the latest Jekyll (or) if on GitHub Pages to the latest github-pages
  • I ran jekyll doctor to check my configuration
  • I read the CONTRIBUTION file at https://jekyllrb.com/docs/contributing/
  • This is a feature request.

  • I am on (or have tested on) macOS 10+
  • I am on (or have tested on) Debian/Ubuntu GNU/Linux
  • I am on (or have tested on) Fedora GNU/Linux
  • I am on (or have tested on) Arch GNU/Linux
  • I am on (or have tested on) Other GNU/Linux
  • I am on (or have tested on) Windows 10+

  • I was trying to install.
  • There is a broken Plugin API.
  • I had an error on GitHub Pages, and I have reproduced it locally.
  • I had an error on GitHub Pages, and GitHub Support said it was a Jekyll Bug.
  • I had an error on GitHub Pages and I did not test it locally.
  • I was trying to build.
  • It was another bug.

My Reproduction Steps

  • in _config.yml, set permalink to /blog/:title
  • create a page and set its permalink to /blog
  • run jekyll serve
  • visit localhost:4000/blog
    • Note that you are seeing the WEBrick server's directory index, not your page
  • visit localhost:4000/blog.html
    • Note that the page does exist
  • in _config.yml, set permalink to /notblog/:title
  • rerun jekyll serve
  • visit localhost:4000/blog
    • Note that it works correctly now

The Output I Wanted

I expected that visiting the permalink defined in the page's front matter would display that page.

Related to #3452

@pathawks
Copy link
Member

So, Webrick doesn't redirect /blog to /blog.html?

@MattSturgeon
Copy link
Contributor Author

MattSturgeon commented Jul 15, 2017

That's right. If you remove/modify the /blog/ prefix from the /blog/:title permalink, it works fine. I'm guessing WEBrick only redirects blog to blog.html if a blog directory doesn't actually exist.

@MattSturgeon
Copy link
Contributor Author

Disabling directory indexing by setting :FancyIndexing to false for WEBrick::HTTPServlet::FileHandler (somewhere in here?) may help fix this.

While I know gh-pages probably uses a different http server (maybe nginx or apache), I couldn't reproduce this bug when I tested on github.io.

Using python's http.server module, which shows directory indexes too, I reproduced this.

@MattSturgeon MattSturgeon changed the title permalink cannot be the same as part of another WEBrick directory listings take priority over permalinks Jul 15, 2017
@pathawks
Copy link
Member

I wonder if there is some way we could configure one to take priority over the other

@MattSturgeon
Copy link
Contributor Author

MattSturgeon commented Jul 18, 2017

Ok, looking into WEBrick::HTTPServlet::FileHandler a little more, and learning some ruby as I go. The directory indexing stuff only happens after the normal GET request handling has failed, so disabling :FancyIndexing won't fix this bug anyway.

It may be possible to extend search_index_files(req, res) to fallback to filename + '.html' after its default check of filename + '/', but this may mess with the path_info magic that set_filename(req, res) is doing, so we may end up having to override that too.

Thinking of something along these lines:

# untested code. TODO: setup jekyll dev environment!
def search_index_file(req, res)

  # First try the default check for index files in the directory
  if file = super
      return file
  end
  
  # Otherwise, try a custom check for directory.html in the parent directory
  
  # Extract basename from the end of res.filename
  path_arr = res.filename.scan(%r{/[^/]*})

  # Get the last non-empty section of the filename
  while basename = path_arr.pop
      break unless basename == '/'
  end

  if file = search_file(req, res, basename + '.html')
      res.filename = path_arr.join
      return file
  end
  
  return nil

end

MattSturgeon added a commit to MattSturgeon/jekyll that referenced this issue Jul 18, 2017
Jekyll extends WEBrick to serve file.html if file is requested but
doesn't exist (see jekyll#3452). This doesn't work if file exists as a
directory though.

This commit extends WEBrick further so that if file exists as a
directory, we first try and serve file/index.html (as normal), but
fallback to file.html if no index can be found within the file
directory (reported as jekyll#6222).

It is slightly hacky, to avoid having to re-implement the entire
set_filename method, we instead extend search_index_file and temporarily
mutate res.filename. This is all clearly commented.

See base class search_index_file implementation:

https://github.com/ruby/webrick/blob/3fb6a011d620f1c19fce71bc3447740553e4fa80/lib/webrick/httpservlet/filehandler.rb#L365-L385
@jekyllbot jekyllbot added the has-pull-request Somebody suggested a solution to fix this issue label Jul 18, 2017
MattSturgeon added a commit to MattSturgeon/jekyll that referenced this issue Jul 19, 2017
Jekyll extends WEBrick to serve file.html if file is requested but
doesn't exist (see jekyll#3452). This doesn't work if file exists as a
directory though.

This commit extends WEBrick further so that if file exists as a
directory, we first try and serve file/index.html (as normal), but
fallback to file.html if no index can be found within the file
directory (reported as jekyll#6222).

It is slightly hacky, to avoid having to re-implement the entire
set_filename method, we instead extend search_index_file and temporarily
mutate res.filename. This is all clearly commented.

See base class search_index_file implementation:

https://github.com/ruby/webrick/blob/3fb6a011/lib/webrick/httpservlet/filehandler.rb#L356-L363
MattSturgeon added a commit to MattSturgeon/jekyll that referenced this issue Jul 19, 2017
Jekyll extends WEBrick to serve file.html if file is requested but
doesn't exist (see jekyll#3452). This doesn't work if file exists as a
directory though.

This commit extends WEBrick further so that if file exists as a
directory, we first try and serve file/index.html (as normal), but
fallback to file.html if no index can be found within the file
directory (reported as jekyll#6222).

It is slightly hacky, to avoid having to re-implement the entire
set_filename method, we instead extend search_index_file and mutate
res.filename, reverting the change if it isn't needed.

See base class search_index_file implementation:

https://github.com/ruby/webrick/blob/3fb6a011/lib/webrick/httpservlet/filehandler.rb#L356-L363
MattSturgeon pushed a commit to MattSturgeon/jekyll that referenced this issue Jul 20, 2017
MattSturgeon pushed a commit to MattSturgeon/jekyll that referenced this issue Jul 20, 2017
@janpio
Copy link
Contributor

janpio commented Oct 21, 2017

Awesome that this was fixed in 3.6.x, but I think there still might something slightly broken maybe:

(I hope it is ok just to reply to this issue, parts of the PR was so salty that I am a bit scared...)

I have a Jekyll project with permalink: /:path that has a page.md and a page/second-page.md. This should be served as /page and /page/second-page.

In 3.5.2 going to /page showed the directory listing. Now in 3.6.0 this is fixed thanks to this PR (if I understood correctly). Problem is while the page works, /page is still redirected to /page/ (which is different to expected behaviour and also different to how it works on Github pages).

Here is a screenshot of Firefox devtools of the redirection:
image

Did I find the correct issue connected to this?
What would be an appropriate next step?

@pathawks
Copy link
Member

@janpio So /page should not show a directory listing, and should not redirect to /page/, but should display /page.html, is that correct?

That seems like a regression, if this changed from 3.5.x to 3.6.0.

Could you open a new issue explaining all of this? This deserves it's own conversation.

@janpio
Copy link
Contributor

janpio commented Oct 21, 2017

Almost.

/page should show the content.
In 3.5.x it showed the directly listing after redirecting to /page/.
Since 3.6.0 it shows the correct content but still redirects to /page/. Content good, redirect still bad.

But just now I noticed that /page/second-page does actually render the content that should be at /page, which is of course really bad.

I will create a new issue with repro etc.

@pathawks
Copy link
Member

What is on the disk, in the _site directory?

If the content is at /page/index.html then it seems like a redirect from /page to /page/ is acceptable. But yes, do open an issue so we can give this the attention it needs.

@janpio
Copy link
Contributor

janpio commented Oct 21, 2017

No, content is in page.html. (Only page/index.md should be rendered to /page/index.html, right?)

See this new issue for complete description and reproduction repo:
#6459

@MattSturgeon
Copy link
Contributor Author

MattSturgeon commented Oct 21, 2017 via email

@janpio
Copy link
Contributor

janpio commented Oct 21, 2017

That is one part, yes.

Unfortunately there is another, pages inside the folder show the same content as the page clashing with the folder name. This only started with 3.6.x. See #6459 for a proper bug report with repro etc.

@janpio
Copy link
Contributor

janpio commented Oct 23, 2017

To make it less confusing I extracted the both problem (unnecessary redirect, wrong content rendered) into to seperate issues: #6459 + new #6475

Mr0grog added a commit to edgi-govdata-archiving/100days that referenced this issue Feb 6, 2019
We upgraded Jekyll to 2.6.3 (a minor version upgrade from 2.4.3) to fix a security vulnerability, but the entire 2.6.x line has a bug with how URLs are handled in the built-in web server. (Fixed here: jekyll/jekyll#6459 and jekyll/jekyll#6475, originally introduced in the fix to jekyll/jekyll#6222 (comment)).

Anyway, 2.8.5 includes a fix and generates an identical `_site` directory on disk, so should be all good.
@jekyll jekyll locked and limited conversation to collaborators Jul 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age has-pull-request Somebody suggested a solution to fix this issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants