Fix serving files that clash with directories (#6222) #6231

MattSturgeon · 2017-07-18T23:50:34Z

Fix #6222.

To avoid having to re-implement the entire set_filename method, I instead extended search_index_file and temporarily mutate res.filename. Kinda hacky, so I tried to leave plenty of inline comments. (Also first time using ruby, so beware!)

For reference, here's the base class search_index_file implementation.

Feel free to checkout the commit message too.

pathawks · 2017-07-18T23:56:16Z

/cc: @jekyll/stability

pathawks · 2017-07-18T23:57:31Z

Thank you @MattSturgeon for learning Ruby just to fix this! 🍻

MattSturgeon · 2017-07-19T00:05:28Z

No problem, feel free to recommend any changes if required.

Just noticed the first mistake, I linked to the wrong super method in the commit message! Should be this one instead, so I'll amend the commit and we can wait for the CI builds to start from scratch!

ashmaroli

Hi, some minor changes:

== and === check for equality
= assigns a value to a variable

if File.extname == ".foo"
  puts "Foo Ahoy!"
  baz = "foobar"
end

ashmaroli · 2017-07-19T02:09:21Z

lib/jekyll/commands/serve/servlet.rb

+          # First, let's see if the default implementation can figure it out.
+          # (Check for index files in the res.filename directory)
+          if file = super
+            return file


arguments for if, unless, while etc check for equality via (== or ===) and the body assigns a value to a variable via =

I wasn't checking for equality. I was checking if the assigned value was truthy. if file = super evaluates to nul we continue, otherwise we return the value assigned to file.

(the super method)

Appreciate you mentioning it though, since I did say I was new to ruby. (Not as new to programming though).

I see.. that wasn't readily apparent to me from reading the code.
Learnt something new today then.. 🍻

ashmaroli · 2017-07-19T04:09:16Z

lib/jekyll/commands/serve/servlet.rb

+            path_arr = res.filename.scan(%r!/[^/]*!)
+            while basename = path_arr.pop
+              break unless basename == "/"
+            end


I feel this while..break unless -- can be replaced by an alternative..

It can probably just be basename = path_arr.pop, but to cover edge cases I wanted to loop until basename wasn't empty (well, "/") since there may be (a?) trailing slash.

okay, can you list 3-5 such edge-cases of path and basename? Will help you write tests later..

The normal code flow would be

basename = arr.pop if basename == "/" then basename = arr.pop # I can't imagine this happening # more than once, but imho it # makes more sense to allow for # the possibility. done

To me, while ... break unless ... is actually quite an elegant solution to popping off an unknown number of elements and assigning the final pop.

Sorry, didn't see your comment before replying. 3-5? Not sure. Edge case may not be the best choice of words.

Basically our code doesn't know if res.filename will have a trailing / or not. We could go with whether it does or doesn't right now, but it may change in the future or (without analysing the WEBrick code) be inconsistent.

Essentially I'm treating it as unsanitised input.

Even if there are no trailing slashes (in the current WEBrick implementation), having the loop immediately break is effectively the same as not having the loop, in terms of performance and complexity. It just gives us the robustness against the possibility.

have you tried until?

basename == arr.pop until basename == "/"

hostname == arr.pop until hostname == "/" doesn't assign the value of arr.pop to hostname. Also hostname will be undefined when evaluating the until condition.

Looking at the set_filename implementation, (which sets up the res.filename and calls search_index_file), res.filename is fairly sane and unlikely to change.

It looks like the filename shouldn't have any trailing slashes (so basename = arr.pop without testing for trailing slashes may work) when we get in search_index_file, but I haven't tested.

Still, the while method isn't complex or slow and it makes our method more robust and resilient to change.

I could use index instead of arrays, something like this:

# Index of final / that isn't followed by another / or EOL # Could just be rindex('/') if we don't care about trailing slashes i = res.filename.rindex %r!/(?!/|$)! basename = res.filename[i..-1] res.filename = res.filename[0, i]

MattSturgeon · 2017-07-19T04:10:10Z

I've tried to make the code more readable, following @ashmaroli's feedback.

In 20a333f I replaced a conditional return with an unless branch and removed two out of three cases of assignment within conditions. I also re-write most of the comments.

It's a fixup commit, so intended to be rebased/autosquashed into the original commit, just thought I'd keep it separate for now in case anyone wanted to compare it with the original PR (?w=1 is your friend here).

ashmaroli · 2017-07-19T06:32:59Z

FYI, Currently, Jekyll's merge bot will squash all commits irrespective of the --fixup flag

ghost

very nice! +1 for the extra documentation

ghost · 2017-07-19T06:36:09Z

lib/jekyll/commands/serve/servlet.rb

+            res.filename << basename unless file
+          end
+
+          return file


you technically don't need the return statement, since ruby returns the last statement in a function anyways

But the last statement is res.filename << basename, unless file is evaluated first, right? Either way I think it is more readable in this case to explicitly return file.

MattSturgeon · 2017-07-19T06:46:08Z

Ok, good to know. I think this is probably ready now, I've pushed up a slightly re-worded commit with all changes including switching to index based splitting:

i = res.filename.rindex "\/"
basename = res.filename[i..-1]
res.filename = res.filename[0, i]

And simply appending basename back onto filename instead of copying a backup.

From what I could tell, trailing slashes are unlikely to ever be an issue with WEBrick so I shifted to simply splitting on the index of the final / char without messing around with arrays or loops.

ashmaroli · 2017-07-19T06:49:41Z

What we need now are tests.. 😈

MattSturgeon · 2017-07-19T06:50:34Z

Where are the existing tests for the serve command?

ashmaroli · 2017-07-19T06:52:27Z

https://github.com/jekyll/jekyll/blob/master/test/test_commands_serve.rb

MattSturgeon · 2017-07-19T06:52:59Z

How did I miss that? I did look, I promise!

MattSturgeon · 2017-07-19T07:04:55Z

I'm probably missing something, but I'm not sure there's any testable changes here. Short of running the server, and making a mini client within the test suit to assert that HTTP requests return expected results.

The tests for command serve seem to be mostly sanity checking configuration options to make sure config files and command line switches have taken affect.

I'd need to run the command asynchronously, then fire off a HTTP request to a few URLs that I know are either directories, files or both and assert that the result matches the local file I wanted.

Is this within scope of the tests? It sounds more complex than the entire feature to me. It doesn't seem to be something that was added for #3452

What kind of things should I be testing for this change?

ashmaroli

needs some more edits..

ashmaroli · 2017-07-19T07:04:04Z

lib/jekyll/commands/serve/servlet.rb

@@ -21,10 +21,36 @@ def initialize(server, root, callbacks)
        # up with a different preference on which comes first.

        def search_file(req, res, basename)
-          # /file.* > /file/index.html > /file.html
+          # /file.* -> /file.html


please undo this change.

Any reason why? It was an intended change, since search_file doesn't change "/file" into "/file/index.html" (that's search_index_file's job)

I don't think the comment meant changing /file.* into /file.index.html and then into /file.html either.
Its probably documenting the order of traversal.

Ah, in that case I'll clarify and move it above search_file since it applies to both.

How's this?

# Order of precedence: # /file -> /file/index.html -> /file.html -> fancy-indexing -> 404.html

I'll defer this to the original author for his input.
/cc @envygeeks

For now I have this:

# Add the ability to tap file.html the same way that Nginx does on our # Docker images (or on GitHub Pages.) The difference is that we might end # up with a different preference on which comes first. # # Order of precedence: # /file -> /file/index.html -> /file.html -> fancy-indexing -> 404.html def search_file # First see if we can find /file normally, if not try /file.html end def search_index_file # First, let's see if our super method can find an index file. unless # Ok, looks like there's no index file in the directory. # Let's look for directory.html instead... # Try and find a file named dirname.html in the parent directory. end end

actual code removed for brevity

/cc @envygeeks

ashmaroli · 2017-07-19T07:04:57Z

lib/jekyll/commands/serve/servlet.rb

+        def search_index_file(req, res)
+          # /file/index.html -> /file.html
+
+          # First, let's see if the our super method can figure it out.


small typo => ..if the our..

ashmaroli · 2017-07-19T07:07:54Z

lib/jekyll/commands/serve/servlet.rb

+            res.filename = res.filename[0, i]
+
+            # Try and find a file named dirname.html in the parent directory.
+            file = search_file(req, res, basename + ".html")


in the really rare chance that basename is not a String, this will fail. Instead interpolate:

- file = search_file(req, res, basename + ".html") + file = search_file(req, res, "#{basename}.html")

ashmaroli · 2017-07-19T07:12:49Z

What kind of things should I be testing for this change?

Mainly, if the original issue you reported has been fixed and stays fixed..

MattSturgeon · 2017-07-19T07:16:02Z

What kind of things should I be testing for this change?

Mainly, if the original issue you reported has been fixed and stays fixed..

IMO that's not practical to test. Unless you have some test suite that can compare files served over HTTP to files on the filesystem?

envygeeks · 2017-07-20T01:22:06Z

This is a lot of complicated code for such a simple problem.

MattSturgeon · 2017-07-20T01:39:16Z

Well the problem isn't as simple as it sounds, but I wouldn't say the solution was complex or even large (10 lines).

set_filename passes us a directory path (res.filename) and it expects us to return a file from within that directory to use as an index file (usually index.html).

We want to abuse this slightly and return a file from the parent directory of the one we have, so we have to modify res.filename and look for files in the modified directory path.

If we fail, we undo our modification so that we don't break WEBrick's Fancy Indexing.

Are you saying that it isn't worth this amount of complexity? Also, are you happy with the modification to your /file.* > /file/index.html > /file.html comment?

envygeeks · 2017-07-20T02:50:00Z

No, I'm not happy with the comment change, because it's arbitrary. I'm happy you learned Ruby and welcome to the Ruby community! This can be fixed by changing search file to super(req, res, ".html") || super || super(req, res, "#{basename}.html") without the extra 9 lines of code.

Sorry it took me so long to respond.

Fix jekyll#6222, related to jekyll#3452 See base class search_file implementation: https://github.com/ruby/webrick/blob/3fb6a011/lib/webrick/httpservlet/filehandler.rb#L365-L385

MattSturgeon · 2017-07-20T03:29:26Z

Ah, much simpler. Probably should be ordered super || super(req, res, ".html") || super(req, res, "#{basename}.html") to prioritise the default.

As soon as I noticed search_index_file was used when searching a directory, I focused on that and didn't notice that you could just use ".html" in search_file and it would concatenate into a sane filename.

I've added your changes as 1dc2bc7.

envygeeks · 2017-07-20T04:15:19Z

No, it should be in the order I put it in, as that was my intention, and it was my intention with this source to begin with. You should also rebase, squash, and only have commits in your name, it is bad form to commit in another's name, it should instead be credited in the extended message (if you wish,) as committing in my name forces me to have to say "that commit is not mine, it's not signed" in the future, should a problem arise, and it also alters the blame in a confusing way on Github. Sorry to be weird.

Thanks!

envygeeks · 2017-07-20T04:24:47Z

I guess what I am saying is, I do wish to take blame if something goes wrong (and I'll happily help fix it,) but the commit should not be in my name, you should instead opt to link to my comment in the extended commit, so that people know that the source was originally my idea, and it's my fault if something is wrong.

MattSturgeon · 2017-07-20T05:30:24Z

I didn't commit in your name, I listed you as the author of the changes. That's what --author is for, committing other's changes. It still list's me as the committer, so I still get the blame for any issues with the commit.

commit 1dc2bc74580060d4f00dd4a45e02a70574a56672
tree 4207e3ea2b7b52733fb6d3e3a85d5cc00d6b6522
parent 56546a28fda95098c44b31090050db0f415be574
author Jordon Bedwell <jordon@********> 1500420691 +0100
committer Matt Sturgeon <matt@********> 1500521221 +0100

    Fix serving files that clash with directories

Why did you want it to check ".html" before the default check of basename? Surely basename -> ".html" -> "basename.html" or basename -> "basename.html" -> ".html" makes more sense?

Either way since it is your change now, it is probably easiest if you make a separate PR.

parkr · 2017-07-21T19:00:37Z

Thank you, all! I'm guessing the comment directly above this line is fine?

parkr · 2017-07-21T19:34:53Z

Additionally, a test would be nice.

pathawks · 2017-07-21T19:39:04Z

Additionally, a test would be nice.

Suggestions would be appreciated. See above comment:

I'm probably missing something, but I'm not sure there's any testable changes here. Short of running the server, and making a mini client within the test suit to assert that HTTP requests return expected results.
...
What kind of things should I be testing for this change?

envygeeks · 2017-07-21T19:41:09Z

This source was never really tested to begin with, I don't know if I would force that constraint on the new author, I'll take responsibility outside of this pull request if necessary, it shouldn't really fall on him.

parkr · 2017-07-25T22:22:40Z

Ok! I'd love to get that module tested so we don't break this behaviour in the future, but it's not a super high priority. Thanks, all!

@jekyllbot: merge +minor

envygeeks · 2017-07-26T20:19:33Z

This commit is not what my code originally was, what it intended, and it still has my name stamped all over it as the author even though the person who used my name falsely refused to fix the problem, it is merged now and my name is associated with something I didn't write, please fix this problem.

MattSturgeon · 2017-07-26T20:45:46Z

@envygeeks your name is not on the merged commit (ec84bec)

pathawks requested a review from a team July 18, 2017 23:56

jekyllbot assigned ghost Jul 18, 2017

MattSturgeon force-pushed the fix-6222 branch from 16897d3 to 811ed0f Compare July 19, 2017 00:06

ashmaroli requested changes Jul 19, 2017

View reviewed changes

ashmaroli reviewed Jul 19, 2017

View reviewed changes

MattSturgeon force-pushed the fix-6222 branch from bb94deb to 737461c Compare July 19, 2017 06:39

ghost approved these changes Jul 19, 2017

View reviewed changes

ashmaroli requested changes Jul 19, 2017

View reviewed changes

MattSturgeon force-pushed the fix-6222 branch from 6107d9f to 6dfc2cc Compare July 20, 2017 03:26

Fix serving files that clash with directories

1dc2bc7

Fix jekyll#6222, related to jekyll#3452 See base class search_file implementation: https://github.com/ruby/webrick/blob/3fb6a011/lib/webrick/httpservlet/filehandler.rb#L365-L385

MattSturgeon force-pushed the fix-6222 branch from 6dfc2cc to 1dc2bc7 Compare July 20, 2017 03:28

parkr approved these changes Jul 21, 2017

View reviewed changes

jekyllbot merged commit ec84bec into jekyll:master Jul 25, 2017

jekyllbot added the enhancement label Jul 25, 2017

jekyllbot added a commit that referenced this pull request Jul 25, 2017

Update history to reflect merge of #6231 [ci skip]

897cdbb

MattSturgeon deleted the fix-6222 branch July 26, 2017 20:40

parkr mentioned this pull request Jul 29, 2017

Fix 1dc2bc74580060d4f00dd4a45e02a70574a56672 #6246

Closed

This was referenced Oct 21, 2017

jekyll s redirects to slightly wrong URL (with file and folder of same name) #6459

Closed

If bar.html exists, then jekyll s will show bar.html for bar/.* unless something else explicitly (but not implicitly) overrides it #6475

Closed

ashmaroli mentioned this pull request Nov 20, 2017

return correct file in dir if dir has same name as file #6569

Merged

jekyll locked and limited conversation to collaborators Jan 24, 2020

jekyllbot added the frozen-due-to-age label Jan 24, 2020

Fix serving files that clash with directories (#6222) #6231

Fix serving files that clash with directories (#6222) #6231

Conversation

MattSturgeon commented Jul 18, 2017 • edited

Fix #6222.

pathawks commented Jul 18, 2017

pathawks commented Jul 18, 2017

MattSturgeon commented Jul 19, 2017

ashmaroli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MattSturgeon Jul 19, 2017 • edited

Choose a reason for hiding this comment

MattSturgeon Jul 19, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MattSturgeon Jul 19, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MattSturgeon commented Jul 19, 2017

ashmaroli commented Jul 19, 2017

ghost left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MattSturgeon Jul 19, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MattSturgeon commented Jul 19, 2017

ashmaroli commented Jul 19, 2017 • edited

MattSturgeon commented Jul 19, 2017

ashmaroli commented Jul 19, 2017

MattSturgeon commented Jul 19, 2017

MattSturgeon commented Jul 19, 2017

ashmaroli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ashmaroli commented Jul 19, 2017

MattSturgeon commented Jul 19, 2017

envygeeks commented Jul 20, 2017

MattSturgeon commented Jul 20, 2017

envygeeks commented Jul 20, 2017

MattSturgeon commented Jul 20, 2017 • edited

envygeeks commented Jul 20, 2017

envygeeks commented Jul 20, 2017

MattSturgeon commented Jul 20, 2017 • edited

parkr commented Jul 21, 2017

parkr commented Jul 21, 2017

pathawks commented Jul 21, 2017

envygeeks commented Jul 21, 2017

parkr commented Jul 25, 2017

envygeeks commented Jul 26, 2017

MattSturgeon commented Jul 26, 2017

MattSturgeon commented Jul 18, 2017 •

edited

MattSturgeon Jul 19, 2017 •

edited

MattSturgeon Jul 19, 2017 •

edited

MattSturgeon Jul 19, 2017 •

edited

MattSturgeon Jul 19, 2017 •

edited

ashmaroli commented Jul 19, 2017 •

edited

MattSturgeon commented Jul 20, 2017 •

edited

MattSturgeon commented Jul 20, 2017 •

edited