New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix serving files that clash with directories (#6222) #6231
Conversation
/cc: @jekyll/stability |
Thank you @MattSturgeon for learning Ruby just to fix this! 🍻 |
No problem, feel free to recommend any changes if required. Just noticed the first mistake, I linked to the wrong super method in the commit message! Should be this one instead, so I'll amend the commit and we can wait for the CI builds to start from scratch! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, some minor changes:
==
and===
check for equality=
assigns a value to a variable
if File.extname == ".foo"
puts "Foo Ahoy!"
baz = "foobar"
end
lib/jekyll/commands/serve/servlet.rb
Outdated
# First, let's see if the default implementation can figure it out. | ||
# (Check for index files in the res.filename directory) | ||
if file = super | ||
return file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arguments for if
, unless
, while
etc check for equality via (==
or ===
) and the body assigns
a value to a variable via =
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't checking for equality. I was checking if the assigned value was truthy. if file = super
evaluates to nul
we continue, otherwise we return the value assigned to file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate you mentioning it though, since I did say I was new to ruby. (Not as new to programming though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.. that wasn't readily apparent to me from reading the code.
Learnt something new today then.. 🍻
lib/jekyll/commands/serve/servlet.rb
Outdated
path_arr = res.filename.scan(%r!/[^/]*!) | ||
while basename = path_arr.pop | ||
break unless basename == "/" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this while..break unless --
can be replaced by an alternative..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can probably just be basename = path_arr.pop
, but to cover edge cases I wanted to loop until basename wasn't empty (well, "/") since there may be (a?) trailing slash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, can you list 3-5 such edge-cases of path
and basename
? Will help you write tests later..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The normal code flow would be
basename = arr.pop
if basename == "/" then basename = arr.pop # I can't imagine this happening
# more than once, but imho it
# makes more sense to allow for
# the possibility.
done
To me, while ... break unless ...
is actually quite an elegant solution to popping off an unknown number of elements and assigning the final pop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, didn't see your comment before replying. 3-5? Not sure. Edge case may not be the best choice of words.
Basically our code doesn't know if res.filename
will have a trailing /
or not. We could go with whether it does or doesn't right now, but it may change in the future or (without analysing the WEBrick code) be inconsistent.
Essentially I'm treating it as unsanitised input.
Even if there are no trailing slashes (in the current WEBrick implementation), having the loop immediately break is effectively the same as not having the loop, in terms of performance and complexity. It just gives us the robustness against the possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you tried until
?
basename == arr.pop until basename == "/"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hostname == arr.pop until hostname == "/"
doesn't assign the value of arr.pop
to hostname
. Also hostname
will be undefined when evaluating the until
condition.
Looking at the set_filename
implementation, (which sets up the res.filename
and calls search_index_file
), res.filename
is fairly sane and unlikely to change.
It looks like the filename shouldn't have any trailing slashes (so basename = arr.pop
without testing for trailing slashes may work) when we get in search_index_file
, but I haven't tested.
Still, the while
method isn't complex or slow and it makes our method more robust and resilient to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could use index instead of arrays, something like this:
# Index of final / that isn't followed by another / or EOL
# Could just be rindex('/') if we don't care about trailing slashes
i = res.filename.rindex %r!/(?!/|$)!
basename = res.filename[i..-1]
res.filename = res.filename[0, i]
I've tried to make the code more readable, following @ashmaroli's feedback. In 20a333f I replaced a conditional return with an It's a fixup commit, so intended to be rebased/autosquashed into the original commit, just thought I'd keep it separate for now in case anyone wanted to compare it with the original PR ( |
FYI, Currently, Jekyll's merge bot will squash all commits irrespective of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice! +1 for the extra documentation
lib/jekyll/commands/serve/servlet.rb
Outdated
res.filename << basename unless file | ||
end | ||
|
||
return file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you technically don't need the return
statement, since ruby returns the last statement in a function anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the last statement is res.filename << basename
, unless file
is evaluated first, right? Either way I think it is more readable in this case to explicitly return file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree!
Ok, good to know. I think this is probably ready now, I've pushed up a slightly re-worded commit with all changes including switching to index based splitting: i = res.filename.rindex "\/"
basename = res.filename[i..-1]
res.filename = res.filename[0, i] And simply appending From what I could tell, trailing slashes are unlikely to ever be an issue with WEBrick so I shifted to simply splitting on the index of the final |
What we need now are tests.. 😈 |
Where are the existing tests for the |
How did I miss that? I did look, I promise! |
I'm probably missing something, but I'm not sure there's any testable changes here. Short of running the server, and making a mini client within the test suit to assert that HTTP requests return expected results. The tests for command serve seem to be mostly sanity checking configuration options to make sure config files and command line switches have taken affect. I'd need to run the command asynchronously, then fire off a HTTP request to a few URLs that I know are either directories, files or both and assert that the result matches the local file I wanted. Is this within scope of the tests? It sounds more complex than the entire feature to me. It doesn't seem to be something that was added for #3452 What kind of things should I be testing for this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs some more edits..
lib/jekyll/commands/serve/servlet.rb
Outdated
@@ -21,10 +21,36 @@ def initialize(server, root, callbacks) | |||
# up with a different preference on which comes first. | |||
|
|||
def search_file(req, res, basename) | |||
# /file.* > /file/index.html > /file.html | |||
# /file.* -> /file.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please undo this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why? It was an intended change, since search_file
doesn't change "/file" into "/file/index.html" (that's search_index_file
's job)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the comment meant changing /file.*
into /file.index.html
and then into /file.html
either.
Its probably documenting the order of traversal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, in that case I'll clarify and move it above search_file
since it applies to both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How's this?
# Order of precedence:
# /file -> /file/index.html -> /file.html -> fancy-indexing -> 404.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer this to the original author for his input.
/cc @envygeeks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I have this:
# Add the ability to tap file.html the same way that Nginx does on our
# Docker images (or on GitHub Pages.) The difference is that we might end
# up with a different preference on which comes first.
#
# Order of precedence:
# /file -> /file/index.html -> /file.html -> fancy-indexing -> 404.html
def search_file
# First see if we can find /file normally, if not try /file.html
end
def search_index_file
# First, let's see if our super method can find an index file.
unless
# Ok, looks like there's no index file in the directory.
# Let's look for directory.html instead...
# Try and find a file named dirname.html in the parent directory.
end
end
actual code removed for brevity
/cc @envygeeks
lib/jekyll/commands/serve/servlet.rb
Outdated
def search_index_file(req, res) | ||
# /file/index.html -> /file.html | ||
|
||
# First, let's see if the our super method can figure it out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small typo => ..if the our..
lib/jekyll/commands/serve/servlet.rb
Outdated
res.filename = res.filename[0, i] | ||
|
||
# Try and find a file named dirname.html in the parent directory. | ||
file = search_file(req, res, basename + ".html") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the really rare chance that basename
is not a String
, this will fail. Instead interpolate:
- file = search_file(req, res, basename + ".html")
+ file = search_file(req, res, "#{basename}.html")
Mainly, if the original issue you reported has been fixed and stays fixed.. |
IMO that's not practical to test. Unless you have some test suite that can compare files served over HTTP to files on the filesystem? |
This is a lot of complicated code for such a simple problem. |
Well the problem isn't as simple as it sounds, but I wouldn't say the solution was complex or even large (10 lines).
We want to abuse this slightly and return a file from the parent directory of the one we have, so we have to modify If we fail, we undo our modification so that we don't break WEBrick's Fancy Indexing. Are you saying that it isn't worth this amount of complexity? Also, are you happy with the modification to your |
No, I'm not happy with the comment change, because it's arbitrary. I'm happy you learned Ruby and welcome to the Ruby community! This can be fixed by changing search file to Sorry it took me so long to respond. |
Fix jekyll#6222, related to jekyll#3452 See base class search_file implementation: https://github.com/ruby/webrick/blob/3fb6a011/lib/webrick/httpservlet/filehandler.rb#L365-L385
Ah, much simpler. Probably should be ordered As soon as I noticed I've added your changes as 1dc2bc7. |
No, it should be in the order I put it in, as that was my intention, and it was my intention with this source to begin with. You should also rebase, squash, and only have commits in your name, it is bad form to commit in another's name, it should instead be credited in the extended message (if you wish,) as committing in my name forces me to have to say "that commit is not mine, it's not signed" in the future, should a problem arise, and it also alters the blame in a confusing way on Github. Sorry to be weird. Thanks! |
I guess what I am saying is, I do wish to take blame if something goes wrong (and I'll happily help fix it,) but the commit should not be in my name, you should instead opt to link to my comment in the extended commit, so that people know that the source was originally my idea, and it's my fault if something is wrong. |
I didn't commit in your name, I listed you as the author of the changes. That's what
Why did you want it to check ".html" before the default check of Either way since it is your change now, it is probably easiest if you make a separate PR. |
Thank you, all! I'm guessing the comment directly above this line is fine? |
Additionally, a test would be nice. |
Suggestions would be appreciated. See above comment:
|
This source was never really tested to begin with, I don't know if I would force that constraint on the new author, I'll take responsibility outside of this pull request if necessary, it shouldn't really fall on him. |
Ok! I'd love to get that module tested so we don't break this behaviour in the future, but it's not a super high priority. Thanks, all! @jekyllbot: merge +minor |
This commit is not what my code originally was, what it intended, and it still has my name stamped all over it as the author even though the person who used my name falsely refused to fix the problem, it is merged now and my name is associated with something I didn't write, please fix this problem. |
@envygeeks your name is not on the merged commit (ec84bec) |
Fix #6222.
To avoid having to re-implement the entire set_filename method, I instead extended
search_index_file
and temporarily mutateres.filename
. Kinda hacky, so I tried to leave plenty of inline comments. (Also first time using ruby, so beware!)For reference, here's the base class
search_index_file
implementation.Feel free to checkout the commit message too.