Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Glob scope path only if configured with a pattern #6692

Merged
merged 6 commits into from Jan 19, 2018

Conversation

ashmaroli
Copy link
Member

@ashmaroli ashmaroli commented Jan 11, 2018

Resolves #6691

Take the less expensive route unless the user configures the scope["path"] with a glob pattern. e.g. include

  • "content/*.pdf"
  • "content/**/*"

@ashmaroli ashmaroli requested a review from a team January 11, 2018 06:52
@ashmaroli ashmaroli added the fix label Jan 11, 2018
@ashmaroli ashmaroli added this to the v3.7.1 milestone Jan 11, 2018
Copy link
Member

@Crunch09 Crunch09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍👏 LGTM! Might be worth adding a test to check that Dir.glob is no longer called for pattern-less paths?
Also can you confirm that this speeds up https://github.com/mmistakes/front-matter-defaults.git again?

@DirtyF
Copy link
Member

DirtyF commented Jan 11, 2018

On latest macOS with my macbook air and @mmistakes test repo:

3.6.2 3.7.0 Fix
16.279s 25.321s 15.913s
18.276s 30.145s 15.979s
17.406s 29.49s 16.09s

@ashmaroli
Copy link
Member Author

Might be worth adding a test to check that

@Crunch09 That's a good idea to be extra sure.. Will see if I can come up with a simple check..

On latest macOS with my macbook air

@DirtyF Thank you very much for running the tests 💯

@mmistakes
Copy link

Tested the fix on the front-matter-defaults repo and my large personal site... both are back to building at 3.6.2 speeds.

😄

@@ -123,6 +128,13 @@ def path_is_subpath?(path, parent_path)
false
end

def glob_pattern?(path)
path.each_filename do |str|
return true if str =~ %r!\*+|\?|\[.*\]|{.*}!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does path.include?(“*”) not work for this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to test the path component string for specific glob patterns. and regex seems better suited for it than multiple String#include? calls

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashmaroli Why do we need to test each filename separately? AFAIK, the presence of * in scope["path"] is enough to know it's a glob pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parkr

defaults:
  - scope:
      path: "_puppies/arrears.?"
      type: "puppies"
    values:
      layout: arrears

is also a glob pattern to catch all kinds of files with basename arrears, if you ask me..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen or used ? before. Is that a Windows-ism?

Let's decide which glob patterns we should support and write individual tests using them so we know they work and are supported. The original PR only mentioned * which is why I mention that one only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, for your example, I would have used "_puppets/arrears.*".

scope_path = Pathname.new(scope_path).relative_path_from site_path
return true if path_is_subpath?(sanitized_path, scope_path)

if glob_pattern?(rel_scope_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead of this, we do if scope["path"].to_s.include?("*"). Or you can change the implementation of glob_pattern? to just do path.to_s.include?("*"). What do you think?

Copy link
Member

@parkr parkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple unit test for glob_pattern? would be great if you don't mind. Using my suggestion:

should "match globs" do
  assert @site.frontmatter_defaults.glob_include?("content/*.pdf")
end

should "not match non-globs" do
  refute @site.frontmatter_defaults.glob_include?("layouts/")
end

@ashmaroli
Copy link
Member Author

if you don't mind

@parkr I don't mind adding tests at all.. but just that I find it difficult to digest your suggestion above..
The absence of * in the scope["path"] doesn't mean it's not a glob pattern..

/cc @pathawks for your inputs.. (Thanks in advance..)

Regarding the unit test,
I won't be able to assert @site.frontmatter_defaults.glob_include?("content/*.pdf") because glob_include? is a private method.. ( yeah, I can use :send.. )

@DirtyF
Copy link
Member

DirtyF commented Jan 14, 2018

Those use case seems already fine for most users, we never spoke of supporting all possible regexp patterns in path, did we?

"content/*.pdf"
"content/**/*"

Let's try to keep it simple here for now, and ship a quick fix.

@ashmaroli ashmaroli mentioned this pull request Jan 15, 2018
7 tasks
scope_path = Pathname.new(scope_path).relative_path_from site_path
Jekyll.logger.debug "Globbed Scope Path:", scope_path
return true if path_is_subpath?(sanitized_path, scope_path)
end
end

path_is_subpath?(sanitized_path, rel_scope_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this an else for the previous block (if scope["path"].to_s.include?("*"))?
Multiple exits can get confusing.

Copy link
Member Author

@ashmaroli ashmaroli Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't make this an else branch because if diff-ln#113 evaluates to false even at the end of looping the globbed array, diff-ln#117 acts as a fallback..

That said, it does not make sense to me why that fallback's needed..

Copy link
Member

@parkr parkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

<div class="note warning">
<h5>Globbing and Performance</h5>
<p>
Please note that globbing a path is a known performance-poison and is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The poison sounds really terrifying! Can we rephrase?

Please note that front matter defaults with glob paths have known performance problems.
Windows users may notice especially high build times by adding globbing. We are tracking [in this issue](let's open an issue).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are tracking [in this issue](let's open an issue).

I believe this is an issue with Ruby for Windows itself as Ruby is not optimized for Windows.. There's not much one can do here IMO..

So.. I'll just re-phrase the "poison" bit to something more mellow..

Copy link
Member

@pathawks pathawks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏼

@DirtyF
Copy link
Member

DirtyF commented Jan 19, 2018

@jekyllbot: merge +bug

@jekyllbot jekyllbot merged commit 009d308 into jekyll:master Jan 19, 2018
jekyllbot added a commit that referenced this pull request Jan 19, 2018
@ashmaroli
Copy link
Member Author

@mmistakes If possible can you please provide the following feedback on this change using your front-matter-defaults repo.. and Jekyll's master branch..?

  1. Build times on Windows and Unix with jekyll build
  2. Build times on Windows and Unix with jekyll build --watch followed by a minor change to about.md
  3. Build times on Windows and Unix with jekyll serve followed by a minor change to about.md

Thanks in advance..

@mmistakes
Copy link

mmistakes commented Jan 23, 2018

Sure thing @ashmaroli, here's what I'm getting on Windows.

3.6.2 3.7.0 jekyll/jekyll master
1. 20.098s 48.487s 20.729s
2. 20.721s 49.879s 21.192s
3. 20.977s 50.313s 21.518s

@ashmaroli
Copy link
Member Author

here's what I'm getting

Right.. and the regeneration times..?

@mmistakes
Copy link

Those are the regeneration times. It wasn't any faster regenerating the page after about.md was modified. Still took ~20s... or in the case of 3.7.0 around 50.

@ashmaroli
Copy link
Member Author

Okay buddy.. Thanks a lot..
On my system (Windows 7, Ruby 2.3, jekyll:master and minima:master)
the build times were:

  • initial-build: 45.412 seconds
  • regeneratn.: 254.069825 seconds

Something's wrong somewhere!!

@mmistakes
Copy link

mmistakes commented Jan 23, 2018

Forgot to mention, I added the wdm gem to my Gemfile as instructed by this warning:

gem 'wdm', '>= 0.1.0' if Gem.win_platform?

Without it the regen speeds were much slower for some reason.

Without wdm:

  • initial-build: 20.914 seconds
  • regeneration: 37.999 seconds

With wdm:

  • initial-build: 20.624 seconds
  • regeneration: 21.493 seconds

@ashmaroli
Copy link
Member Author

MIND=BLOWN !!
Two years of Jekyll on Windows and I finally realize the truth behind the suggestion to include gem 'wdm'..

Without wdm:

  • initial-build: 47.503 seconds
  • regeneratn.: 246.402432 seconds

With wdm:

  • initial-build: 45.334 seconds
  • regeneratn.: 48.469285 seconds

💯 👏 @mmistakes

@mmistakes
Copy link

@ashmaroli 😃

I've never really known what adding wdm did. I feel like at one point --watch stopped working for me reliably so I added it, but didn't know it helped performance until now.

@ashmaroli ashmaroli deleted the glob-opt-in branch January 24, 2018 14:38
@jekyll jekyll locked and limited conversation to collaborators Jul 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Front Matter defaults performance regression
7 participants