Build time exploding with many pages #8066

Closed
klieret opened this issue Mar 21, 2020 · 16 comments

Comments

@klieret

klieret commented Mar 21, 2020

My Environment

Software          Version(s)
Operating System  Ubuntu 19.10
jekyll            4.0.0
github-pages      n/a

Expected Behaviour

The build time with n identical pages should scale linearly (or in other words, the build time of a single page should stay constant).

Current Behavior

The build time per page (!) grows to more than a second per page (!):

image

Code Sample

Speed tested as follows:

jekyll new speed_test_jekyll  
cd speed_test_jekyll
mkdir pages
cd pages
# create 150 pages (simply copy the example post 150 times over)
for i in {1..150}; do /bin/cp ../_posts/*-welcome-to-jekyll.markdown $i.md; done
cd ..
bundle exec jekyll build
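
The shell loop above fixes the page count at 150. To repeat the measurement for several counts, the same setup could be sketched in Python. This is a minimal sketch and not part of the original report: the helper name `make_pages` is hypothetical, and it assumes the site was created with `jekyll new`, so that `_posts` contains a `*-welcome-to-jekyll.markdown` file.

```python
import shutil
from pathlib import Path
from typing import List


def make_pages(site_dir: str, n_pages: int) -> List[Path]:
    """Copy the example post n_pages times into <site_dir>/pages as 1.md .. n.md.

    Hypothetical helper mirroring the shell loop; assumes `jekyll new` left a
    *-welcome-to-jekyll.markdown file under _posts.
    """
    site = Path(site_dir)
    posts = sorted((site / "_posts").glob("*-welcome-to-jekyll.markdown"))
    if not posts:
        raise FileNotFoundError("no welcome post found under _posts")
    pages_dir = site / "pages"
    # Start from an empty directory so a smaller n does not keep stale pages.
    if pages_dir.exists():
        shutil.rmtree(pages_dir)
    pages_dir.mkdir()
    created = []
    for i in range(1, n_pages + 1):
        target = pages_dir / f"{i}.md"
        shutil.copy(posts[0], target)
        created.append(target)
    return created
```

Calling `make_pages("speed_test_jekyll", 150)` should leave the site in the same state as the shell commands, ready for `bundle exec jekyll build`.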

Further investigations

  • Using build --verbose one can see that the time is spent per page, rather than e.g. at the beginning or end
  • Removing inclusions from the templates doesn't change anything

@ashmaroli
Member

@klieret Thank you for opening an interesting ticket.
From the test steps, it looks like you're using the minima theme..? If so can you run the tests again and generate the graph by removing jekyll-seo-tag from the mix..?

Follow these steps to do so:

  • Create a new directory at source labeled _includes.
  • Copy _includes/head.html from the minima gem on your system to the source directory.
  • Remove {% seo %} from <source_dir>/_includes/head.html
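
The three steps above could be scripted roughly as follows. This is a sketch under assumptions: `theme_dir` is the directory of the installed minima gem (something like `bundle info minima --path` should print it, but check on your system), and the helper names `strip_seo_tag` and `override_head` are made up for illustration.

```python
import re
from pathlib import Path

# Matches {% seo %}, including optional arguments and whitespace-control dashes.
SEO_TAG = re.compile(r"\{%-?\s*seo\b[^%]*%\}")


def strip_seo_tag(html: str) -> str:
    """Return html with every {% seo %} tag removed."""
    return SEO_TAG.sub("", html)


def override_head(theme_dir: str, source_dir: str) -> Path:
    """Copy the theme's _includes/head.html into the site and drop {% seo %}."""
    theme_head = Path(theme_dir) / "_includes" / "head.html"
    target = Path(source_dir) / "_includes" / "head.html"
    # Creates <source_dir>/_includes if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    cleaned = strip_seo_tag(theme_head.read_text(encoding="utf-8"))
    target.write_text(cleaned, encoding="utf-8")
    return target
```

Because Jekyll prefers files in the site's own `_includes` over those shipped with the theme, the copied file takes effect on the next build.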

@klieret
Author

klieret commented Mar 21, 2020

Thank you for your incredibly fast reply!

Stripping down head.html didn't change anything

I think the culprit is the navigation in header.html. Removing everything there makes everything run instantaneously again.

@ashmaroli
Member

"Stripping down head.html didn't change anything"

Thank you for testing.
This is interesting. Hope you cleared the .jekyll-cache directory between runs.

"I think the culprit is the navigation in header.html."

Probably. There is a Liquid {% for ... %} loop and a couple of Liquid map filters piped to site.pages..
But I'm not able to wrap my head around how increasing the number of documents inside the _posts directory would affect site.pages...
Documents under _posts are inside site.posts while site.pages remains constant (in your test site) with just index.markdown, about.markdown and the CSS artefacts..

@ashmaroli
Member

ashmaroli commented Mar 21, 2020

Ah..! What your test script does is copy the 2020-03-21-welcome-to-jekyll.markdown from the _posts directory over to the pages directory 150 times..
That explains everything..
To make matters worse, there's also a Liquid where filter piped to site.pages.....

If anything, this ticket can serve as a call to optimize the include in the minima theme..
Thanks.
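
Why this slows the build can be seen with a rough count. The header include is rendered once for every page, and each rendering walks all of site.pages again, so the total work grows with the square of the page count. A toy model follows; it is not Jekyll code and only counts item visits. The number of passes per rendering (`passes_per_render`) is an assumed constant standing in for the `for` loop plus the `map` and `where` filters.

```python
def nav_iterations(n_pages: int, passes_per_render: int = 4) -> int:
    """Count item visits when every page renders a nav that scans all pages."""
    visits = 0
    for _rendered_page in range(n_pages):       # header is rendered per page
        for _pass in range(passes_per_render):  # for loop + map/where filters
            visits += n_pages                   # each pass touches every page
    return visits


for n in (10, 50, 100, 150, 200):
    total = nav_iterations(n)
    print(f"{n:>4} pages: {total:>7} visits, {total // n:>4} per page")
```

In this model, doubling the page count quadruples the visits, so the cost per page grows linearly instead of staying constant.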

@klieret
Author

klieret commented Mar 21, 2020

Ah, perhaps I should have stressed that: as in the code snippet above, I was considering a case with many pages (the markdown files are created in the pages directory), so indeed the number of links will increase (if not excluded).

@klieret
Author

klieret commented Mar 21, 2020

Exactly!

@klieret
Author

klieret commented Mar 21, 2020

I'll open a report in the minima theme then!

Thanks a lot for helping me figure this out!

@ashmaroli
Member

@klieret If you'd like to experiment, we've a pull request here on jekyll/jekyll that'll speed things up on your test-site out-of-the-box: #7992
Once you generate the test-site, edit your Gemfile to point to the pull-request branch:

gem 'jekyll', github: 'jekyll/jekyll', ref: 'refs/pull/7992/head'

Now running bundle exec jekyll build should be much faster. It is slated for release as part of Jekyll 4.1..
(You can use the master branch similarly to see the difference between v4.0 and the future.. 😉)

@klieret
Author

klieret commented Mar 22, 2020

Indeed, wow! This makes a huge difference, reducing the time from 75s to just 6.5s (150 pages)! Thanks for pointing this out, can't wait till the 4.1 release :)

@ashmaroli
Member

@klieret 75s for 150 pages is a lot. Even on Windows, the build times were just 30s for 150 pages (with the released Jekyll 4.0 gem)

I'm curious. Will you be able to plot a graph for the test-site builds with the above PR branch..? Thanks.

(You can easily avoid the disk-cache when using that branch by passing --disable-disk-cache to the build command.)

@klieret
Author

klieret commented Mar 22, 2020

Hmm, perhaps my system was busy yesterday, I can't quite reproduce the high numbers.

But I just ran the normal v4.0.0 version and the #7992 branch a couple of times and got the following:

image

So the scaling behaviour stays the same (probably due to the cost of the where filter etc., I assume).

@ashmaroli
Member

What's 1.8*new supposed to be..?

@klieret
Author

klieret commented Mar 22, 2020

Oh I just noticed that the curves are identical but for a factor of ~1.8, so 1.8*new is the new values multiplied by 1.8 (can be ignored)
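
That factor can be checked against the timings klieret gives in the plotting script further down. The sketch below uses only those numbers. The scaling exponent is an added estimate of how the build time grows with the number of files, not something either participant computed, and `scaling_exponent` is a name made up here.

```python
import math

n_files = [10, 50, 100, 150, 200]
old = [0.323, 1.5, 4.416, 10.92, 19.184]    # v4.0.0, seconds
new = [0.452, 0.704, 2.098, 5.442, 11.483]  # PR #7992 branch, seconds

# Speed-up per measurement: old build time divided by new build time.
ratios = [o / p for o, p in zip(old, new)]


def scaling_exponent(n1, t1, n2, t2):
    """Exponent k in t ~ n**k between two runs (1 = linear, 2 = quadratic)."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Compare the 100-file and 200-file runs, where fixed start-up cost matters least.
k_old = scaling_exponent(n_files[2], old[2], n_files[4], old[4])
k_new = scaling_exponent(n_files[2], new[2], n_files[4], new[4])

print("old/new:", [round(r, 2) for r in ratios])
print("exponent v4.0.0:", round(k_old, 2))
print("exponent new:", round(k_new, 2))
```

With these numbers the ratio sits between roughly 1.7 and 2.1 from 50 files upwards, and both exponents come out above 2, which is closer to quadratic than to the linear scaling the report expected.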

@ashmaroli
Member

Ah! Lastly, may I know how you generate the graphs..?
Thanks for all the input either way.

@klieret
Author

klieret commented Mar 22, 2020

More boring than you'd think (unfortunately), I just jotted down the numbers:

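import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
# Number of copied files per run; build times below were noted by hand
n_files = [10, 50, 100, 150, 200]
# Released Jekyll v4.0.0, seconds per build
ax.plot(
    n_files,
    [0.323, 1.5, 4.416, 10.92, 19.184],
    color="black",
    label="v4.0.0"
)
# Branch of pull request #7992, seconds per build
ax.plot(
    n_files,
    [0.452, 0.704, 2.098, 5.442, 11.483],
    color="red",
    label="new"
)
# Same "new" timings scaled by 1.8, to compare the shape of the two curves
ax.plot(
    n_files,
    1.8*np.array([0.452, 0.704, 2.098, 5.442, 11.483]),
    color="yellow",
    label="1.8*new"
)
ax.set_xlabel("Number of posts")
ax.set_ylabel("Seconds to build")
ax.legend()
# Display the figure when run as a script
plt.show()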

But it would probably be easy to write a short shell script that automates collecting the numbers into a CSV file and then load that, I assume.
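
One way this could look, written in Python rather than shell to match the plotting snippet. It is only a sketch: the function names `time_command`, `write_timings` and `read_timings` and the CSV column names are assumptions, and nothing like it was posted in the thread.

```python
import csv
import subprocess
import time
from typing import Iterable, List, Sequence, Tuple


def time_command(cmd: Sequence[str], cwd: str = ".") -> float:
    """Run cmd once and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def write_timings(rows: Iterable[Tuple[int, float]], csv_path: str) -> None:
    """Write (n_files, seconds) rows to csv_path with a header line."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["n_files", "seconds"])
        writer.writerows(rows)


def read_timings(csv_path: str) -> Tuple[List[int], List[float]]:
    """Load the CSV back into two lists that can go straight into ax.plot."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return [int(r["n_files"]) for r in rows], [float(r["seconds"]) for r in rows]
```

For each page count you would regenerate the pages directory, clear .jekyll-cache, call something like `time_command(["bundle", "exec", "jekyll", "build"], cwd="speed_test_jekyll")` and collect the `(n, seconds)` pairs before passing them to `write_timings`.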

@ashmaroli
Member

😃 When it comes to crunching data, Python's got you covered.. hehe..
Once again, thank you for opening this ticket and sharing your insights.

@jekyll jekyll locked and limited conversation to collaborators Mar 22, 2021