Vuepress-next scalability #2689

Closed
favoyang opened this issue Oct 31, 2020 · 26 comments
Labels
type: question or discussion Question or discussion

Comments

@favoyang
Contributor

Hi there,

I'd like to know if there's a roadmap for the next major update, VuePress 2. The 1.0 release cycle, led by @ulivz, was awesome. Now the project seems to be maintained by @billyyyyy3320 and @bencodezen, with minimal fixes, but it's not clear who is actually leading the next major update.

Personally, I would like to see improvements that let VuePress scale to bigger projects, the ones with many thousands of pages.

Of course, people from different backgrounds may have different priorities, such as Vue.js 3 support.

@billyyyyy3320
Collaborator

Hi @favoyang

We may still need some discussion before we make it public. FYI, @meteorlxy is working on https://github.com/vuepress/vuepress-next

@favoyang
Contributor Author

Thanks for the link. My first impression is that it's going to focus on Vue 3 and TypeScript support.

I assume you've noticed these scalability issues, and perhaps @meteorlxy can see this thread.

This is a broad topic. Would you prefer that I close this issue, or leave it open for other developers to share their wishes for the next major release?

@meteorlxy
Member

meteorlxy commented Nov 1, 2020

My first impression is that it's going to focus on Vue 3 and TypeScript support.

In fact, it's a completely refactored new version.

As for scalability, we may need some real-world large projects for testing. It would be nice if you could provide one @favoyang @Mister-Hope @itsxallwater @xbill82 @JimmyVV

@itsxallwater

Absolutely!

@meteorlxy
Member

@itsxallwater If it's OK to open source your repo, you can simply paste the link here. If not, you can invite me to a private repo

@meteorlxy
Member

meteorlxy commented Nov 1, 2020

BTW, I want to share the progress of vuepress-next:

Core features are about 90% finished. The remaining 10% are optional features that are still to be determined.

Now users can try to use vuepress-next with their own theme.

The remaining two big goals (help wanted @community & core team 😄):

  • Migrate the default theme (10%)
  • Documentation

@itsxallwater

@itsxallwater If it's OK to open source your repo, you can simply paste the link here. If not, you can invite me to a private repo

It's open, by all means! Please don't hesitate to let us know if there's anything we can do to help test.

https://github.com/zumasys/docs

@favoyang
Contributor Author

favoyang commented Nov 2, 2020

It's open, by all means! Please don't hesitate to let us know if there's anything we can do to help test.

https://github.com/zumasys/docs

This is a good candidate. It contains 2000+ static pages.

My open-source project openupm contains 1300+ pages, but most of them are generated via additionalPages, and that API seems to have changed in vuepress-next, so it could take more time to migrate. For benchmarking purposes, I think @itsxallwater's project is cleaner and a better fit. Still, I'd like to share some information.

What troubles me most is the memory usage: as the site grows, VuePress requires more and more memory to build. It now requires 6 GB. The GitHub Actions build bot has a 7 GB limit, so it's close. I use the additionalPages feature, so I also need to verify whether my generator code is GC-unfriendly. I'm also worried that leveraging multiple cores to speed up the build (#1560) may make the issue worse.
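
One way to check whether a page generator is holding on to memory is to log the V8 heap before and after it runs. The sketch below is only an illustration under stated assumptions: `generatePages` is a hypothetical stand-in for a generator that feeds `additionalPages`, not openupm's actual code, and the page shape is made up. `process.memoryUsage()` is the standard Node.js API.

```javascript
// Hypothetical stand-in for a generator that feeds `additionalPages`.
function generatePages(count) {
  const pages = [];
  for (let i = 0; i < count; i++) {
    pages.push({
      path: `/packages/pkg-${i}/`,
      frontmatter: { title: `Package ${i}`, topics: ["gui", "tools"] },
    });
  }
  return pages;
}

// Returns the current V8 heap usage in megabytes.
function heapUsedMB() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

// Runs `fn` and reports how much the heap grew while it ran.
// The figure is approximate: a GC cycle may run in the middle.
function measureHeapGrowth(fn) {
  const before = heapUsedMB();
  const result = fn();
  const after = heapUsedMB();
  return { result, grownMB: after - before };
}

const { result: pages, grownMB } = measureHeapGrowth(() => generatePages(1300));
console.log(`generated ${pages.length} pages, heap grew by ~${grownMB.toFixed(1)} MB`);
```

If the build runs out of heap, Node's `--max-old-space-size` flag raises the limit, but that only postpones the problem as the site keeps growing.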

The second is that the siteData.js bundle is getting bigger. I guess VuePress needs to know all pages (path, title, heads) in advance for the router and search. But does that mean all the frontmatter also needs to be packed into siteData.js? I'm not entirely sure about this, but if you check $site.pages, it contains all the frontmatter info. If you're using the vuepress-plugin-seo plugin, it also contributes some metadata to the frontmatter.

[BABEL] Note: The code generator has deoptimised the styling of /home/favo/projects/openupm/.temp/internal/siteData.js as it exceeds the max of 500KB.

One example page with a fat frontmatter stored in $site.pages.

{
    "title": "Packages - GUI",
    "frontmatter": {
        "layout": "PackageList",
        "showFooter": false,
        "noGlobalSocialShare": true,
        "title": "Packages - GUI",
        "topics": [
        ... <it's really long anyway>
        ]
    },
    "regularPath": "/packages/topics/gui/",
    "key": "v-51883712",
    "path": "/packages/topics/gui/",
    "content": ""
}
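
To get a feel for how much of `siteData.js` is frontmatter, one could compare the serialized size of the page list with and without it. This is a rough sketch, not part of VuePress: `jsonSize` and `withoutFrontmatter` are hypothetical helpers, and the contents of `topics` are invented because the original list is elided.

```javascript
// Sample entry shaped like the `$site.pages` item above
// (the `topics` contents are made up for illustration).
const pages = [
  {
    title: "Packages - GUI",
    frontmatter: {
      layout: "PackageList",
      title: "Packages - GUI",
      topics: Array.from({ length: 200 }, (_, i) => ({ name: `topic-${i}`, count: i })),
    },
    regularPath: "/packages/topics/gui/",
    key: "v-51883712",
    path: "/packages/topics/gui/",
  },
];

// Size of a value once serialized to JSON, in characters
// (close to bytes for mostly-ASCII data).
function jsonSize(value) {
  return JSON.stringify(value).length;
}

// Drops `frontmatter` from every page, keeping the routing fields.
function withoutFrontmatter(list) {
  return list.map(({ frontmatter, ...rest }) => rest);
}

const fullSize = jsonSize(pages);
const slimSize = jsonSize(withoutFrontmatter(pages));
console.log(`full: ${fullSize} chars, without frontmatter: ${slimSize} chars`);
```

Multiplied across a thousand pages, the gap between the two numbers gives a first estimate of how much of the 500KB+ bundle is frontmatter.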

@meteorlxy
Member

meteorlxy commented Nov 2, 2020

What troubles me most is the memory usage: as the site grows, VuePress requires more and more memory to build.

When we are trying to bundle a huge web app, I'm afraid that we have to load all the files into memory.

The second is that the siteData.js bundle is getting bigger. I guess VuePress needs to know all pages (path, title, heads) in advance for the router and search.

Yes, $site.pages is a problem.

It's required for the built-in search box, and may be useful for blog users who want to generate an index page listing all of their posts.

However, for a documentation site that uses the Algolia search box, loading all pages' data is useless.

In the current vuepress-next, pagesData is extracted from siteData. But that's mainly for hot-reload purposes, and we still have to load all pages' data.

VitePress drops the built-in search feature, and injects page data into its own component to avoid this.

@meteorlxy
Member

It might be a good choice to migrate to vitepress for large-scale docs sites. 🤔

Or, we should also drop the built-in search to get rid of the limitation

@xbill82

xbill82 commented Nov 2, 2020

Hey @meteorlxy thanks for your work!

When we are trying to bundle a huge web app, I'm afraid that we have to load all the files into memory.

Can you elaborate on this point, please? IMHO the memory leak surges when generating the static HTML pages, which could be done (I might be naive) by reading the .md files one by one from the file system.

As for scalability, we may need some real-world large projects for testing.

All our docs at @kuzzleio are open-source, but the MD files are scattered across repos. We maintain a repo for the Vuepress code and use our own CLI to gather the MD files and build them against our Vuepress code.

Would you like me to share with you the necessary steps to do it? It's three or four commands.

@meteorlxy
Member

@xbill82

Can you elaborate on this point, please? IMHO the memory leak surges when generating the static HTML pages, which could be done (I might be naive) by reading the .md files one by one from the file system.

Memory leakage is out of scope. We will always use more memory as the project grows.

However, the memory usage of current Vuepress 1.x is abnormal. There might be memory leakage but it's not easy to figure it out.

The whole process (including the SSR of Vue 3) of vuepress-next is different from VuePress 1.x, which will hopefully solve these problems. But that has not been verified, which is why I'm asking you guys to provide your repos 😉

I will start to test vuepress-next on large-scale projects when the skeleton of the default theme is ready.

@xbill82

xbill82 commented Nov 2, 2020

Ok, let's put our hope in the next release.
I'll prepare you a gist with the commands to test it with our (huge) documentation.

@xbill82

xbill82 commented Nov 2, 2020

Here it goes @meteorlxy https://gist.github.com/xbill82/dc81f7d025533014f210ef32b47e9b80
have fun and feel free to ask for help on the Kuzzle Discord, you can mention me @luca.m

@favoyang
Contributor Author

favoyang commented Nov 2, 2020

Thanks for the link to vite and vitepress. Interesting concept, but definitely costly to migrate to an immature framework, not to mention the VuePress plugins I use...


However, the memory usage of current Vuepress 1.x is abnormal. There might be memory leakage but it's not easy to figure it out.

I agree that we should re-analyze the memory footprint and build time with vuepress-next when it's ready. It's a big refactor anyway.

@Mister-Hope made some discoveries in #2656 (comment), which I quote below. They may help to identify potential anti-GC practices. I'm totally okay with loading all page data into memory, but 2048 pages with a 100 KB memory footprint each would only cost 200 MB. VuePress 1 requires much more. There could be something very obvious stopping the GC from doing its job.

Mister-Hope: VuePress handles the build process badly. Judging from the source code, it generates a lot of shallow copies of the frontmatter, the page object (including slug, frontmatter, headings and some other info), and even siteData. The build process uses a newer copy of these objects while still referencing parts of the old ones, so the old ones get moved to "old space" instead of being garbage-collected.
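
The back-of-the-envelope estimate in this comment can be checked with a few lines of arithmetic. The 100 KB per page is the assumption made in the comment, and the 6 GB figure is the build requirement reported earlier in the thread; nothing here is measured.

```javascript
const pageCount = 2048;
const bytesPerPage = 100 * 1024; // assumed footprint of 100 KB per page

// Memory needed if every page were simply held in memory once.
const expectedMB = (pageCount * bytesPerPage) / (1024 * 1024);

// Memory the VuePress 1 build was reported to need (6 GB).
const reportedMB = 6 * 1024;

console.log(`expected: ${expectedMB} MB`); // expected: 200 MB
console.log(`reported is ~${(reportedMB / expectedMB).toFixed(1)}x the estimate`); // ~30.7x
```

A gap of that size is why retained copies, rather than the raw page data, look like the more likely culprit.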


It's required for the built-in search box, and may be useful for blog users who want to generate an index page listing all of their posts.
However, for a documentation site that uses the Algolia search box, loading all pages' data is useless.
In the current vuepress-next, pagesData is extracted from siteData. But that's mainly for hot-reload purposes, and we still have to load all pages' data.

For the search feature, you may want to build an index: just enough data for users to run a search. That means title, headings, tags, the page ID, and URL. Packing this compact data into the main JS is fine. If it's only useful for the search feature, the search plugin can collect it on its own, rather than at the system level. Then VuePress can dynamically load the other parts of a page (frontmatter and content) for an individual page request.
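
The split described here could look roughly like the sketch below: a compact index built once, plus a per-page loader that is only called on demand. All names (`buildSearchIndex`, `createPageLoaders`, `search`) are hypothetical, and `headers` and `tags` are assumed field names. A real site would use `import()` of a per-page chunk; a Map of loader functions stands in for that so the example runs on its own.

```javascript
// Pages as the build might know them (illustrative data only).
const allPages = [
  {
    key: "v-1",
    path: "/guide/",
    title: "Guide",
    headers: ["Install", "Usage"],
    frontmatter: { tags: ["intro"], layout: "Doc" },
    content: "full guide body...",
  },
  {
    key: "v-2",
    path: "/api/",
    title: "API",
    headers: ["Config"],
    frontmatter: { tags: ["reference"], layout: "Doc" },
    content: "full api body...",
  },
];

// Build time: keep only what a search box needs.
function buildSearchIndex(pages) {
  return pages.map((page) => ({
    key: page.key,
    path: page.path,
    title: page.title,
    headers: page.headers || [],
    tags: (page.frontmatter && page.frontmatter.tags) || [],
  }));
}

// Run time: the rest of a page is fetched only when it is requested.
function createPageLoaders(pages) {
  const loaders = new Map();
  for (const page of pages) {
    loaders.set(page.path, () =>
      Promise.resolve({ frontmatter: page.frontmatter, content: page.content })
    );
  }
  return loaders;
}

// Case-insensitive substring match over title, headers and tags.
function search(index, term) {
  const needle = term.toLowerCase();
  return index.filter((entry) =>
    [entry.title, ...entry.headers, ...entry.tags].some((text) =>
      text.toLowerCase().includes(needle)
    )
  );
}

const searchIndex = buildSearchIndex(allPages);
const loaders = createPageLoaders(allPages);
const hits = search(searchIndex, "install");

// Only the page the user actually opens gets loaded.
loaders.get(hits[0].path)().then((page) => console.log(page.content));
```

The index grows with the number of pages, but only by a few short strings per page, while frontmatter and content stay out of the initial download.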

For generating a (paginated) index page for a blog: it happens at the build stage, so you can do whatever you want. E.g. get the 10 most recent articles and save them into the home page's frontmatter. But you don't need to ship the whole website's data just to let the browser do the filtering.
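
A build-time version of that idea might look like this minimal sketch. `recentPosts` is a hypothetical helper, and `frontmatter.date` is an assumed field; the point is only that the selection happens once during the build and the home page ships just the result.

```javascript
// Illustrative posts; `frontmatter.date` is an assumed field.
const posts = [
  { path: "/posts/a/", title: "A", frontmatter: { date: "2020-10-01" } },
  { path: "/posts/b/", title: "B", frontmatter: { date: "2020-11-02" } },
  { path: "/posts/c/", title: "C", frontmatter: { date: "2020-09-15" } },
];

// Picks the `limit` newest posts and keeps only the fields a list needs.
function recentPosts(pages, limit) {
  return pages
    .slice() // copy so the input order is untouched
    .sort((x, y) => Date.parse(y.frontmatter.date) - Date.parse(x.frontmatter.date))
    .slice(0, limit)
    .map(({ path, title, frontmatter }) => ({ path, title, date: frontmatter.date }));
}

// The home page carries only the short list, not the whole site.
const homeFrontmatter = { layout: "Home", recent: recentPosts(posts, 2) };
console.log(homeFrontmatter.recent.map((p) => p.path)); // "/posts/b/" first, then "/posts/a/"
```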

What I'm trying to argue is that packing everything and making users download it on the first load makes it very hard for bigger projects to scale. Maybe it's okay for hundreds of pages, but think about 10k pages. It's a design decision that doesn't scale.

@favoyang changed the title from "Vuepress 2 roadmap?" to "Vuepress-next scalability" on Nov 2, 2020
@favoyang
Contributor Author

favoyang commented Nov 2, 2020

Renamed title to "Vuepress-next scalability" to match with the actual discussion.

@Mister-Hope
Contributor

https://github.com/Mister-Hope/Mister-Hope.github.io

My blog. Happy to test it as soon as vuepress-next is released with a version.

Currently 7GB with my theme and 5GB with theme-default using VuePress v1

@Mister-Hope
Contributor

Mister-Hope commented Nov 2, 2020

What I'm trying to argue is that packing everything and making users download it on the first load makes it very hard for bigger projects to scale. Maybe it's okay for hundreds of pages, but think about 10k pages. It's a design decision that doesn't scale.

That's exactly what I wanted to say. As the number of pages grows, there will be hundreds of JS links (with long hashed file names) in the head tag of the generated HTML.

For my blog, it's 30MB of JS and 70MB of HTML. Each HTML file has an average size of around 90KB (excluding some special ones), of which 50KB is JS link tags in the head. (That takes up 30MB for my 660 pages.) The JS links take nearly as much space as the actual content.

I think that's another problem we should be careful about in V2. I cannot imagine the length of the head tag for a site containing thousands of pages.

That said, since VuePress actually works as an SPA, I am afraid we cannot drop the JS files unless we do the SSR another way. I am hoping that pages in the same folder can be grouped into one chunk when a large number of pages is detected, or that users can configure this.
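
The folder-based grouping suggested here can be sketched as a plain function over page paths. This is not an existing VuePress or webpack option: `topLevelFolder` and `groupByFolder` are hypothetical helpers, and the paths are made up.

```javascript
// Returns the first path segment, e.g. "/jbase/faq/x/" -> "jbase".
// Pages directly under "/" fall into a "(root)" group.
function topLevelFolder(pagePath) {
  const segments = pagePath.split("/").filter(Boolean);
  return segments.length > 0 ? segments[0] : "(root)";
}

// Groups page paths so that each folder could become one chunk.
function groupByFolder(paths) {
  const groups = new Map();
  for (const pagePath of paths) {
    const folder = topLevelFolder(pagePath);
    if (!groups.has(folder)) groups.set(folder, []);
    groups.get(folder).push(pagePath);
  }
  return groups;
}

const groups = groupByFolder(["/", "/jbase/faq/a/", "/jbase/faq/b/", "/mv/intro/"]);
for (const [folder, members] of groups) {
  console.log(`${folder}: ${members.length} page(s)`);
}
```

With one chunk per folder, the number of JS files (and of link tags in each head) would follow the number of folders rather than the number of pages, at the cost of downloading sibling pages together.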

@meteorlxy
Member

meteorlxy commented Nov 3, 2020

@favoyang

If it's only useful for the search feature, the search plugin can collect it on its own, rather than at the system level.

Yes, that's exactly what I'm thinking about: let the search plugin and blog plugin collect the data themselves.

@Mister-Hope

there will be hundreds of JS links (with long hashed file names) in the head tag of the generated HTML

Thanks for this point. I think what you mentioned is the prefetch links?

In fact, if you set shouldPrefetch: () => false, the links will not be rendered at all. But it's not perfect, because the renderer will still map over the files and test them against shouldPrefetch one by one (which slows down the SSR process, too).

We might allow shouldPrefetch: false to disable prefetch links entirely.

------ updated

Now vuepress-next has implemented these features:

  • load page data via dynamic import, instead of loading all of them
  • supports shouldPrefetch: false
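
The difference between `shouldPrefetch: () => false` and `shouldPrefetch: false` described in this comment can be illustrated with a simplified stand-in for the renderer. This is not vue-server-renderer's or vuepress-next's actual code: `renderPrefetchLinks` is a hypothetical function that only mimics the behaviour described above.

```javascript
// Simplified stand-in for how an SSR renderer might emit prefetch hints.
function renderPrefetchLinks(files, shouldPrefetch) {
  // `false` short-circuits: no per-file work at all.
  if (shouldPrefetch === false) return "";
  return files
    .filter((file) => shouldPrefetch(file, "script"))
    .map((file) => `<link rel="prefetch" href="/${file}">`)
    .join("");
}

// 2000 made-up chunk names, one per page.
const files = Array.from({ length: 2000 }, (_, i) => `assets/js/${i}.js`);

// A predicate that always says no, but counts how often it is asked.
let calls = 0;
const never = () => {
  calls += 1;
  return false;
};

const htmlWithFunction = renderPrefetchLinks(files, never); // predicate runs for every file
const callsWithFunction = calls;
const htmlWithFalse = renderPrefetchLinks(files, false); // nothing runs

console.log(htmlWithFunction === htmlWithFalse, callsWithFunction, calls - callsWithFunction);
```

Both variants produce the same (empty) output, but the function form still costs one call per file for every rendered page, which is the overhead the boolean form avoids.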

@meteorlxy added the "type: question or discussion" label on Nov 3, 2020
@Mister-Hope
Contributor

Yes, the prefetch links take n × n space as the number of pages grows.
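
The n × n growth can be made concrete with a rough model in which every page's head carries one prefetch link per page chunk. The per-link size is derived from the figures quoted earlier in the thread (about 50KB of link tags on each of 660 pages); the model and the name `totalPrefetchBytes` are assumptions for illustration, not measurements.

```javascript
// Rough model: each of the n pages carries n prefetch links.
function totalPrefetchBytes(pageCount, bytesPerLink) {
  return pageCount * pageCount * bytesPerLink;
}

// ~50 KB of link tags per page across 660 pages implies ~78 bytes per link.
const bytesPerLink = (50 * 1024) / 660;

for (const n of [660, 2000, 10000]) {
  const mb = totalPrefetchBytes(n, bytesPerLink) / (1024 * 1024);
  console.log(`${n} pages: ~${mb.toFixed(0)} MB of prefetch tags`);
}
```

For 660 pages the model gives roughly 32 MB, in line with the ~30MB reported above, and doubling the page count quadruples the total.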

@meteorlxy
Member

@Mister-Hope Quick response. I updated my last comment and it should work in vuepress 1.x

@jlooper
Contributor

jlooper commented Nov 8, 2020

@meteorlxy thank you for working on VuePress 2, look forward to using it! I can test on workshops.frontendfoxes.org and other sites, I'll give it a try

@itsxallwater

@meteorlxy I've done a preliminary build of our public docs using vuepress-next and generally speaking, it seems to work! Neato! 👍👍

This is without ejecting the theme and without bothering with any plugins, but the build and render is working.

image

Now for the bad news ☹️:

  1. On the topic of speed, I've not yet been able to get a full build to complete. Building for dev works but I had a full build running for 2+ hours and eventually stopped it. For context, our builds take ~40 minutes with the current VuePress release and ~20 minutes with a Node.js worker_threads workaround.
  2. Our authoring pattern is to name our directories after the articles and place a README.md along with any image assets that should be referenced in the article into the directory. These images are then referenced in the readme a la [image description](./image.png) which does not appear to be working in this version. Example:
ERROR in ./docs/.vuepress/.temp/pages/jbase/faq/backups-using-veeam/windows-restore/README.vue?vue&type=template&id=eefd9c2c (./node_modules/vue-loader/dist/templateLoader.js??ref--5!./node_modules/cache-loader/dist/cjs.js??ref--0-0!./node_modules/vue-loader/dist??ref--0-1!./docs/.vuepress/.temp/pages/jbase/faq/backups-using-veeam/windows-restore/README.vue?vue&type=template&id=eefd9c2c)
Module not found: Error: Can't resolve './windows_restore_9.png' in '/home/mikew/src/Internal/vuepress-next/docs/.vuepress/.temp/pages/jbase/faq/backups-using-veeam/windows-restore'

@meteorlxy
Member

meteorlxy commented Nov 11, 2020

@itsxallwater Thanks!

  1. Seems that your build had already failed, but our CLI didn't terminate the process, so you thought it was stuck. It's a point to improve, but it has nothing to do with scalability

  2. Nice catch and will be fixed soon

Let's move to the vuepress-next repo for further discussion and bug reports

--- update

I've tested on your project and it takes less than 4 minutes to build your site 😉 @itsxallwater

I've posted the results in the related issue in the vuepress-next repo. See vuepress/core#8

@zeeklog

zeeklog commented Apr 8, 2024

When you own a blog with 200k pages, you would rather die than keep hosting it, especially with VuePress 2.x.
router.js and blogData.js are too large (more than 40MB)...

@JimmyVV

JimmyVV commented Apr 8, 2024 via email

9 participants