Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gatsby): persist pages between runs #28590

Merged
merged 12 commits into from Apr 22, 2021
Merged

fix(gatsby): persist pages between runs #28590

merged 12 commits into from Apr 22, 2021

Conversation

pieh
Copy link
Contributor

@pieh pieh commented Dec 13, 2020

Description

Approach in #28351 turned out to be unfeasble - we do need entire page object actually to preserve backward compatiblity for DELETE_PAGE action.

To handle potential OOM issues when persisting state, I expanded state sharding during persistance from just nodes to both nodes and now persisted pages. Right now it's copied & pasted sharding done for nodes - I could have DRY it (I still can if reviewers will request it :) ), but didn't think DRYing it is worth adding mental overhead for added abstractions - thoughts?

This PR build on work already done by @redabacha (in #28316) borrowing test change (unskip) already done (in #28351).

TODO:

  • garbage collect stateful pages during bootstrap (see comment)

Related Issues

fixes #28281
fixes #26520
[ch19791]

@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Dec 13, 2020
@pieh pieh added topic: query invalidation* and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels Dec 13, 2020
Comment on lines 186 to 233
const nodeShardsScenarios = [
{
numberOfNodes: 50,
simulatedNodeObjectSize: 5 * 1024 * 1024,
expectedNumberOfNodeShards: 1,
},
{
numberOfNodes: 5,
simulatedNodeObjectSize: 0.6 * 1024 * 1024 * 1024,
expectedNumberOfNodeShards: 3,
},
]
const pageShardsScenarios = [
{
numberOfPages: 50,
simulatedPageObjectSize: 10 * 1024 * 1024,
expectedNumberOfPageShards: 1,
},
{
numberOfPages: 5,
simulatedPageObjectSize: 0.9 * 1024 * 1024 * 1024,
expectedNumberOfPageShards: 5,
},
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to construct scenario matrix - and we check each combination. I didn't go over the board - primary purpose of this is that sharding for pages and nodes is separate and that one doesn't affect the other.

Something to think about is wether we want to add some edge cases here - like when number of nodes or pages is equal 0 - for now implementation for those is that we discard persisted state when either is 0 - it does make a whole lot of sense for nodes as core creates some nodes of their own even if user doesn't use any source plugins. It's a bit different for pages - gatsby doesn't create pages (well except for gatsby develop creating dev-404-page ...). I am not sure if it's worth handling this case more gracefully? Do we want to optimize for scenarios where there are 0 pages?


const pages: Array<[string, IGatsbyPage]> = [].concat(...pagesChunks)

if (!pagesChunks.length) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially problematic - this will discard state if there are 0 pages - do we want to optimize for those cases? (I don't even know if gatsby actually "works" if there are 0 pages TBH)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When way to go about this is to add pagesCount to redux.rest part and compare it I guess? Cache will get invalidated anyway after upgrade due to pagesChunk.length being 0 so I think cache shape "version" could be handled gracefully

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there were no pages then it's probably no big deal to bust the cache anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per off-thread discussion. I think you can merge this PR in a patch without this check, if you wanted to and then add the check in a minor bump later.

@pieh pieh marked this pull request as ready for review December 14, 2020 09:04
packages/gatsby/src/redux/persist.ts Show resolved Hide resolved

const pages: Array<[string, IGatsbyPage]> = [].concat(...pagesChunks)

if (!pagesChunks.length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per off-thread discussion. I think you can merge this PR in a patch without this check, if you wanted to and then add the check in a minor bump later.

@pieh
Copy link
Contributor Author

pieh commented Dec 20, 2020

Additional note on this - this needs additional handling done for createPagesStatefully: createPages does "garbage collect" and it works as-is with this code, but createPagesStatefully doesn't have garbage collection so we will never clean those pages up. This means that if user do gatsby build -> remove page from src/pages/ -> gatsby build, with this code as-is we will crash with (currently):

 ERROR #85913  GRAPHQL

There was a problem reading the file: /Users/misiek/dev/gatsby-starter-blog/src/pages/multiple-exports.js

File: src/pages/multiple-exports.js



  Error: ENOENT: no such file or directory, open '/Users/misiek/dev/gatsby-starter-blog/src/pages/multiple-exports.js'

failed extract queries from components - 0.324s

TODO item: garbage collect stateful pages in bootstrap

@pieh
Copy link
Contributor Author

pieh commented Feb 10, 2021

If merged, we should revert #29431

@pieh pieh changed the title fix(gatsby): invalidate queries if page context changes between runs fix(gatsby): persist pages between runs Apr 13, 2021
@pieh pieh marked this pull request as ready for review April 14, 2021 11:33
LekoArts
LekoArts previously approved these changes Apr 21, 2021
Copy link
Contributor

@LekoArts LekoArts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -74,7 +89,7 @@ function guessSafeChunkSize(values: Array<[string, IGatsbyNode]>): number {
}

// Sends a warning once if any of the chunkSizes exceeds approx 500kb limit
if (maxSize > 500000) {
if (showMaxSizeWarning && maxSize > 500000) {
report.warn(
`The size of at least one page context chunk exceeded 500kb, which could lead to degraded performance. Consider putting less data in the page context.`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've talked about adding information about what page exceeds this limit, but that's for a follow-up (not in this PR)

* add stateful page to artifacts tests

* fix(gatsby): garbage collect stateful pages

* Revert "fix: clear tracked queries when deleting stale page-data files (#29431)" (#30848)

This reverts commit 478cf68.
Copy link
Contributor

@LekoArts LekoArts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌮

@pieh pieh merged commit a0b31bc into master Apr 22, 2021
@pieh pieh deleted the fix/no-rebuild-bug branch April 22, 2021 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants