Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(gatsby): enable running of static/page queries separately #12891

Closed
wants to merge 3 commits into from

Conversation

Moocar
Copy link
Contributor

@Moocar Moocar commented Mar 27, 2019

Description

As part of my work to remove the global data.json, I need to run the build-javascript step after static queries have been run, but before page queries. This is because each page result file also includes the webpack build's compilation hash, so that the running app can be invalidated if a rebuild occurs while a user is navigating the page (more info in #11982).

While I was doing this refactor, I also took the opportunity to remove a bunch of global state/event emitters. The biggest is that the query queue is now instantiable. So now we can create a queue that processes the static queries, and then create a new one that processes the page queries. Then, we can start a fresh queue processor once gatsby develop has started. Hopefully it makes that part of the code easier to reason about.

I also took a stab at making the calculation of dirty queryIds a bit easier to read. It's not great, but hopefully these changes make it a little easier to understand.

page-query-runner has become index. Apologies for the lack of a good diff, but a lot of that file has changed. I have a follow up branch that moves src/internal-plugins/query-runner/ to src/query/

Related Issues

@Moocar Moocar requested review from a team as code owners March 27, 2019 04:29
@Moocar
Copy link
Contributor Author

Moocar commented Mar 27, 2019

@KyleAMathews this fixes the

  // HACKY!!! TODO: REMOVE IN NEXT REFACTOR
  activity.start()
  emitter.emit(`START_QUERY_QUEUE`)
  await queryRunner.processQueries(
  // END HACKY

https://github.com/gatsbyjs/gatsby/pull/12891/files#diff-8c8b888e741f1c31e97d4bf05894f50dL476

}
},
store: FastMemoryStore(),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to keep those - those add extra overhead which is only needed in watch mode - but for production builds we don't really benefit from those

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, that was accidental. I didn't mean to delete that part. Great catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 60d1241. In my future branch, the websocket stuff is passed in by gatsby develop, so cleans up this file further.

@pieh
Copy link
Contributor

pieh commented Mar 27, 2019

Can you expand, why splitting query running is needed? Not including page queries result paths in file that gets webpacked seems like would achieve same result without touching so much code? What am I missing?

@Moocar
Copy link
Contributor Author

Moocar commented Mar 27, 2019

Each page query result will now be in its own page-data.json file. Which looks like:

{
  "componentChunkName": "component---src-pages-index-js",
  "path": "/",
  "compilationHash": "6caec38909d092c4e3d3",
  "data": {
    "allFoo": {
      ...
    }
  },
  "pageContext": {
    "isCreatedByStatefulCreatePages": true
  }
}

The compilation hash references the latest webpack compilation. So each page-data.json depends on the build-javascript stage. So they need to be run in separate phases. I'm still working on the code that ends up using this, but you can see a preview at https://github.com/Moocar/gatsby/blob/per-page-manifest/packages/gatsby/src/commands/build.js#L65.

cleanup bootstrap

moved page-query-runner to index

use finishBootstrap()

fix query-runner type annotation

refactor query-runner/index

doc sections

queryjobs refactor

make query refactor

enqueueQueryID -> enqueueExtractedQueryId
@Moocar
Copy link
Contributor Author

Moocar commented Mar 27, 2019

@pieh I'm trying to break up my huge data.json PR into multiple small ones to assist with reviewing. But recognize that it makes the context harder to understand. I think it's worth it, but let me know if you'd like the full PR instead (which isn't quite finished yet)

@KyleAMathews
Copy link
Contributor

Couldn't we just append the compilationHash to the page-data.json file? I think we'll need to add that capability anyways to support my image prefetching RFC as we won't discover which images need prefetching until we're rendering HTML.

@KyleAMathews
Copy link
Contributor

It's more disk I/O but should be relatively cheap. Tried googling how to append to a JSON file (which would be cheapest) but it doesn't look straightforward. We should be fine though since files will almost always be < 100kb.

if (!isBootstrapping) {
queue.resume()
/**
* Creates a queue that is optimized for running as a daemon during
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious about the terminology — a daemon is normally a separate process right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sold on daemon as terminology either. I'm trying to communicated that it's a long running process that won't end. Other ideas? startListener, startProcess?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pieh and I and some others have talked about modelling more of Gatsby's internals after the Actor Model. It's a larger discussion to conclude if we want to do that but "startActor" would be a natural name then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is actually an actor, then for sure :). But until then, we should probably stick to something generic so we don't confuse people (and yes, daemon is also a bit confusing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it believes it's an actor it is! Haha.

Service is a nice generic name :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I like service. Will change

@Moocar
Copy link
Contributor Author

Moocar commented Mar 28, 2019

Couldn't we just append the compilationHash to the page-data.json file? I think we'll need to add that capability anyways to support my image prefetching RFC as we won't discover which images need prefetching until we're rendering HTML.

Just caught up with @KyleAMathews and we chatted about this. While we could figure out a way to load/rewrite the page-data.json after the build-javascript phase, there's technically no reason we need to finish running page queries before hand. Given that, it's simpler to write the page-data.json once after build javascript has finished.

In fact, it enables interesting possibilities like running build-javascript on another process :D

@pieh pieh mentioned this pull request Mar 29, 2019
@Moocar
Copy link
Contributor Author

Moocar commented Mar 31, 2019

I've made a few more changes downstream of this branch. I'm going to close this and reopen (or recreate) once those changes are settled.

@Moocar Moocar closed this Mar 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants