
Pre-parsed queries and/or Query caching #227

Open
matthewmueller opened this issue Jun 18, 2018 · 8 comments

Comments

@matthewmueller
Contributor

matthewmueller commented Jun 18, 2018

Right now when you execute against the schema, it parses the queries each time. We could speed up these executions by either parsing some queries beforehand or caching previously seen queries.

The use case I have in mind is responding to a webhook or exposing a REST API that uses your GraphQL resolvers internally; having pre-parsed server-side queries would speed up those operations.

Something like this:

func (s *Schema) ExecDocument(
  ctx context.Context,
  query *graphql.Document, 
  operationName string, 
  variables map[string]interface{},
) *Response {
    // ...
}

An alternative is to lazily cache parsed query results: when we encounter the same query twice, we use the cached result. The cache would probably need to be bounded (e.g. an LRU cache), but I think it'd be a nice addition to schema.Exec(...).
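Such a bounded lazy cache could be sketched like this (plain Go, standard library only; `document` and `parseQuery` are stand-ins for this library's actual parsed-query type and parser, and the locking and eviction details are assumptions, not existing API):

```go
package main

import (
	"container/list"
	"sync"
)

// document stands in for the library's parsed query type.
type document struct{ source string }

// parseQuery stands in for the real parser (e.g. query.Parse).
func parseQuery(q string) *document { return &document{source: q} }

// queryCache is a size-bounded LRU cache of parsed query documents.
type queryCache struct {
	mu      sync.Mutex
	max     int
	order   *list.List               // front = most recently used
	entries map[string]*list.Element // query text -> list element
}

type cacheEntry struct {
	key string
	doc *document
}

func newQueryCache(max int) *queryCache {
	return &queryCache{
		max:     max,
		order:   list.New(),
		entries: make(map[string]*list.Element),
	}
}

// Get returns the cached document for q, parsing and caching it on a miss.
// When the cache exceeds its bound, the least recently used entry is evicted.
func (c *queryCache) Get(q string) *document {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.entries[q]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*cacheEntry).doc
	}
	doc := parseQuery(q)
	el := c.order.PushFront(&cacheEntry{key: q, doc: doc})
	c.entries[q] = el
	if c.order.Len() > c.max {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(*cacheEntry).key)
	}
	return doc
}
```

schema.Exec could consult such a cache before parsing, so repeated queries skip the parse step entirely.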

@matthewmueller matthewmueller changed the title Pre-parsed queries and Query caching Pre-parsed queries and/or Query caching Jun 18, 2018
@pavelnikolov
Member

pavelnikolov commented Jun 20, 2018

If you are using GraphQL over HTTP (which I guess is the case for most users of this library), then the ideal solution would be to just use old-school Cache-Control headers.
The Apollo team has done a great job on caching: they have developed a cache control extension. I do believe that the Cache-Control header is the right approach (i.e. CDN/infrastructure agnostic) and extensions are the way to go. This library would definitely benefit from extensions. Unfortunately, extensions are not supported yet. 😞
To work around this, my team has implemented imperative caching instead of the much better declarative approach. To do this we had to do two things:

  1. Allow GET requests (I'll send a PR for that very soon)
  2. Allow each resolver to declare its caching preferences (maxAge, a TTL in seconds, and scope, either PUBLIC or PRIVATE). At the end of each request we set the appropriate Cache-Control header on the response. This is not an ideal solution, but it's a good workaround until this library supports extensions.
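Combining per-resolver hints into one response header could be sketched like this (a minimal sketch; the hint type and merge rules are assumptions about how such imperative caching might work, not this library's API):

```go
package main

import "fmt"

// cacheHint is what a resolver might declare.
type cacheHint struct {
	maxAge  int  // TTL in seconds
	private bool // scope: true = PRIVATE, false = PUBLIC
}

// mergeHints combines the hints collected during a request:
// the smallest maxAge wins, and any PRIVATE hint makes the
// whole response PRIVATE.
func mergeHints(hints []cacheHint) cacheHint {
	if len(hints) == 0 {
		// No hints at all: default to uncacheable.
		return cacheHint{maxAge: 0, private: true}
	}
	merged := hints[0]
	for _, h := range hints[1:] {
		if h.maxAge < merged.maxAge {
			merged.maxAge = h.maxAge
		}
		if h.private {
			merged.private = true
		}
	}
	return merged
}

// header renders the merged hint as a Cache-Control header value.
func (h cacheHint) header() string {
	scope := "public"
	if h.private {
		scope = "private"
	}
	return fmt.Sprintf("%s, max-age=%d", scope, h.maxAge)
}
```

At the end of a request the handler would call `w.Header().Set("Cache-Control", mergeHints(collected).header())`.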

Regardless of the caching implementation (imperative/declarative), there is another thing that usually goes hand in hand with caching - persisted queries. With persisted queries it would be nice to validate each query only once, and then skip the validation step every time the persisted query is executed (unless the code has changed).

@matthewmueller
Contributor Author

matthewmueller commented Jun 20, 2018

Hi @pavelnikolov, thanks for the background on apollo's caching system! It's interesting to read what they did to address the stale data problem.

This issue is not exactly about stale data. It's about caching the static queries, so if we see a query come in, we already have the parsed query document ready to go.

In code, something like this:

cache := map[string]*query.Document{}

op := `
  query {
    hero {
      name
      appearsIn
    }
  }
`

doc, err := query.Parse(op)
if err != nil {
    // handle the parse error
}
cache[op] = doc

So each time we see those static queries again, we have this cache ready with the parsed query.

@pavelnikolov
Member

My point is that with cache hints/imperative caching you'll not have to see the same query again. If you set the cache-control headers, the response will be cached by your infrastructure (Nginx, Varnish, a CDN etc) or in the user's browser.

What you are suggesting would definitely benefit persisted queries, though. With persisted queries, the client application doesn't have to send the entire query; instead it can send a pre-defined query_id. Also, since the request doesn't need a body, it can be sent via GET. For example:

GET https://your-domain.com/graphql?query_id=5&variables={"key":"value"}

Then the handler can load the query from some cache store (e.g. Redis, Memcached or even an embedded datastore). In this case we don't need to validate/parse the query again - we can do it only once, when the query is inserted into the datastore.
This is one of the things I would like to work on.

It is possible to use this idea even without persisted queries or caching. The application would probably store the query+variables+operationName combination in a key/value store, keyed by hash. This way each unique combination of query+variables+operationName will be parsed/validated at most once during the lifetime of the GraphQL application. Is this what you mean?
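Deriving such a key could be as simple as hashing the parts with the standard library (a sketch; the zero-byte separator and the decision to include serialized variables in the key are assumptions - for parse/validate caching alone, the query text and operation name would suffice, since variables don't affect parsing):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// cacheKey derives a stable key for a query + operationName + serialized
// variables combination. A zero byte separates the parts so that different
// splits of the same bytes can't produce the same key.
func cacheKey(query, operationName, variablesJSON string) string {
	h := sha256.New()
	h.Write([]byte(query))
	h.Write([]byte{0})
	h.Write([]byte(operationName))
	h.Write([]byte{0})
	h.Write([]byte(variablesJSON))
	return hex.EncodeToString(h.Sum(nil))
}
```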

@matthewmueller
Contributor Author

matthewmueller commented Jun 20, 2018

@pavelnikolov Ahh yep, I see your point now. I guess that's less of an issue with this library and more about how you set up your GraphQL infrastructure.

I think you're right that cache-control would probably be a better way to architect your app, but I think this is low-hanging fruit to improve the performance of graphql servers out of the box.

This is off-topic, but I've also run into the use case for persisted queries, though in my opinion the other literature online doesn't really do a good job of selling them. From what I've read, their main benefit is to avoid sending full queries each time. I think an even more important role for them is interacting with existing services. Using your example, rather than:

GET https://your-domain.com/graphql?query_id=5&variables={"key":"value"}

You can expose:

GET https://your-domain.com/user?variables={"key":"value"}

Which would call a predefined server-side "user" query. That's what I was getting at with wanting the ability to use pre-parsed queries. Then you can migrate your existing REST API to using GraphQL under the hood without changing your existing clients.

@pavelnikolov
Member

I guess that's less of an issue with this library and more about how you setup your graphql infrastructure

Exactly!

But do you really need this optimisation? Is this actually a bottleneck for your API? I haven't seen the metrics/benchmarks, to be honest, but I think that the time spent there would be relatively insignificant. Maybe there are apps that need this optimisation, though. Just not my team's use case. I'll think a little bit more about it.

@matthewmueller
Contributor Author

matthewmueller commented Jun 20, 2018

But do you really need this optimisation?

Maybe not, my use case is more about supporting static server-side queries. Seems wasteful to have to parse these kinds of queries since I have access to the schema and query right on the server anyway. Then it got me thinking about lazily caching since that would also solve the problem.

For example, exposing this would work great:

users := query.MustParse(`...`)
response := schema.ExecQuery(users, variables)

@pavelnikolov
Member

pavelnikolov commented Jun 20, 2018 via email

@matthewmueller
Contributor Author

Totally understand! And thanks for the discussion earlier – I learned a lot from this issue :-)
