Release 1.8.0 #2380

Merged
merged 48 commits into from
Jan 11, 2023

@abernix abernix commented Jan 11, 2023

Note

Commits to review: https://github.com/apollographql/router/pull/2380/files/13bb00be39e1e06e8444151dd75393e1eaf282ea..HEAD

[1.8.0] - 2023-01-11

📃 Configuration

Configuration changes will be automatically migrated on load. However, you should update your source configuration files as these will become breaking changes in a future major release.

Defer support GA docs and config (Issue #2368)

We're pleased to announce that @defer support has been promoted to general availability in accordance with our product launch stages.

Defer is enabled by default in the Router. However, if you previously set preview_defer_support explicitly in your configuration (for example, to disable defer support), you will need to rename it to defer_support:

Before:

supergraph:
  preview_defer_support: true

After:

supergraph:
  defer_support: true

By @bryncooke in #2378

Remove timeout from OTLP exporter (Issue #2337)

A duplicative timeout property has been removed from the telemetry.tracing.otlp object, since the batch_processor configuration already contains a timeout property. The Router will tolerate both options for now, but this will become a breaking change in a future major release. Please update your configuration accordingly to reduce future work.

Before:

telemetry:
  tracing:
    otlp:
      timeout: 5s

After:

telemetry:
  tracing:
    otlp:
      batch_processor:
        timeout: 5s

By @bryncooke in #2338
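Conceptually, the automatic migration mentioned at the top of this section is a rename of configuration paths. The Rust sketch below is a hypothetical illustration only, not the Router's migration code: it models the configuration as a flat map from dotted paths to values (an assumption made for brevity) and applies the two renames described in this section. If a value is already present at the new path, the sketch keeps it and drops the old one; that precedence rule is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical migration table: (old dotted path, new dotted path).
const MIGRATIONS: &[(&str, &str)] = &[
    ("supergraph.preview_defer_support", "supergraph.defer_support"),
    (
        "telemetry.tracing.otlp.timeout",
        "telemetry.tracing.otlp.batch_processor.timeout",
    ),
];

/// Apply every rename whose old path is present; returns a description of each change.
fn migrate(config: &mut BTreeMap<String, String>) -> Vec<String> {
    let mut applied = Vec::new();
    for (old, new) in MIGRATIONS {
        if let Some(value) = config.remove(*old) {
            // Assumption: an explicitly set new path takes precedence over the old one.
            config.entry(new.to_string()).or_insert(value);
            applied.push(format!("{old} -> {new}"));
        }
    }
    applied
}

fn main() {
    let mut config = BTreeMap::new();
    config.insert("supergraph.preview_defer_support".to_string(), "true".to_string());
    config.insert("telemetry.tracing.otlp.timeout".to_string(), "5s".to_string());

    for change in migrate(&mut config) {
        println!("migrated {change}");
    }
    for (path, value) in &config {
        println!("{path}: {value}");
    }
}
```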

🚀 Features

Emit hit/miss metrics for APQ, Query Planning and Introspection caches (Issue #1985)

Added metrics for caching.
Each cache metric contains a kind attribute indicating the kind of cache (query planner, apq, introspection)
and a storage attribute indicating the backing storage (e.g., memory or disk).

The following metrics are exposed:
apollo_router_cache_hit_count - cache hits.

apollo_router_cache_miss_count - cache misses.

apollo_router_cache_hit_time - cache hit duration.

apollo_router_cache_miss_time - cache miss duration.

Example

# TYPE apollo_router_cache_hit_count counter
apollo_router_cache_hit_count{kind="query planner",new_test="my_version",service_name="apollo-router",storage="memory"} 2
# TYPE apollo_router_cache_hit_time histogram
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.001"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.005"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.015"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.05"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.1"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.2"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.3"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.4"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.5"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="1"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="5"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="10"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="+Inf"} 2
apollo_router_cache_hit_time_sum{kind="query planner",service_name="apollo-router",storage="memory"} 0.000236782
apollo_router_cache_hit_time_count{kind="query planner",service_name="apollo-router",storage="memory"} 2
# HELP apollo_router_cache_miss_count apollo_router_cache_miss_count
# TYPE apollo_router_cache_miss_count counter
apollo_router_cache_miss_count{kind="query planner",service_name="apollo-router",storage="memory"} 1
# HELP apollo_router_cache_miss_time apollo_router_cache_miss_time
# TYPE apollo_router_cache_miss_time histogram
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.001"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.005"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.015"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.05"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.1"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.2"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.3"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.4"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.5"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="1"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="5"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="10"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="+Inf"} 1
apollo_router_cache_miss_time_sum{kind="query planner",service_name="apollo-router",storage="memory"} 0.000186783
apollo_router_cache_miss_time_count{kind="query planner",service_name="apollo-router",storage="memory"} 1

By @bnjjj in #2327
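The counters and histograms above can be combined into derived figures such as a hit ratio or a mean lookup time. As a rough sketch (a hypothetical helper, not part of the Router), the Rust below sums the sample values of a metric from Prometheus text output like the example above, assuming sample lines of the form name{labels} value with no timestamp, and then computes hits / (hits + misses) and hit_time_sum / hit_time_count.

```rust
/// Sum the sample values of every series of `metric` in Prometheus text format.
/// Assumes each sample line looks like `name{labels} value` (no timestamp).
fn sum_metric(exposition: &str, metric: &str) -> f64 {
    exposition
        .lines()
        // Skip `# HELP` and `# TYPE` comment lines.
        .filter(|line| !line.starts_with('#'))
        // Exact name match: the name must be followed by `{` or a space, so that
        // `..._hit_time_sum` is not counted as `..._hit_time`.
        .filter(|line| {
            line.strip_prefix(metric)
                .map_or(false, |rest| rest.starts_with('{') || rest.starts_with(' '))
        })
        // The sample value is the last space-separated token on the line.
        .filter_map(|line| line.rsplit(' ').next())
        .filter_map(|value| value.parse::<f64>().ok())
        .sum()
}

fn main() {
    // Lines copied from the example output above.
    let exposition = r#"# TYPE apollo_router_cache_hit_count counter
apollo_router_cache_hit_count{kind="query planner",new_test="my_version",service_name="apollo-router",storage="memory"} 2
apollo_router_cache_hit_time_sum{kind="query planner",service_name="apollo-router",storage="memory"} 0.000236782
apollo_router_cache_hit_time_count{kind="query planner",service_name="apollo-router",storage="memory"} 2
# TYPE apollo_router_cache_miss_count counter
apollo_router_cache_miss_count{kind="query planner",service_name="apollo-router",storage="memory"} 1
"#;
    let hits = sum_metric(exposition, "apollo_router_cache_hit_count");
    let misses = sum_metric(exposition, "apollo_router_cache_miss_count");
    let mean_hit_secs = sum_metric(exposition, "apollo_router_cache_hit_time_sum")
        / sum_metric(exposition, "apollo_router_cache_hit_time_count");

    println!("hit ratio: {:.3}", hits / (hits + misses)); // prints "hit ratio: 0.667"
    println!("mean hit time: {:.9}s", mean_hit_secs); // prints "mean hit time: 0.000118391s"
}
```

In a real deployment these figures would more likely be computed by the metrics backend; the sketch only shows how the exposed series relate to each other.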

Add support for single instance Redis (Issue #2300)

Experimental caching via Redis now works with single Redis instances when configured with a single URL.

By @bnjjj in #2310

Support TLS connections to single instance Redis (Issue #2332)

TLS connections are now supported when connecting to single Redis instances. This is useful for connecting to hosted Redis providers where TLS is mandatory.
TLS connections for clusters are not yet supported; see Issue #2332 for updates.

By @Geal in #2336
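The connection rules in these two Redis entries can be sketched as follows. This is a hypothetical Rust illustration, not the Router's code: it assumes that one configured URL means a single instance, several URLs mean a cluster, and that TLS is requested with the rediss:// scheme (the scheme used in the Redis URL of the configuration example in this PR's commits). Matching the note above, it rejects TLS for clusters.

```rust
/// Hypothetical description of how a Redis cache connection would be made.
#[derive(Debug, PartialEq)]
enum RedisTarget {
    Single { url: String, tls: bool },
    Cluster { urls: Vec<String> },
}

/// Assumption: the `rediss://` scheme requests TLS.
fn uses_tls(url: &str) -> bool {
    url.starts_with("rediss://")
}

/// One URL selects a single instance; several URLs select a cluster.
fn classify(urls: &[&str]) -> Result<RedisTarget, String> {
    match urls {
        [] => Err("at least one Redis URL is required".to_string()),
        [url] => Ok(RedisTarget::Single {
            url: url.to_string(),
            tls: uses_tls(url),
        }),
        many => {
            // Mirrors the changelog: TLS is not yet supported for clusters.
            if many.iter().any(|url| uses_tls(url)) {
                Err("TLS connections to Redis clusters are not supported yet".to_string())
            } else {
                Ok(RedisTarget::Cluster {
                    urls: many.iter().map(|url| url.to_string()).collect(),
                })
            }
        }
    }
}

fn main() {
    let examples: [&[&str]; 3] = [
        &["rediss://:router@127.0.0.1:4101"],
        &["redis://10.0.0.1:6379", "redis://10.0.0.2:6379"],
        &["rediss://10.0.0.1:6379", "rediss://10.0.0.2:6379"],
    ];
    for urls in examples {
        println!("{urls:?} => {:?}", classify(urls));
    }
}
```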

🐛 Fixes

Correctly handle aliased __typename fields (Issue #2330)

If you aliased __typename, as in this example query:

{
  myproducts: products {
    total
    __typename
  }
  _0___typename: __typename
}

Before this fix, _0___typename was set to null. It now correctly returns Query.

By @bnjjj in #2357

subgraph_request span is now set as the parent of traces coming from subgraphs (Issue #2344)

Before this fix, the trace context injected into the headers of subgraph requests was not attached to the correct parent span ID, so subgraph traces appeared disconnected when rendering the trace tree.

By @bnjjj in #2345
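To illustrate what attaching the context to the correct parent span means, here is a minimal sketch that assumes W3C Trace Context propagation (the Router can be configured with other propagators, so the traceparent header is an assumption here). The outgoing subgraph request carries the subgraph_request span's own ID in the parent-id field, so spans recorded by the subgraph become its children. The Span type and traceparent helper are hypothetical, and the IDs are arbitrary examples.

```rust
/// Hypothetical span handle holding only the IDs needed for propagation.
struct Span {
    trace_id: u128,
    span_id: u64,
}

/// Build a W3C Trace Context `traceparent` value: version-trace_id-parent_id-flags.
/// The parent_id field carries the ID of the span that wraps the outgoing request.
fn traceparent(span: &Span, sampled: bool) -> String {
    format!(
        "00-{:032x}-{:016x}-{:02x}",
        span.trace_id, span.span_id, sampled as u8
    )
}

fn main() {
    // Arbitrary example IDs.
    let subgraph_request = Span {
        trace_id: 0x4bf9_2f35_77b3_4da6_a3ce_929d_0e0e_4736,
        span_id: 0x00f0_67aa_0ba9_02b7,
    };
    // Spans recorded by the subgraph become children of `subgraph_request`.
    // prints "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    println!("traceparent: {}", traceparent(&subgraph_request, true));
}
```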

🛠 Maintenance

Simplify telemetry config code (Issue #2337)

This brings the telemetry plugin configuration closer to standards recommended in the YAML design guidance.

By @bryncooke in #2338

Upgrade the clap version in scaffold templates (Issue #2165)

Upgrade the clap dependency to a version that supports generating scaffolded plugins via xtask.

By @bnjjj in #2343

Upgrade axum to 0.6.1 (PR #2303)

For more details about the new axum release, please read the project's changelog.

By @bnjjj in #2303

Set the HTTP response content-type as application/json when returning GraphQL errors (Issue #2320)

When returning an INVALID_GRAPHQL_REQUEST error, the Router now sets the expected content-type header (application/json) rather than omitting the header as it did previously.

By @bnjjj in #2321

Move APQ and EnsureQueryPresence layers to the new router_service (PR #2296)

Moving APQ from the axum level to the supergraph_service reintroduced a Buffer to the service pipeline.
To avoid this, the APQ and EnsureQueryPresence layers are now part of the newly introduced router_service, which removes that Buffer.

By @Geal in #2296

Refactor YAML validation error reports (Issue #2180)

YAML configuration file validation prints a report of the errors it encountered, but that report was missing some details and occasionally had its diagnostics cursor pointing at the wrong character/line. It now points at the correct place more reliably.

By @Geal in #2347
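Reporting an error "at line N" with a correctly placed arrow comes down to mapping the byte offset of the offending value back to a line and column. A simplified Rust sketch of that mapping follows; the helpers are hypothetical, not the Router's implementation, they render only the offending line rather than the surrounding context, and they assume the offset falls on a UTF-8 character boundary.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// Assumes `offset` lies on a UTF-8 character boundary within `source`.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    // The column counts characters since the start of the current line.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = source[line_start..offset].chars().count() + 1;
    (line, column)
}

/// Render the offending line with an arrow under the reported column.
fn render_caret(source: &str, offset: usize, message: &str) -> String {
    let (line, column) = line_col(source, offset);
    let text = source.lines().nth(line - 1).unwrap_or("");
    let padding = " ".repeat(column - 1);
    format!("at line {line}\n{text}\n{padding}^----- {message}")
}

fn main() {
    let config = "supergraph:\n    listen: 127.0.0.1:4000\n    introspection: \"a\"\n";
    // Byte offset of the invalid value, as a validator might report it.
    let offset = config.find("\"a\"").expect("the invalid value is present");
    println!("{}", render_caret(config, offset, "\"a\" is not of type \"boolean\""));
}
```

Because the arrow is padded by the column of the same line it is printed under, every line keeps its original indentation, which is the alignment property described in #2347.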

abernix and others added 30 commits December 23, 2022 18:22
This merges `main` back into `dev` after the #2312 release.
close #2300

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
…id graphql request error (#2321)

close #2320

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
Co-authored-by: Jesse Rosenberger <git@jro.cc>
Co-authored-by: Geoffroy Couprie <geoffroy@apollographql.com>
This follows-up #2202 and it consists of several commits which can
stand alone, if necessary. Each of those commits has their own message
and while I suggest reviewing the totality of the PR, it's worth
considering the text of the individual commit messages for additional
context on the changes.

As a summary of those commits:

- Remove destructive `git reset --hard` command which destroyed my local
changes
- Require a pristine Git checkout of known files prior to releasing
- Update Helm Chart version BEFORE `helm-docs` and `helm template`
commands.
- Do a pre-flight check which asserts availability of necessary tools
- Remove pre-determined version heading from `NEXT_CHANGELOG.md`
- Repair logic which migrates `NEXT_CHANGELOG.md` entries to
`CHANGELOG.md`
- Support ANY version string rather than just
digits-dot-digits-dot-digits.
- Remove quotes around invocation of `helm template`'s `--set` flags

Contributes to #2261

Co-authored-by: Coenen Benjamin <benjamin.coenen@hotmail.com>
fix failing test_updated CI build
Upgrade to axum `0.6.1`

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
Co-authored-by: Jesse Rosenberger <git@jro.cc>
This will facilitate some behind-the-scenes automation that executes in
our planning repository and can help apply metadata to pull requests
and issues automatically.
When the router service was created, the APQ functionality that was
performed at the axum level, before calling the supergraph service, was
moved to a layer above the supergraph service. Since it requires
AsyncCheckPoint (to asynchronously call a database to get the query), it
required the inner service to be cloned, so we had to reintroduce a
Buffer layer.
There was also a semantic issue here: we assume that a supergraph
request contains a valid graphql request, but instead, when the router
service creates it, it lets layers in the supergraph service deal with
APQ and checking that the query is present.

This refactors the APQ and EnsureQueryPresence layers to move them into
the router service's code, after the supergraph request is created, but
before it is passed to the supergraph service. This reduces the code
drastically (a few branches instead of two entire layers). The tests
from EnsureQueryPresence are now moved to the router service.
Related to #2165

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
close #2344

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
## ❗ BREAKING ❗

### Remove timeout from otlp exporter ([Issue
#2337](#2337))

`batch_processor` configuration contains timeout, so the existing
timeout property has been removed from the parent configuration element.

Before:
```yaml
telemetry:
  tracing:
    otlp:
      timeout: 5s
```
After:
```yaml
telemetry:
  tracing:
    otlp:
      batch_processor:
        timeout: 5s
```

## 🛠 Maintenance

### Simplify telemetry config code ([Issue
#2337](#2337))

This brings the telemetry plugin configuration closer to standards
recommended in the [yaml design
guidance](dev-docs/yaml-design-guidance.md).

Co-authored-by: bryn <bryn@apollographql.com>
Update the PR template
…ampler (#2356)

close #2339

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
When starting the router with this configuration file, which contains
errors due to invalid data and unrecognized fields:

```yaml
supergraph:
    listen: 127.0.0.1:4000
    introspection: "a"
    query_planning:
      experimental_cache:
        redis:
          urls: ["rediss://:router@127.0.0.1:4101"]
sandbox:
  enabled: true

tls:
  subgraphs:
    certificate_authorities: "${file./home/geal/dev/test/tls-proxy/server.crt}"

include_subgraph_errors:
  all: true
```

The router will print this:

```
2023-01-05T10:57:42.878484Z ERROR configuration had errors: 
1. /supergraph/introspection

supergraph:
    listen: 127.0.0.1:4000
    introspection: "a"
                   ^----- "a" is not of type "boolean"

2. /supergraph/query_planning/experimental_cache

supergraph:
    listen: 127.0.0.1:4000
    introspection: "a"
    query_planning:
      experimental_cache:
┌         redis:
|           urls: ["rediss://:router@127.0.0.1:4101"]
└-----> "in_memory" is a required property

3. /supergraph/query_planning/experimental_cache

supergraph:
    listen: 127.0.0.1:4000
    introspection: "a"
    query_planning:
      experimental_cache:
┌         redis:
|           urls: ["rediss://:router@127.0.0.1:4101"]
└-----> Additional properties are not allowed ('redis' was unexpected)
2023-01-05T10:57:42.879601Z ERROR no valid configuration was supplied
```

There are multiple issues with that log:
- `in_memory` is a required property of `experimental_cache` but the
arrow starts at `redis`
- when the arrow is printed, it shifts elements of that line by 2
spaces, but not the other lines
- the top level `tls` field is invalid (it comes from a PR that is not
merged yet), but we do not see any error pointing it out
- each error indicates the path in the JSON file, but it is not very
actionable information
- if there are multiple unknown properties under a map, we only show one
error (not displayed here)

Here's the version from this PR:

```
2023-01-05T11:02:37.071102Z ERROR configuration had errors: 
1. at line 3

  supergraph:
      listen: 127.0.0.1:4000
      introspection: "a"
                     ^----- "a" is not of type "boolean"

2. at line 5

  supergraph:
      listen: 127.0.0.1:4000
      introspection: "a"
      query_planning:
┌       experimental_cache:
|         redis:
|           urls: ["rediss://:router@127.0.0.1:4101"]
└-----> "in_memory" is a required property

3. at line 6

      listen: 127.0.0.1:4000
      introspection: "a"
      query_planning:
        experimental_cache:
┌         redis:
|           urls: ["rediss://:router@127.0.0.1:4101"]
└-----> Additional properties are not allowed ('redis' was unexpected)

4. at line 11

            urls: ["rediss://:router@127.0.0.1:4101"]
  sandbox:
    enabled: true
  
┌ tls:
|   subgraphs:
|     certificate_authorities: "${file./home/geal/dev/test/tls-proxy/server.crt}"
└-----> Additional properties are not allowed ('tls' was unexpected)


2023-01-05T11:02:37.071775Z ERROR no valid configuration was supplied
```

All errors are now displayed, all file lines are indented consistently
across error types, the arrows start from the right line, and the line
number is shown.

### Add cache hit/miss metrics ([Issue
#1985](#1985))

Add several metrics around the cache.
Each cache metric contains a `kind` attribute indicating the kind of
cache (`query planner`, `apq`, `introspection`)
and a `storage` attribute indicating where the cache is stored.

`apollo_router_cache_hit_count` to know when it hits the cache.

`apollo_router_cache_miss_count` to know when it misses the cache.

`apollo_router_cache_hit_time` to know how much time it takes when it
hits the cache.

`apollo_router_cache_miss_time` to know how much time it takes when it
misses the cache.


Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
close #2330

Signed-off-by: Benjamin Coenen <5719034+bnjjj@users.noreply.github.com>
No behavior or public API should change, only internal imports.

For apollo-compiler integration I might need to move many things around
in the `spec` module. This will make its users easier to find.

---
name: Checklist
about: PR Acceptance Checklist.
title: ''
labels: []
assignees: ''

---

**Checklist**

Complete the checklist (and note appropriate exceptions) before a final
PR is raised.

- [x] Changes are compatible<sup>1</sup>
- [ ] Documentation<sup>2</sup> completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing<sup>3</sup>
    - [ ] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests

**Exceptions**

Nothing new to test or document. Perf should be unaffected.

**Notes**

1. It may be appropriate to bring upcoming changes to the attention of
other (impacted) groups. Please endeavour to do this before seeking PR
approval. The mechanism for doing this will vary considerably, so use
your judgement as to how and when to do this.
2. Configuration is an important part of many changes. Where applicable
please try to document configuration examples.
3. Tick whichever testing boxes are applicable. If you are adding Manual
Tests:
- please document the manual testing (extensively) in the Exceptions.
- please raise a separate issue to automate the test and label it (or
ask for it to be labeled) as `manual test`
- remove unused metadata
- add description and related issue
- use footnotes
abernix and others added 2 commits January 11, 2023 13:49
Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>
Co-authored-by: Bryn Cooke <BrynCooke@gmail.com>
abernix and others added 5 commits January 11, 2023 16:11
Co-authored-by: Gary Pennington <gary@apollographql.com>
…ler` (#2382)

This PR reverses the choice made in
#2356 which came up
originally in #2339. By
default, we would like the user experience to be that Apollo Studio
tracing works at a low sampling rate when you've provided an Apollo
Studio API key. To disable field-level tracing (FTV1) entirely, the rate
can be set to `0.0`.

Co-authored-by: Jesse Rosenberger <git@jro.cc>
@abernix abernix requested a review from garypen January 11, 2023 15:37
@garypen garypen left a comment

I think the helm edit has gone into the wrong section in the CHANGELOG.md

Co-authored-by: Jeremy Lempereur <jeremy.lempereur@iomentum.com>
@abernix abernix merged commit 5c33aff into main Jan 11, 2023
@abernix abernix deleted the 1.8.0 branch January 11, 2023 17:16
abernix added a commit that referenced this pull request Jan 11, 2023