Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leak in ApolloServerPluginUsageReporting #6983

Closed
willalbone opened this issue Oct 3, 2022 · 5 comments · Fixed by #7000
Closed

Possible memory leak in ApolloServerPluginUsageReporting #6983

willalbone opened this issue Oct 3, 2022 · 5 comments · Fixed by #7000

Comments

@willalbone
Copy link

Package: apollo-server-core@^3.10.2

We are running a federated gateway with an instance of the ApolloServerPluginUsageReporting. Over time we are seeing heap memory of these servers steadily increasing. As of 30 Sept we changed the fieldLevelInstrumentation option passed to the plugin from the default 100% down to 10%. This has seen a decrease in the amount of memory being used by our federated gateway however it is continuing to increase over time.

I believe there is an issue with the getReportData function used internally within the usage reporting plugin

const getReportData = (executableSchemaId: string): ReportData => {

The function appears to be storing ReportData objects in a plain object Record<string, ReportData> but does not seem to ever remove entries from this object. I would expect these entries to be removed once the reports have been sent to Apollo Studio.

Apologies for the lack of a repro case, I have not been able to pull something together that is able to replicate what we are seeing in our production environments

@glasser
Copy link
Member

glasser commented Oct 3, 2022

I do agree that reportDataByExecutableSchemaId is a bit of a memory leak. The state of the relationship between Gateway and Server doesn't make it completely trivial to determine when we're done with a particular iteration of the schema (since in-flight operations can continue to execute for a while after the new schema shows up).

On the other hand, it seems like it shouldn't be a particularly large memory leak: since the Reports in the ReportData are reset every time reports are sent then I'd expect the leak to only be a relatively small data structure (a ReportHeader and an empty OurReport) for each time your schema is changed.

So I'd be happy to try to improve the plugin to remove things from the data structure instead of leaving them around indefinitely, but I am a bit curious to hear what the scale of the leak you're seeing is, in case this isn't the actual problem. Have you used heap profiling tools?

@willalbone
Copy link
Author

With the default fieldLevelInstrumentation: true we were seeing memory usage increasing by 40-90mb per day. Having dropped that down to fieldLevelInstrumentation: 0.1 the rate of increase has dropped to around 7-25mb per day.

In terms of isolating this leak, I have spent the last weeks/months stripping everything non-essential out of this service to the point that it now only contains the gateway server, usage reporting plugin, and some stateless jwt validation code. These are extremely locked down production servers so it is not going to be possible to generate heap dumps for profiling. It is also not really practical for us to replicate our entire federated graph locally and generate requests with enough variance and throughput to easily reproduce this behaviour.

@glasser
Copy link
Member

glasser commented Oct 6, 2022 via email

@willalbone
Copy link
Author

What version of Apollo Server are you currently using?

3.10.2

glasser added a commit that referenced this issue Oct 7, 2022
A data structure in the usage reporting plugin would get a new entry
each time the server's schema changed. Old entries would never be
deleted, though their values would be shrunk to a minimum size.

It seems that this was not good enough and is producing notable memory
leaks for some folks. So let's fix it!

The reason for the old behavior was to be really careful to never write
to a report that may have already been taken out of the map and sent.
Previously this was accomplished by having the main entry in the map
never change. Now this is accomplished by making sure that we never
write to the report any later than immediately (and synchronously) after
we pull it out of the map.

Fixes #6983.
glasser added a commit that referenced this issue Oct 7, 2022
A data structure in the usage reporting plugin would get a new entry
each time the server's schema changed. Old entries would never be
deleted, though their values would be shrunk to a minimum size.

It seems that this was not good enough and is producing notable memory
leaks for some folks. So let's fix it!

The reason for the old behavior was to be really careful to never write
to a report that may have already been taken out of the map and sent.
Previously this was accomplished by having the main entry in the map
never change. Now this is accomplished by making sure that we never
write to the report any later than immediately (and synchronously) after
we pull it out of the map.

Fixes #6983.
glasser added a commit that referenced this issue Oct 7, 2022
A data structure in the usage reporting plugin would get a new entry
each time the server's schema changed. Old entries would never be
deleted, though their values would be shrunk to a minimum size.

It seems that this was not good enough and is producing notable memory
leaks for some folks. So let's fix it!

The reason for the old behavior was to be really careful to never write
to a report that may have already been taken out of the map and sent.
Previously this was accomplished by having the main entry in the map
never change. Now this is accomplished by making sure that we never
write to the report any later than immediately (and synchronously) after
we pull it out of the map.

Fixes #6983.
glasser added a commit that referenced this issue Oct 7, 2022
A data structure in the usage reporting plugin would get a new entry
each time the server's schema changed. Old entries would never be
deleted, though their values would be shrunk to a minimum size.

It seems that this was not good enough and is producing notable memory
leaks for some folks. So let's fix it!

The reason for the old behavior was to be really careful to never write
to a report that may have already been taken out of the map and sent.
Previously this was accomplished by having the main entry in the map
never change. Now this is accomplished by making sure that we never
write to the report any later than immediately (and synchronously) after
we pull it out of the map.

Fixes #6983. Backport from #6998.
glasser pushed a commit that referenced this issue Oct 7, 2022
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to version-4, this PR
will be updated.

⚠️⚠️⚠️⚠️⚠️⚠️

`version-4` is currently in **pre mode** so this branch has prereleases
rather than normal releases. If you want to exit prereleases, run
`changeset pre exit` on `version-4`.

⚠️⚠️⚠️⚠️⚠️⚠️

# Releases
## @apollo/server-integration-testsuite@4.0.0-rc.17

### Patch Changes

- Updated dependencies
\[[`233b44eea`](233b44e)]:
    -   @apollo/server@4.0.0-rc.17

## @apollo/server@4.0.0-rc.17

### Patch Changes

- [#6998](#6998)
[`233b44eea`](233b44e)
Thanks [@glasser](https://github.com/glasser)! - Fix a slow memory leak
in the usage reporting plugin (#6983).

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
glasser added a commit that referenced this issue Oct 7, 2022
A data structure in the usage reporting plugin would get a new entry
each time the server's schema changed. Old entries would never be
deleted, though their values would be shrunk to a minimum size.

It seems that this was not good enough and is producing notable memory
leaks for some folks. So let's fix it!

The reason for the old behavior was to be really careful to never write
to a report that may have already been taken out of the map and sent.
Previously this was accomplished by having the main entry in the map
never change. Now this is accomplished by making sure that we never
write to the report any later than immediately (and synchronously) after
we pull it out of the map.

Fixes #6983. Backport from #6998.
@glasser
Copy link
Member

glasser commented Oct 7, 2022

@willalbone Why don't you give v3.10.3 a shot and see if it helps.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants