Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address leak when using request stream interceptors (#25449) #27571

Merged

Conversation

bostikforever
Copy link
Contributor

@bostikforever bostikforever commented Oct 3, 2021

This addresses the leak demonstrated in #25449

The root cause seems to be the creation of lots of unresolved asyncio Tasks because of how the call status code is used to write to the interceptor request stream in an interruptible manner.

The first attempt that (sort of) worked is just to cancel whatever pending calls that are not required: ae3f88e

However, while this slows the rate of leakage, it does not quite solve the problem completely.
It turns out that making many calls to the status code itself is a problem, as this creates many Futures in the cython layer which persist even after the Tasks that create them are cancelled (see:

async def status(self):
"""Returns the status of the RPC call.
It returns the finshed status of the RPC. If the RPC
has not finished yet this function will wait until the RPC
gets finished.
Returns:
Finished status of the RPC as an AioRpcStatus object.
"""
if self._status is not None:
return self._status
future = self._loop.create_future()
self._waiters_status.append(future)
await future
return self._status
)
While Future objects seem to be cheaper than the Task objects, creating a lot of them still adds up over time.
For the purposes of the interceptor request stream, it seems that it is sufficient to just have a single status code task that can be "listened" on for the lifetime of the interceptor request stream. This way we avoid polluting the callback queue of the corresponding cython call object: c8801bb

@lidizheng

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 3, 2021

CLA Signed

The committers are authorized under a signed CLA.

@bostikforever bostikforever changed the base branch from v1.41.x to master October 3, 2021 23:47
@bostikforever bostikforever force-pushed the bo/fix-interceptor-stream-leak-#25449 branch from c848b61 to 80f6aec Compare October 3, 2021 23:48
@bostikforever bostikforever marked this pull request as draft October 3, 2021 23:52
@bostikforever bostikforever force-pushed the bo/fix-interceptor-stream-leak-#25449 branch from 80f6aec to 1147f3c Compare October 4, 2021 00:04
@bostikforever bostikforever reopened this Oct 4, 2021
@bostikforever bostikforever marked this pull request as ready for review October 4, 2021 01:18
@lidizheng lidizheng self-requested a review October 4, 2021 17:43
@lidizheng lidizheng self-assigned this Oct 4, 2021
@lidizheng lidizheng added kokoro:run lang/Python release notes: yes Indicates if PR needs to be in release notes labels Oct 4, 2021
@bostikforever
Copy link
Contributor Author

I see some builds are failing, looking at other PRs though (including the merged ones) it seems that this is "normal". Is there any thing for me to do?

@bostikforever bostikforever force-pushed the bo/fix-interceptor-stream-leak-#25449 branch from 3b103d4 to f46f3d9 Compare October 12, 2021 18:05
@bostikforever
Copy link
Contributor Author

I have rebased on master and resolved the merge conflict that was preventing merge.
@lidizheng

The results of these pending tasks are not needed, leaving them
on the queue grows the size of the queue until the call completes.

This fix slows the growth of the memory in the test example.
Cancelling unneeded Tasks is not sufficient as this leaves behind
cancelled Futures in the cygrpc layer, which still occupy memory.

Instead, avoid creating unneeded tasks in the first place.
@bostikforever bostikforever force-pushed the bo/fix-interceptor-stream-leak-#25449 branch from d922ae9 to c8801bb Compare October 18, 2021 02:41
@bostikforever
Copy link
Contributor Author

@lidizheng Fixed typo causing build to fail.

@bostikforever
Copy link
Contributor Author

@lidizheng I do not think the current failing tests have anything to do with my changes, can you please confirm.

Copy link
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. This is a hard problem to investigate and solve. Thanks so much for fixing it. I think the fix and comment looks good. Just a minor comment, and please check our Sanity Tests' output, as following:


+ yapf_virtual_environment/bin/python -m yapf --diff --parallel --recursive --style=setup.cfg examples src test tools setup.py
--- src/python/grpcio/grpc/aio/_interceptor.py	(original)
+++ src/python/grpcio/grpc/aio/_interceptor.py	(reformatted)
@@ -505,8 +505,8 @@
                 break
             yield value

-    async def _write_to_iterator_queue_interruptible(
-            self, request: RequestType, call: InterceptedCall):
+    async def _write_to_iterator_queue_interruptible(self, request: RequestType,
+                                                     call: InterceptedCall):
         # Write the specified 'request' to the request iterator queue using the
         # specified 'call' to allow for interruption of the write in the case
         # of abrupt termination of the call.
@@ -516,8 +516,7 @@
         _, _ = await asyncio.wait(
             (self._loop.create_task(self._write_to_iterator_queue.put(request)),
              self._status_code_task),
-            return_when=asyncio.FIRST_COMPLETED
-        )
+            return_when=asyncio.FIRST_COMPLETED)

     async def write(self, request: RequestType) -> None:
         # If no queue was created it means that requests

src/python/grpcio/grpc/aio/_interceptor.py Outdated Show resolved Hide resolved
1. Ignore unused return values
2. Fix formatting
@bostikforever
Copy link
Contributor Author

Addressed the formatting fix and your nit.

Copy link
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

@lidizheng lidizheng merged commit c1d4e96 into grpc:master Oct 19, 2021
@bostikforever
Copy link
Contributor Author

@lidizheng Thanks for accepting. What version to look out for this fix in?

@lidizheng
Copy link
Contributor

We will have a branch cut tomorrow for v1.42.0. It will be available in release candidate soon, like within the week. The official release might take longer, as there might be release blockers.

@ctiller
Copy link
Member

ctiller commented Oct 19, 2021

lgtm

@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Oct 19, 2021
@bostikforever bostikforever deleted the bo/fix-interceptor-stream-leak-#25449 branch November 21, 2021 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported Specifies if the PR has been imported to the internal repository lang/Python release notes: yes Indicates if PR needs to be in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants