Reduce GC pressure in InterningBinaryReader.ReadString() #3210

Closed
lifengl opened this issue Apr 18, 2018 · 13 comments
Labels: CPS (Issues impacting CPS), performance, Performance-Scenario-Solution-Open (This issue affects solution open performance), triaged
Milestone: MSBuild 16.10

Comments

lifengl commented Apr 18, 2018

From a trace taken when we open a .NET Core solution, we noticed that more than 10% of virtual allocations were triggered by creating the StringBuilder in Microsoft.Build.InterningBinaryReader.ReadString().

It appears that part of the msbuild code already tries to share a string buffer. Can we do the same thing here? It could reduce GC pressure in the design-time build phase right after loading a solution.

If we count only the phase right after loading the solution, this code path contributes 26.8% of virtual allocations. Although the allocations are temporary, they add a lot of GC pressure during that phase.
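For illustration, a minimal sketch of the kind of buffer sharing being asked about here, assuming a thread-local char buffer that persists across ReadString calls; the names are hypothetical and this is not MSBuild's actual InterningBinaryReader code.

```csharp
using System;
using System.Text;

// Hypothetical sketch: decode into a reusable, thread-local char buffer instead of
// building up a new StringBuilder on every ReadString call.
internal static class SharedReadBuffer
{
    [ThreadStatic]
    private static char[] t_chars; // grows as needed and is reused across calls

    public static string Decode(Decoder decoder, byte[] rawBytes, int byteCount)
    {
        int maxChars = decoder.GetCharCount(rawBytes, 0, byteCount);
        char[] chars = t_chars;
        if (chars == null || chars.Length < maxChars)
        {
            t_chars = chars = new char[Math.Max(maxChars, 1024)];
        }

        int written = decoder.GetChars(rawBytes, 0, byteCount, chars, 0);
        return new string(chars, 0, written); // only the final string is allocated
    }
}
```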

lifengl commented Apr 19, 2018

Traces show this stack:

|     + microsoft.build.ni!InterningBinaryReader.ReadString

|      + microsoft.build.ni!Microsoft.Build.BackEnd.NodePacketTranslator+NodePacketReadTranslator.Translate(System.String ByRef)

along with a few other stacks:

Name

|    |  + mscorlib.ni!BinaryWriter.Write
|    |   + microsoft.build.ni!Microsoft.Build.BackEnd.NodePacketTranslator+NodePacketWriteTranslator.Translate(System.String ByRef)

|    + microsoft.build.ni!Microsoft.Build.BackEnd.NodeProviderOutOfProcBase+NodeContext.HeaderReadComplete(System.IAsyncResult)

Name

|     + microsoft.build.ni!InterningBinaryReader.ReadString
|      + microsoft.build.framework.ni!ProjectStartedEventArgs.CreateFromStream
|      + microsoft.build.ni!Microsoft.Build.BackEnd.NodePacketTranslator+NodePacketReadTranslator.Translate(System.String ByRef)

These are the major contributors to the LOH during the design-time build phase after loading a large solution. It happens when the solution grows to the point where the solution configuration XML string becomes very large, so the build packet buffers all exceed the LOH threshold and get pushed onto the LOH. Large solutions also trigger a large number of design-time builds, and multiple build nodes may contribute as well. The code path above quickly allocates lots of memory, and because it lands on the LOH it is not reclaimed until a Gen2 GC. This creates huge memory pressure (in one dump the LOH grew to 600 MB), and combined with other allocations it causes VS to run out of memory.

rainersigwald (Member) commented

Looks like our ReuseableStringBuilder isn't a great solution for this, for a couple of reasons:

  • It's designed to keep only one buffer in its pool (probably not the end of the world here since I/O is serialized by the pipe)
  • It doesn't reuse the StringBuilder if its capacity has grown past 1024

https://github.com/Microsoft/msbuild/blob/e8b480c373eadd9daa975c6e96e4dbbabcd9a4fe/src/Shared/ReuseableStringBuilder.cs#L226-L231

That number is obviously way too small if things were hitting the LOH. Do you have a sense of how big the solution configuration string that entered the LOH was? I share the original concern about the pool mechanism keeping buffers that are way too big around forever, since it's static.
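To illustrate the single-slot behavior described above, here is a rough paraphrase (not the actual ReuseableStringBuilder implementation in src/Shared): at most one builder is cached, and it is dropped on release once its capacity has grown past the limit.

```csharp
using System.Text;
using System.Threading;

// Rough paraphrase of the single-slot pool policy; the real ReuseableStringBuilder differs.
internal static class SingleSlotBuilderPool
{
    private const int MaxBuilderSize = 1024;  // builders that grew past this are not kept
    private static StringBuilder s_shared;    // at most one cached instance

    public static StringBuilder Get()
    {
        return Interlocked.Exchange(ref s_shared, null) ?? new StringBuilder(MaxBuilderSize);
    }

    public static void Release(StringBuilder builder)
    {
        if (builder.Capacity <= MaxBuilderSize)   // oversized builders are left for the GC
        {
            builder.Clear();
            Interlocked.CompareExchange(ref s_shared, builder, null);
        }
    }
}
```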

rainersigwald (Member) commented

Hmm, maybe the ThreadStatic approach in coreclr's StringBuilderCache would be good enough for us:

https://github.com/dotnet/coreclr/blob/68f72dd2587c3365a9fe74d1991f93612c3bc62a/src/mscorlib/src/System/Text/StringBuilderCache.cs#L41

Though I notice they use an even smaller size limit there.
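A minimal sketch of that [ThreadStatic] caching pattern, with names chosen here for illustration rather than copied from coreclr:

```csharp
using System;
using System.Text;

// Illustrative sketch of the StringBuilderCache pattern: one cached builder per thread,
// only kept when it is still small.
internal static class ThreadLocalBuilderCache
{
    private const int MaxCachedCapacity = 360; // deliberately small, in the spirit of coreclr's limit

    [ThreadStatic]
    private static StringBuilder t_cached;

    public static StringBuilder Acquire(int capacity = 16)
    {
        StringBuilder sb = t_cached;
        if (sb != null && capacity <= sb.Capacity)
        {
            t_cached = null;   // taken; this thread cannot hand it out twice
            sb.Clear();
            return sb;
        }
        return new StringBuilder(capacity);
    }

    public static string GetStringAndRelease(StringBuilder sb)
    {
        string result = sb.ToString();
        if (sb.Capacity <= MaxCachedCapacity)
        {
            t_cached = sb;     // small enough to keep for the next caller on this thread
        }
        return result;
    }
}
```

Because the cache is per thread there is no locking, but each pipe-reading thread keeps its own builder alive.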

cdmihai (Contributor) commented Apr 20, 2018

Talking to @lifengl, it turns out there is another source of large object heap allocations besides the StringBuilder: the buffers used to read from (and probably also write to) the named pipe streams:

https://github.com/Microsoft/msbuild/blob/master/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs#L973-L982

https://github.com/Microsoft/msbuild/blob/master/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs#L759-L768

One option here would be to use a pool of buffers, with policies similar to the StringBuilder pool, but with some differences (a rough sketch follows after this list):

  • if multiple requests come concurrently, then the pool will need to create new buffers
  • when buffers are returned, they can be discarded if another larger buffer is free in the pool

But the solution is highly dependent on the parallelism taking place there, which I am not very familiar with. There is certainly one instance per node, but I can't tell if it's one thread processing the pipes for all nodes, one thread per node, or multiple threads per node.
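To make those two policies concrete, here is a rough sketch assuming a simple lock-protected pool; the class name and details below are hypothetical, not the actual MSBuild node-communication code.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the two policies above: hand out a fresh buffer when the pool
// cannot satisfy a (possibly concurrent) request, and drop returned buffers that are
// smaller than one already sitting in the pool.
internal sealed class PipeReadBufferPool
{
    private readonly object _lock = new object();
    private readonly List<byte[]> _free = new List<byte[]>();

    public byte[] Rent(int minimumLength)
    {
        lock (_lock)
        {
            for (int i = 0; i < _free.Count; i++)
            {
                if (_free[i].Length >= minimumLength)
                {
                    byte[] buffer = _free[i];
                    _free.RemoveAt(i);
                    return buffer;
                }
            }
        }

        // Pool is empty or its buffers are too small: concurrent callers just allocate.
        return new byte[minimumLength];
    }

    public void Return(byte[] buffer)
    {
        lock (_lock)
        {
            foreach (byte[] free in _free)
            {
                if (free.Length >= buffer.Length)
                {
                    return; // an equal-or-larger buffer is already free, so discard this one
                }
            }
            _free.Add(buffer);
        }
    }
}
```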

rainersigwald (Member) commented

So we've got a reused byte[1000] there. We could use System.Buffers.ArrayPool.Shared to manage most of the problems you're asking about, but it doesn't solve the "keeping a (needed) giant buffer around forever in a static cache, bloating memory footprint (but saving GC pressure)" problem :(
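For reference, a hedged sketch of what the ArrayPool route could look like around the packet body read; the method and variable names here are placeholders, not the actual NodeContext members.

```csharp
using System;
using System.Buffers;
using System.IO;

// Sketch only: rent the packet body buffer from the shared pool instead of keeping a
// per-context byte[] alive forever.
internal static class PacketReadSketch
{
    public static byte[] ReadPacketBody(Stream pipeStream, int packetLength)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(packetLength);
        try
        {
            int read = 0;
            while (read < packetLength)
            {
                int n = pipeStream.Read(rented, read, packetLength - read);
                if (n == 0)
                {
                    throw new EndOfStreamException("Pipe closed before the full packet arrived.");
                }
                read += n;
            }

            // Copied out so the rented array can be returned immediately; a real fix would
            // deserialize the packet directly from the rented buffer instead.
            byte[] body = new byte[packetLength];
            Array.Copy(rented, body, packetLength);
            return body;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

As the comment above notes, this addresses reuse but not the footprint of keeping a very large buffer cached.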

lifengl commented Apr 21, 2018 via email

Mohit-Chakraborty commented

I am running into an OOM exception with the Roslyn solution:

KERNELBASE.dll!RaiseException(unsigned long dwExceptionCode, unsigned long dwExceptionFlags, unsigned long nNumberOfArguments, const unsigned long * lpArguments) Line 922 C
[Managed to Native Transition]
Microsoft.Build.dll!Microsoft.Build.InterningBinaryReader.ReadString() Line 172 C#
Microsoft.Build.Framework.dll!Microsoft.Build.Framework.ProjectStartedEventArgs.CreateFromStream(System.IO.BinaryReader reader, int version) Line 469 C#
Microsoft.Build.dll!Microsoft.Build.Shared.LogMessagePacketBase.ReadFromStream(Microsoft.Build.BackEnd.INodePacketTranslator translator) Line 369 C#
Microsoft.Build.dll!Microsoft.Build.Shared.LogMessagePacketBase.Translate(Microsoft.Build.BackEnd.INodePacketTranslator translator) Line 272 C#
Microsoft.Build.dll!Microsoft.Build.BackEnd.LogMessagePacket.FactoryForDeserialization(Microsoft.Build.BackEnd.INodePacketTranslator translator) Line 52 C#
Microsoft.Build.dll!Microsoft.Build.BackEnd.NodePacketFactory.PacketFactoryRecord.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.INodePacketTranslator translator) Line 108 C#
Microsoft.Build.dll!Microsoft.Build.BackEnd.NodePacketFactory.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.NodePacketType packetType, Microsoft.Build.BackEnd.INodePacketTranslator translator) Line 66 C#
Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeManager.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.NodePacketType packetType, Microsoft.Build.BackEnd.INodePacketTranslator translator) Line 282 C#
Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.NodeContext.ReadAndRoutePacket(Microsoft.Build.BackEnd.NodePacketType packetType, byte[] packetData, int packetLength) Line 1017 C#
Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.NodeContext.BodyReadComplete(System.IAsyncResult result) Line 1071 C#

lifengl commented May 1, 2018 via email

lifengl commented May 10, 2018

We noticed this issue leads the product to crash when we turn on the diagnostic log.

lifengl commented May 21, 2018

After thinking about it, I think the right way to hold and reuse a large buffer that lands on the LOH is this:

We should hold the large buffer we want to reuse/recycle only through a WeakReference, and convert it back to a strong reference when we need to reuse it.

1. A LOH buffer is only collected during a Gen2 GC, so we can expect to reuse the same buffer for a long time during normal operation.

2. Technically, we don't hold any extra memory, because a Gen2 GC can still reclaim everything. We only reuse memory that would otherwise sit wasted in that space.
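A minimal sketch of that idea (the helper below is hypothetical, not MSBuild code): the cache holds the large buffer only through a WeakReference, so a Gen2 GC can still reclaim it, but between collections the same LOH buffer gets revived and reused.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: cache the large buffer weakly so it never counts as rooted memory,
// yet survives for reuse until the next Gen2 GC actually collects it.
internal static class WeakBufferCache
{
    private static WeakReference<byte[]> s_cached;

    public static byte[] Acquire(int minimumLength)
    {
        WeakReference<byte[]> weak = Interlocked.Exchange(ref s_cached, null);
        if (weak != null && weak.TryGetTarget(out byte[] buffer) && buffer.Length >= minimumLength)
        {
            return buffer;   // revived the previously cached buffer
        }
        return new byte[minimumLength];
    }

    public static void Release(byte[] buffer)
    {
        // Only a weak handle is stored, so this adds no strongly rooted memory.
        s_cached = new WeakReference<byte[]>(buffer);
    }
}
```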

rainersigwald (Member) commented

That's a really good idea! We may have to play some tricks with our Mono implementation, if I correctly recall some of our other weak-reference problems.

@cdmihai cdmihai added the CPS Issues impacting CPS label Jun 21, 2018
@panopticoncentral panopticoncentral added the Performance-Scenario-Solution-Open This issue affects solution open performance. label Mar 31, 2020
KirillOsenkov (Member) commented

I'm seeing a ton of allocations of ReuseableStringBuilder here during evaluation:

using (var result = new ReuseableStringBuilder())

I'm noticing ReuseableStringBuilder is a class, and it's overkill to allocate an instance every time. Could this be a struct or a static helper method?
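One possible shape for that, sketched under the assumption of a single shared cached builder; the struct and member names below are made up for illustration and are not an MSBuild API.

```csharp
using System;
using System.Text;
using System.Threading;

// Hypothetical sketch: a disposable struct that borrows a cached StringBuilder, so the
// per-use "scope" itself is never heap-allocated.
internal struct PooledBuilderScope : IDisposable
{
    private static StringBuilder s_cached;   // one shared builder, reused across scopes
    private StringBuilder _sb;

    public static PooledBuilderScope Acquire()
    {
        StringBuilder sb = Interlocked.Exchange(ref s_cached, null) ?? new StringBuilder();
        sb.Clear();
        return new PooledBuilderScope { _sb = sb };
    }

    public void Append(string value) => _sb.Append(value);

    public override string ToString() => _sb.ToString();

    public void Dispose() => Interlocked.CompareExchange(ref s_cached, _sb, null);
}
```

A call site would then look much like the existing one, e.g. using (var result = PooledBuilderScope.Acquire()), but without allocating a wrapper object per use.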

ladipro (Member) commented Mar 2, 2021

This one can be closed.

@ladipro ladipro closed this as completed Mar 2, 2021
@ladipro ladipro added this to the MSBuild 16.10 milestone Dec 9, 2021
@AR-May AR-May added the triaged label Feb 21, 2024