Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OOM when generating binlog for boost.hana #3577

Closed
xiangfan-ms opened this issue Aug 3, 2018 · 12 comments
Closed

OOM when generating binlog for boost.hana #3577

xiangfan-ms opened this issue Aug 3, 2018 · 12 comments
Assignees
Labels
Area: Debuggability Issues impacting the diagnosability of builds, including logging and clearer error messages. Area: Logging performance triaged
Milestone

Comments

@xiangfan-ms
Copy link

xiangfan-ms commented Aug 3, 2018

It reproes using 15.8 preview 6 (I didn't try older versions).
Microsoft (R) Build Engine version 15.8.166+gd4e8d81a88 for .NET Framework

  1. Clone https://github.com/xiangfan-ms/hana
  2. Create build folder and enter build folder
  3. Generate the solution (-DBOOST_ROOT may be optional to repro OOM)
cmake .. -DBOOST_ROOT=[path_to_boost_167] -D"CMAKE_CXX_FLAGS=/EHsc /Zc:ternary /bigobj /we4667" -DBOOST_HANA_ENABLE_STRING_UDL=false
  1. Build the solution
msbuild /noconlog /t:Rebuild /p:Configuration=Debug /p:Platform="Win32" /m /bl:hana.binlog;ProjectImports=None Boost.Hana.sln /flp:v=n;logfile=hana.log

Unhandled Exception: OutOfMemoryException.

It doesn't OOM if I don't generate binlog. According to the log file, the OOM happens at the very early stage of the build.

@xiangfan-ms
Copy link
Author

xiangfan-ms commented Aug 3, 2018

Call stack,

>	Microsoft.Build.dll!Microsoft.Build.InterningBinaryReader.ReadString()	Unknown
 	Microsoft.Build.Framework.dll!Microsoft.Build.Framework.BuildEventArgs.CreateFromStream(System.IO.BinaryReader reader, int version)	Unknown
 	Microsoft.Build.Framework.dll!Microsoft.Build.Framework.LazyFormattedBuildEventArgs.CreateFromStream(System.IO.BinaryReader reader, int version)	Unknown
 	Microsoft.Build.Framework.dll!Microsoft.Build.Framework.BuildMessageEventArgs.CreateFromStream(System.IO.BinaryReader reader, int version)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.Shared.LogMessagePacketBase.ReadFromStream(Microsoft.Build.BackEnd.INodePacketTranslator translator)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.Shared.LogMessagePacketBase.Translate(Microsoft.Build.BackEnd.INodePacketTranslator translator)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.LogMessagePacket.FactoryForDeserialization(Microsoft.Build.BackEnd.INodePacketTranslator translator)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodePacketFactory.PacketFactoryRecord.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.INodePacketTranslator translator)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodePacketFactory.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.NodePacketType packetType, Microsoft.Build.BackEnd.INodePacketTranslator translator)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeManager.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.NodePacketType packetType, Microsoft.Build.BackEnd.INodePacketTranslator translator)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.NodeContext.ReadAndRoutePacket(Microsoft.Build.BackEnd.NodePacketType packetType, byte[] packetData, int packetLength)	Unknown
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.NodeContext.BodyReadComplete(System.IAsyncResult result)	Unknown

@rainersigwald
Copy link
Member

Is the OOM node the entry-point or one of the workers? Can you also try without /m?

@rainersigwald rainersigwald added Area: Debuggability Issues impacting the diagnosability of builds, including logging and clearer error messages. performance labels Aug 3, 2018
@rainersigwald
Copy link
Member

On my machine, it was the entrypoint node, and it ran for a while without /m, suggesting less of a problem.

Which makes sense if that stack is actually representative of the problem, since it's in IPC string translation.

Conceivably related to #3210 but that shouldn't cause an OOM crash, just GC pauses.

@xiangfan-ms
Copy link
Author

Yes, it appears to work if I remove /m.
New call stack after I disable JMC (I don't have the text form because VS also hangs during the debugging :-) ):

image

@rainersigwald
Copy link
Member

rainersigwald commented Aug 3, 2018

Took some memory dumps with procdump.exe -n 3 -ma 29208 and compared them

From one dump:

Name Inc % Inc
Microsoft.Build.Framework!Microsoft.Build.Framework.BuildMessageEventArgs 74.2 870,234,752
+ LIB <<System.Threading.Tasks.Dataflow!Tasks.Dataflow.BufferBlock>> 74.2 870,234,752
|+ Microsoft.Build!Microsoft.Build.BackEnd.Logging.LoggingService 74.2 870,234,752
||+ Microsoft.Build!Microsoft.Build.BackEnd.Logging.NodeLoggingContext 74.2 870,234,752
|||+ Microsoft.Build!Microsoft.Build.BackEnd.InProcNode 74.2 870,234,752

So it seems like the logger is lagging, causing messages to pile up. We have a throttling mechanism, but it looks like it's only opted into sometimes. I'll try to debug in to see if the scenario that's overflowing it opts in (and the throttle is broken) or if it doesn't (and we need to opt it into throttling).

@rainersigwald
Copy link
Member

It's the latter:

0:084> !DumpObj /d 02daca48
Name:        System.Threading.Tasks.Dataflow.Internal.BoundingStateWithPostponedAndTask`1[[System.Object, mscorlib]]
MethodTable: 05334b68
EEClass:     57c423a0
Size:        28(0x1c) bytes
File:        C:\Program Files (x86)\Microsoft Visual Studio\Preview\Enterprise\MSBuild\15.0\Bin\System.Threading.Tasks.Dataflow.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
648cf2dc  400014b        4         System.Int32  1 instance   200000 BoundedCapacity
648cf2dc  400014c        8         System.Int32  1 instance   199999 CurrentCount
00000000  400014d        c                       0 instance 02daca64 PostponedMessages
648cf2dc  400014e       10         System.Int32  1 instance        0 OutstandingTransfers
64915c7c  400014f       14 ...eading.Tasks.Task  0 instance bcbb4fe0 TaskForInputProcessing

Note that we're at capacity . . . but events keep getting asynchronously added to the queue via the in-proc node, like so:

>	Microsoft.Build.dll!Microsoft.Build.BackEnd.Logging.LoggingService.ProcessLoggingEvent(object buildEvent, bool allowThrottling) Line 1076	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.Logging.LoggingService.LogCommentFromText(Microsoft.Build.Framework.BuildEventContext buildEventContext, Microsoft.Build.Framework.MessageImportance importance, string message, object[] messageArgs) Line 98	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.Logging.LoggingService.LogCommentFromText(Microsoft.Build.Framework.BuildEventContext buildEventContext, Microsoft.Build.Framework.MessageImportance importance, string message) Line 65	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.Logging.LoggingContext.LogCommentFromText(Microsoft.Build.Framework.MessageImportance importance, string message) Line 130	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskExecutionHost.InternalSetTaskParameter(Microsoft.Build.Framework.TaskPropertyInfo parameter, object parameterValue) Line 1319	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskExecutionHost.SetValueParameter(Microsoft.Build.Framework.TaskPropertyInfo parameter, System.Type parameterType, string expandedParameterValue) Line 712	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskExecutionHost.InitializeTaskScalarParameter(Microsoft.Build.Framework.TaskPropertyInfo parameter, System.Type parameterType, string parameterValue, Microsoft.Build.Construction.ElementLocation parameterLocation, out bool taskParameterSet) Line 1190	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskExecutionHost.SetTaskParameter(string parameterName, string parameterValue, Microsoft.Build.Construction.ElementLocation parameterLocation, bool isRequired, out bool parameterSet) Line 1046	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskExecutionHost.Microsoft.Build.BackEnd.ITaskExecutionHost.SetTaskParameters(System.Collections.Generic.IDictionary<string, System.Tuple<string, Microsoft.Build.Construction.ElementLocation>> parameters) Line 336	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskBuilder.ExecuteInstantiatedTask(Microsoft.Build.BackEnd.ITaskExecutionHost taskExecutionHost, Microsoft.Build.BackEnd.Logging.TaskLoggingContext taskLoggingContext, Microsoft.Build.BackEnd.TaskHost taskHost, Microsoft.Build.BackEnd.ItemBucket bucket, Microsoft.Build.BackEnd.TaskExecutionMode howToExecuteTask) Line 742	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskBuilder.InitializeAndExecuteTask(Microsoft.Build.BackEnd.Logging.TaskLoggingContext taskLoggingContext, Microsoft.Build.BackEnd.ItemBucket bucket, System.Collections.Generic.IDictionary<string, string> taskIdentityParameters, Microsoft.Build.BackEnd.TaskHost taskHost, Microsoft.Build.BackEnd.TaskExecutionMode howToExecuteTask) Line 656	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskBuilder.ExecuteBucket(Microsoft.Build.BackEnd.TaskHost taskHost, Microsoft.Build.BackEnd.ItemBucket bucket, Microsoft.Build.BackEnd.TaskExecutionMode howToExecuteTask, System.Collections.Generic.Dictionary<string, string> lookupHash) Line 431	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskBuilder.ExecuteTask(Microsoft.Build.BackEnd.TaskExecutionMode mode, Microsoft.Build.BackEnd.Lookup lookup) Line 328	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TaskBuilder.ExecuteTask(Microsoft.Build.BackEnd.Logging.TargetLoggingContext loggingContext, Microsoft.Build.BackEnd.BuildRequestEntry requestEntry, Microsoft.Build.BackEnd.ITargetBuilderCallback targetBuilderCallback, Microsoft.Build.Execution.ProjectTargetInstanceChild taskInstance, Microsoft.Build.BackEnd.TaskExecutionMode mode, Microsoft.Build.BackEnd.Lookup inferLookup, Microsoft.Build.BackEnd.Lookup executeLookup, System.Threading.CancellationToken cancellationToken) Line 177	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TargetEntry.ProcessBucket(Microsoft.Build.BackEnd.ITaskBuilder taskBuilder, Microsoft.Build.BackEnd.Logging.TargetLoggingContext targetLoggingContext, Microsoft.Build.BackEnd.TaskExecutionMode mode, Microsoft.Build.BackEnd.Lookup lookupForInference, Microsoft.Build.BackEnd.Lookup lookupForExecution) Line 834	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TargetEntry.ExecuteTarget(Microsoft.Build.BackEnd.ITaskBuilder taskBuilder, Microsoft.Build.BackEnd.BuildRequestEntry requestEntry, Microsoft.Build.BackEnd.Logging.ProjectLoggingContext projectLoggingContext, System.Threading.CancellationToken cancellationToken) Line 508	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TargetBuilder.ProcessTargetStack(Microsoft.Build.BackEnd.ITaskBuilder taskBuilder) Line 462	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.TargetBuilder.BuildTargets(Microsoft.Build.BackEnd.Logging.ProjectLoggingContext loggingContext, Microsoft.Build.BackEnd.BuildRequestEntry entry, Microsoft.Build.BackEnd.IRequestBuilderCallback callback, string[] targetNames, Microsoft.Build.BackEnd.Lookup baseLookup, System.Threading.CancellationToken cancellationToken) Line 165	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.RequestBuilder.BuildProject() Line 1120	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.RequestBuilder.BuildAndReport() Line 747	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.RequestBuilder.RequestThreadProc(bool setThreadParameters) Line 694	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.RequestBuilder.StartBuilderThread.AnonymousMethod__45_2() Line 631	C#
 	mscorlib.dll!System.Threading.Tasks.Task<System.Threading.Tasks.Task>.InnerInvoke() Line 680	C#
 	mscorlib.dll!System.Threading.Tasks.Task.Execute() Line 2498	C#
 	mscorlib.dll!System.Threading.Tasks.Task.ExecutionContextCallback(object obj) Line 2861	C#
 	mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 954	C#
 	mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 902	C#
 	mscorlib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot) Line 2827	C#
 	mscorlib.dll!System.Threading.Tasks.Task.ExecuteEntry(bool bPreventDoubleExecution) Line 2756	C#
 	mscorlib.dll!System.Threading.Tasks.TaskScheduler.TryExecuteTask(System.Threading.Tasks.Task task) Line 458	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.RequestBuilder.DedicatedThreadsTaskScheduler.InjectThread.AnonymousMethod__6_0() Line 1378	C#
 	mscorlib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state) Line 74	C#
 	mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 954	C#
 	mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Line 902	C#
 	mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 891	C#
 	mscorlib.dll!System.Threading.ThreadHelper.ThreadStart() Line 111	C#

@xiangfan-ms
Copy link
Author

Why are there so many events (I assume there are more than 199999 events based on your previous reply)? There are ~1000 projects, but each project often contains just one file and involves only compile and link (and most of them haven't reached the point of compilation yet when OOM happens).

@rainersigwald
Copy link
Member

The binary log captures all events, so a log of arbitrary fidelity can be replayed out of it. So it's expected to have a bajillion events, but if the writer can't keep up we should just slow down the build, not crash.

@xiangfan-ms
Copy link
Author

I see.
One concern is that if generating binlog causes observable overhead (like slowing down the build), it may introduce misinterpretation when I analyze binlog to understand my build.

Maybe we want to output the waiting time (if it is noticeable compared with the total build time) due to throttling for informational purpose?

rainersigwald added a commit to rainersigwald/msbuild that referenced this issue Aug 3, 2018
Fixes dotnet#3577 by applying the throttling policy to the processing of all
logging events, not just those from other nodes.

While this will slow the build down, it will keep the memory usage of
log events in the to-be-processed queue bounded, preventing the OOM
reported in the bug.
rainersigwald added a commit to rainersigwald/msbuild that referenced this issue Aug 3, 2018
Fixes dotnet#3577 by applying the throttling policy to the processing of all
logging events, not just those from other nodes.

While this will slow the build down, it will keep the memory usage of
log events in the to-be-processed queue bounded, preventing the OOM
reported in the bug.
@rainersigwald
Copy link
Member

First attempt at solution (always throttle the log events) resulted in a deadlock:

image

But I'm not sure where the lock is held.

@rainersigwald
Copy link
Member

Ah, got it:
image

BuildManager.ProcessPacket holds BuildManager._syncLock and is trying to log that a project has started, but can't because the log-event queue is full. Then, later, the logging queue processor hits a ProjectFinishedEventArgs and tries to raise the OnProjectFinished event in response, which tries to take BuildManager._syncLock again, deadlocking the system.

@KirillOsenkov
Copy link
Member

This should be mostly mitigated by #6155

Please open a new issue if you're still seeing problems after MSBuild 16.10.

@ladipro ladipro added this to the MSBuild 16.10 milestone Dec 9, 2021
Forgind pushed a commit that referenced this issue Apr 26, 2022
Fixes #7364

Context
LoggingService.LogComment causes large amounts of contention between unrelated evaluation threads.
image

Changes Made
Reducing lock statements
using Interlock.Add as oppose to lock { n+=x; }
replace Dictionary with ConcurrentDictionary and remove related locks
review locks and remove unnecessary
Replacing DataFlow by ConcurrentQueue for event processing
Fixing unit tests
Testing
local run unit testst
micro benchmark comparing DataFlow by ConcurrentQueue
exp insertion - currently failing on infrastructure - will update
Compiled Boost.Hana as described in OOM when generating binlog for boost.hana #3577 - no OOM exception
Notes
It has surprisingly good perf results. Maybe because I tested it on 24 core machine where lock contention in LoggingService could be a bigger issue.

command	Duration	RAM
msbuild /m /bl OrchardCore.sln	-25%	-4%
msbuild /m OrchardCore.sln	-13%	-3%
Microbenchmark comparing DataFlow vs ConcurrentQueue processing 1E6 messages showed ~2 seconds saving (-85%) and about ~250MB allocations less (-92%). I was also considering to use BlockingCollection but for this we do not have support in net3.5 and also I have seen weird huge perf degradation for my use case in net472.

Method	Mean	Allocated
DataFlow	2,558,860.6 us	297,286,352 B
CocurrentQueue	371,877.9 us	26,304,552 B
@AR-May AR-May added the triaged label Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Debuggability Issues impacting the diagnosability of builds, including logging and clearer error messages. Area: Logging performance triaged
Projects
None yet
Development

No branches or pull requests

5 participants