OOM when generating binlog for boost.hana #3577
Call stack:
Is the OOM node the entry-point or one of the workers? Can you also try without?
On my machine, it was the entrypoint node, and it ran for a while. That makes sense if the stack is actually representative of the problem, since it's in IPC string translation. Conceivably related to #3210, but that shouldn't cause an OOM crash, just GC pauses.
Took some memory dumps. From one dump:
So it seems like the logger is lagging, causing messages to pile up. We have a throttling mechanism, but it looks like it's only opted into sometimes. I'll try to debug in to see if the scenario that's overflowing it opts in (and the throttle is broken) or if it doesn't (and we need to opt it into throttling).
It's the latter:
Note that we're at capacity... but events keep getting asynchronously added to the queue via the in-proc node.
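The shape of the bug can be sketched as two enqueue paths, only one of which is bounded. This is a minimal Python analogue with hypothetical names, not MSBuild's actual C# code (which uses Dataflow):

```python
import queue

CAPACITY = 200_000  # illustrative bound, mirroring the "at capacity" note above

bounded = queue.Queue(maxsize=CAPACITY)  # throttled path: blocks when full
unbounded = []                           # in-proc path: no bound at all

def enqueue_event(event, opted_into_throttling):
    # Hypothetical gate: only some callers opt into throttling.
    if opted_into_throttling:
        bounded.put(event)        # producer waits for the consumer: backpressure
    else:
        unbounded.append(event)   # keeps growing while the writer lags: eventual OOM
```

If the writer thread falls behind, the unthrottled path accumulates events without limit, which matches the pile-up seen in the dumps.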
Why are there so many events? (I assume there are more than 199,999 events, based on your previous reply.) There are ~1000 projects, but each project often contains just one file and involves only compile and link (and most of them haven't even reached compilation when the OOM happens).
The binary log captures all events, so a log of arbitrary fidelity can be replayed out of it. So it's expected to have a bajillion events, but if the writer can't keep up we should just slow down the build, not crash. |
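The "slow down the build instead of crashing" behavior described here is ordinary bounded-queue backpressure. A minimal Python sketch (illustrative only, not MSBuild code) shows that a bounded queue caps memory by blocking the producer whenever the consumer lags:

```python
import queue
import threading

q = queue.Queue(maxsize=4)  # tiny bound so the effect is visible
peak = 0                    # deepest the queue ever gets

def consumer():
    global peak
    for _ in range(100):
        peak = max(peak, q.qsize())
        q.get()             # slow consumer drains one event at a time

t = threading.Thread(target=consumer)
t.start()
for i in range(100):
    q.put(i)                # blocks when full: producer runs at consumer speed
t.join()

assert peak <= 4            # memory stays bounded no matter how fast we produce
```

The producer never gets more than `maxsize` events ahead, so the queue's memory footprint stays constant; the cost is that the producer (the build) runs slower.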
I see. Maybe we should output the waiting time due to throttling (if it is noticeable compared with the total build time) for informational purposes?
Fixes dotnet#3577 by applying the throttling policy to the processing of all logging events, not just those from other nodes. While this will slow the build down, it will keep the memory usage of log events in the to-be-processed queue bounded, preventing the OOM reported in the bug.
This should be mostly mitigated by #6155. Please open a new issue if you're still seeing problems after MSBuild 16.10.
Fixes #7364

Context: `LoggingService.LogComment` causes large amounts of contention between unrelated evaluation threads.

Changes made:
- Reduce `lock` statements by using `Interlocked.Add` instead of `lock { n += x; }`
- Replace `Dictionary` with `ConcurrentDictionary` and remove the related locks
- Review the remaining locks and remove unnecessary ones
- Replace Dataflow with `ConcurrentQueue` for event processing
- Fix unit tests

Testing:
- Local run of the unit tests
- Microbenchmark comparing Dataflow vs. `ConcurrentQueue`
- Exp insertion (currently failing on infrastructure; will update)
- Compiled Boost.Hana as described in #3577: no OOM exception

Notes: It has surprisingly good perf results, maybe because I tested it on a 24-core machine, where lock contention in `LoggingService` could be a bigger issue.

| Command | Duration | RAM |
| --- | --- | --- |
| msbuild /m /bl OrchardCore.sln | -25% | -4% |
| msbuild /m OrchardCore.sln | -13% | -3% |

A microbenchmark comparing Dataflow vs. `ConcurrentQueue` processing 1e6 messages showed a saving of ~2 seconds (-85%) and about ~250 MB fewer allocations (-92%). I was also considering `BlockingCollection`, but it isn't available on .NET 3.5, and I have seen a weird, huge perf degradation for my use case on .NET Framework 4.7.2.

| Method | Mean | Allocated |
| --- | --- | --- |
| Dataflow | 2,558,860.6 us | 297,286,352 B |
| ConcurrentQueue | 371,877.9 us | 26,304,552 B |
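The Dataflow-to-ConcurrentQueue swap described above amounts to "cheap concurrent enqueue plus one dedicated pump thread draining the queue", avoiding per-message Dataflow overhead. A rough Python analogue (all names illustrative; `collections.deque` append/popleft are thread-safe in CPython, loosely standing in for `ConcurrentQueue`):

```python
from collections import deque
import threading

events = deque()                  # stand-in for ConcurrentQueue: no lock on enqueue
producers_done = threading.Event()
processed = 0

def pump():
    """Single consumer thread drains the queue in a tight loop."""
    global processed
    while True:
        try:
            events.popleft()
            processed += 1        # safe: only the pump thread touches this
        except IndexError:
            if producers_done.is_set():
                break             # queue drained and no more producers
            producers_done.wait(0.001)  # brief idle wait while the queue is empty

pump_thread = threading.Thread(target=pump)
pump_thread.start()

def produce(n):
    for i in range(n):
        events.append(i)          # hot path: no lock statement needed

workers = [threading.Thread(target=produce, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
producers_done.set()
pump_thread.join()

assert processed == 4000          # every event from every producer was pumped
```

With many producer threads, removing the per-message lock (or Dataflow post) from the enqueue path is what cuts the contention the PR measured.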
It repros using 15.8 Preview 6 (I didn't try older versions).
Microsoft (R) Build Engine version 15.8.166+gd4e8d81a88 for .NET Framework
It doesn't OOM if I don't generate binlog. According to the log file, the OOM happens at the very early stage of the build.