Log properties and items on ProjectEvaluationFinished #6287

KirillOsenkov · 2021-03-20T20:50:19Z

Add an option to log global properties, properties and items on ProjectEvaluationFinishedEventArgs instead of ProjectStartedEventArgs. This option is currently only turned on by the BinaryLogger.

This has several advantages. Currently only the projects that are built by the central node log their properties and items (properties are translated across nodes only if a special flag is set, and items are never translated). This resulted in properties and items not being available for projects built on other nodes. Now we log them after every evaluation and translate across nodes if needed. Together with the fact that we now log EvaluationId for each ProjectStarted, we can now recover properties and items for all project started events. This is the main purpose of this PR - to not lose properties and items like we currently do. We will still not log for project results that are satisfied by cache, because we don't keep track of evaluation for these. Presumably it will have already been logged previously.

In addition, if more than one project are built from the same evaluation, we do not duplicate properties and items, only logging them once. This results in logging more information, but storing it more efficiently. Together with string and dictionary deduplication we see very significant savings in binlog size and some reduction in build time.

This change has several large parts:

add a way to enumerate evaluation properties and items directly at the end of Evaluate() for PropertyDictionary and ItemDictionary
manual translation logic for ProjectEvaluationStarted and ProjectEvaluationFinished (instead of relying on TranslateDotNet/BinaryFormatter)
reading and writing ProjectEvaluationFinished GlobalProperties, Properties and Items in BuildEventArgsReader/Writer (used by BinaryLogger)
adding IEventSource4 with IncludeEvaluationPropertiesAndItems, to propagate this setting across nodes and threading it through the LoggingService
update the ParallelConsoleLogger and SerialConsoleLogger to print the new data, if present
tests

One controversial design decision here is storing a reference to live evaluation data in ProjectEvaluationFinishedEventArgs. It does not make a snapshot of the data to avoid very significant allocations. It does take the lock on the PropertyDictionary/ItemDictionary when enumerating, because logging is asynchronous and the logging consumer (BinaryLogger) will enumerate the data potentially after the build has already started and the data is being mutated. I did see exceptions when enumerating without the lock. We had the same problem when the data was logged on ProjectStartedEventArgs though. In addition, there's a slight risk of logging not the exact data as it was at the end of evaluation, but the mutated data after some target has modified it. However given that the previous behavior was to not log anything for out-of-proc projects, and given the very significant allocation reduction, I think it's worth it.

To mitigate, we could capture a snapshot at the end of evaluation, so we don't hold a reference to live data. This won't need a lock to enumerate. Ideally we also rely on the immutable collections to avoid allocations, but I didn't see an easy way to do that currently. We can investigate this in a future change.

For items, it doesn't concatenate items of different types into a single large item stream, but keeps multiple lists, one per item type, to reflect the internal representation. Not flattening item types results in savings because we don't have to mention the item type for each item.

This change increments the BinaryLogger file format to 12, to serialize GlobalProperties, Properties and Items on ProjectEvaluationFinishedEventArgs. It also stores items more efficiently, without having to know the number of item types in advance and enumerate in a single pass. It writes the item type and all items of that type, and it writes 0 to signal there are no more item types. It also no longer writes the Message for args as it can be recovered upon reading.

New EnumerateProperties() and EnumerateItems() methods are added to Utilities, to consolidate the logic to enumerate the new data structures in a single location, used by packet translation logic, binary logger and the console loggers.

Perf wise, I'm seeing no significant change on binlog size for small builds (it's a wash, because we log properties/items for all projects now, but they are no longer duplicated). For large projects I expect very significant savings though, as ProjectStarted is the most heavy-weight event in large binlogs.
Build performance with /bl on small-ish builds is improved 27 s -> 24 s for single-core and 18 s -> 17 s for parallel. No observable change without /bl.

Fixes
#5316
#3616

Add an option to log global properties, properties and items on ProjectEvaluationFinishedEventArgs instead of ProjectStartedEventArgs. This option is currently only turned on by the BinaryLogger. This has several advantages. Currently only the projects that are built by the central node log their properties and items (properties are translated across nodes only if a special flag is set, and items are never translated). This resulted in properties and items not being available for projects built on other nodes. Now we log them after every evaluation and translate across nodes if needed. Together with the fact that we now log EvaluationId for each ProjectStarted, we can now recover properties and items for all project started events. This is the main purpose of this PR - to not lose properties and items like we currently do. We will still not log for project results that are satisfied by cache, because we don't keep track of evaluation for these. Presumably it will have already been logged previously. In addition, if more than one project are built from the same evaluation, we do not duplicate properties and items, only logging them once. This results in logging more information, but storing it more efficiently. Together with string and dictionary deduplication we see very significant savings in binlog size and some reduction in build time. This change has several large parts: 1. add a way to enumerate evaluation properties and items directly at the end of Evaluate() for PropertyDictionary<ProjectPropertyInstance> and ItemDictionary<ProjectItemInstance> 2. manual translation logic for ProjectEvaluationStarted and ProjectEvaluationFinished (instead of relying on TranslateDotNet/BinaryFormatter) 3. reading and writing ProjectEvaluationFinished GlobalProperties, Properties and Items in BuildEventArgsReader/Writer (used by BinaryLogger) 4. adding IEventSource4 with IncludeEvaluationPropertiesAndItems, to propagate this setting across nodes and threading it through the LoggingService 5. update the ParallelConsoleLogger and SerialConsoleLogger to print the new data, if present 6. tests One controversial design decision here is storing a reference to live evaluation data in ProjectEvaluationFinishedEventArgs. It does not make a snapshot of the data to avoid very significant allocations. It does take the lock on the PropertyDictionary<T>/ItemDictionary<T> when enumerating, because logging is asynchronous and the logging consumer (BinaryLogger) will enumerate the data potentially after the build has already started and the data is being mutated. I did see exceptions when enumerating without the lock. We had the same problem when the data was logged on ProjectStartedEventArgs though. In addition, there's a slight risk of logging not the exact data as it was at the end of evaluation, but the mutated data after some target has modified it. However given that the previous behavior was to not log anything for out-of-proc projects, and given the very significant allocation reduction, I think it's worth it. To mitigate, we could capture a snapshot at the end of evaluation, so we don't hold a reference to live data. This won't need a lock to enumerate. Ideally we also rely on the immutable collections to avoid allocations, but I didn't see an easy way to do that currently. We can investigate this in a future change. For items, it doesn't concatenate items of different types into a single large item stream, but keeps multiple lists, one per item type, to reflect the internal representation. Not flattening item types results in savings because we don't have to mention the item type for each item. This change increments the BinaryLogger file format to 12, to serialize GlobalProperties, Properties and Items on ProjectEvaluationFinishedEventArgs. It also stores items more efficiently, without having to know the number of item types in advance and enumerate in a single pass. It writes the item type and all items of that type, and it writes 0 to signal there are no more item types. It also no longer writes the Message for args as it can be recovered upon reading. New EnumerateProperties() and EnumerateItems() methods are added to Utilities, to consolidate the logic to enumerate the new data structures in a single location, used by packet translation logic, binary logger and the console loggers. Perf wise, I'm seeing no significant change on binlog size for small builds (it's a wash, because we log properties/items for all projects now, but they are no longer duplicated). For large projects I expect very significant savings though, as ProjectStarted is the most heavy-weight event in large binlogs. Build performance with /bl on small-ish builds is improved 27 s -> 24 s for single-core and 18 s -> 17 s for parallel. No observable change without /bl.

KirillOsenkov · 2021-03-20T20:55:48Z

@mmitche it would be really good if you could take this one for a spin and get some stats on build perf and binlog size for a really large build.

You can use msbuild -bl -c Release /p:CreateBootstrap=true to produce an MSBuild layout and then use C:\MSBuild\artifacts\bin\bootstrap\net472\MSBuild\Current\Bin\MSBuild.exe to build.

KirillOsenkov · 2021-03-22T19:05:29Z

I have updated the binlog viewer for the latest format and added some affordances to make it easier to look up properties and items now that they're logged at evaluation and not under the project:

benvillalobos

One controversial design decision here is storing a reference to live evaluation data in ProjectEvaluationFinishedEventArgs.

Do you have a sense of the timing of builds between having this snapshot and just taking the lock? You mentioned seeing some exceptions, but I didn't get the sense that it happened every single time.

there's a slight risk of logging not the exact data as it was at the end of evaluation

Among this and the string de-duping causing potentially incorrect data (I seem to remember it was a small chance), should we have some tracking issue that shows what "things can potentially be invalid in the binlog"?

benvillalobos · 2021-03-23T15:58:07Z

ref/Microsoft.Build.Framework/netstandard/Microsoft.Build.Framework.cs

@@ -38,7 +38,7 @@ public abstract partial class BuildEventArgs : System.EventArgs
        public Microsoft.Build.Framework.BuildEventContext BuildEventContext { get { throw null; } set { } }
        public string HelpKeyword { get { throw null; } }
        public virtual string Message { get { throw null; } protected set { } }
-        protected System.DateTime RawTimestamp { get { throw null; } set { } }


What's the context behind this change?

I needed to be able to read and write the raw timestamp without reflection. protected wasn’t enough because I needed access from LogMessagePacketBase and binlog writer, so I’m adding protected internal. I was worried about exposing the field directly because who knows whether changing that field could break BinaryFormatter and I didn’t call it Timestamp because having identifiers that differ by case only would break CLS compliance (not sure how much we care).

This means it can be accessed freely within the assembly or by classes that extend it but not other classes? Is that right?

KirillOsenkov · 2021-03-23T16:38:54Z

Yes, the exceptions didn’t happen every time but I saw them at least three times, which is often enough.

Technically you’re right and we should keep track of the inaccuracies in binlog. I can add these to the wiki where I keep binlog stats. Keep in mind though that these pale in comparison to the status quo, which is most properties and items are simply not logged in multi-proc builds at all.

cdmihai · 2021-03-23T16:50:49Z

The properties / items logged in ProjectStarted events is not the evaluation state, but the current state of the mutated ProjectInstance. Several MSBuild tasks over the same ProjectInstance can mutate its state, and that state is what gets captured in ProjectStarted events, not the static evaluation state. Not showing this mutated state might make debugging harder. On the other hand it's probably good to log both though :) Or even better, do a diff and only log props/item that changed from evaluation, like added / deleted / changed.

<Project>
    <Target Name="Build">
        <MSBuild Projects="B.proj" Targets="AddFoo" />
        <MSBuild Projects="B.proj" Targets="AddZar" />
    </Target>
</Project>

<!-- B.proj -->
<Project>
    <ItemGroup>
        <CreatedInEvaluation Include="val" />
    </ItemGroup>

    <Target Name="AddFoo">
        <ItemGroup>
            <Foo_CreatedInATarget Include="bar"/>
        </ItemGroup>
    </Target>

    <Target Name="AddZar">
        <ItemGroup>
            <Zar_CreatedInATarget Include="tar"/>
        </ItemGroup>
    </Target>
</Project>

KirillOsenkov · 2021-03-23T18:18:08Z

But as far as I can tell all post-evaluation mutations should be logged via TaskParameterEventArgs, right? So technically we could stitch together the latest state. Still, a great observation.

KirillOsenkov · 2021-03-24T20:52:45Z

I got some stats for building Roslyn.

Binlog size:
16.9: 122MB
main: 14.7MB
PR 6287: 15.6MB

Against expectations, the binlog size is increased slightly with this PR, but at least we log all properties and items for each project, not just a small part. You can tell whether it has properties and items by the presence of the target framework label:

The build time seemed very close for this PR vs. main.

cdmihai · 2021-03-26T17:04:11Z

But as far as I can tell all post-evaluation mutations should be logged via TaskParameterEventArgs, right? So technically we could stitch together the latest state. Still, a great observation.

I would still leave both of them in (first evaluation state, and then the state at the beginning of each project build). Other loggers will not have the re-construction logic you mentioned above. If you write the reconstruction logic you can then change the binary logger to not include the eval state for project started events :). Regarding the concern of repeating mostly similar eval states, shouldn't the string dedup you previously added prevent that?

KirillOsenkov · 2021-03-26T17:58:39Z

I considered leaving them on ProjectStarted as well, however here are some concerns:

it is kind of pointless because they will only be preserved for the central node, as we currently don't have logic that translates them across nodes. As you see in the screenshot above, currently we only log properties and items for a small chunk of projects (roughly 1/8 for my 8 core machine).
there are many many ProjectStarted events, as each project is being built multiple times for many targets. Logging all this new state, and translating it across nodes will drastically slow down the build and increase the binlog size (dedup logic will be disrupted by slight differences)
the behavior change in this PR is only when BinaryLogger is present: https://github.com/dotnet/msbuild/pull/6287/files#diff-0864a6f2e10a87d935fbf19d81370f534e67f33931642dec1f30023b108e2cdcR142 so other loggers will get the old behavior if /bl is not specified.

We certainly could look into tracking changes made to the eval properties and items and only log the deltas for each project, but that deserves to be a separate PR. I think you can do it intelligently so each project start receives the current immutable snapshot of the data and walks back to the initial snapshot from evaluation. We can also make the initial logging at evaluation finished inline, and not delayed, but for that we need true immutable data structures at the core.

Basically I'm saying: there's certainly work to make it all better, this PR I believe vastly improves things without significant takebacks, so let's do it one step at a time ;)

But we can discuss more in person and hopefully we'll find a golden middle for this PR.

Forgind

My understanding of this PR is fairly shaky, but I think it does what you claimed. Thanks!

Forgind · 2021-03-29T20:25:07Z

ref/Microsoft.Build.Framework/netstandard/Microsoft.Build.Framework.cs

@@ -38,7 +38,7 @@ public abstract partial class BuildEventArgs : System.EventArgs
        public Microsoft.Build.Framework.BuildEventContext BuildEventContext { get { throw null; } set { } }
        public string HelpKeyword { get { throw null; } }
        public virtual string Message { get { throw null; } protected set { } }
-        protected System.DateTime RawTimestamp { get { throw null; } set { } }


This means it can be accessed freely within the assembly or by classes that extend it but not other classes? Is that right?

Forgind · 2021-03-29T20:29:17Z

src/Build.UnitTests/BackEnd/TranslationHelpers.cs

+
+        internal static string GetTaskItemString(object item)
+        {
+            var sb = new StringBuilder();


ReusableStringBuilder?

Tests, we don't care ;) I doubt there will be a difference on the test data of this size.

You're right; didn't notice.

Forgind · 2021-03-29T20:35:24Z

src/Build/BackEnd/Components/Logging/LoggingService.cs

+        /// </summary>
+        public bool IncludeEvaluationPropertiesAndItems
+        {
+            get => _includeEvaluationPropertiesAndItems ??= _eventSinkDictionary.Values.OfType<EventSourceSink>().Any(sink => sink.IncludeEvaluationPropertiesAndItems);


I see several uses of OfType. I was wondering whether we could cut out some iterations like:

Suggested change

get => _includeEvaluationPropertiesAndItems ??= _eventSinkDictionary.Values.OfType<EventSourceSink>().Any(sink => sink.IncludeEvaluationPropertiesAndItems);

get => _includeEvaluationPropertiesAndItems ??= _eventSinkDictionary.Values.Any(sink => sink is EventSourceSink esSink && esSink.IncludeEvaluationPropertiesAndItems);

and if that might make it faster. Doesn't seem like a particularly relevant performance problem, but I thought I'd mention it.

good idea in general, but this runs once per build on a list of like 2 items, and I just followed the existing pattern in that code (smart way to say "copy pasted")

Forgind · 2021-03-29T20:41:28Z

src/Build/BackEnd/Components/Logging/LoggingServiceLogMethods.cs

-                ProjectStartedEventArgs buildEvent = new ProjectStartedEventArgs
+
+                IDictionary<string, string> globalProperties = null;
+                if (!IncludeEvaluationPropertiesAndItems)


I'm probably misunderstanding this, but it looks backwards. If we aren't including evaluation properties and items, we should pass properties as a parameter? Isn't that including it?

It is a bit confusing.

if (!IncludeEvaluationPropertiesAndItems)
log stuff on ProjectStarted
else
log stuff on ProjectEvaluationFinished

src/Build/Collections/ItemDictionary.cs

Forgind · 2021-03-29T22:02:53Z

src/Build/Logging/BinaryLogger/BuildEventArgsWriter.cs


-            Write(groups.Length);
+                // signal the end
+                Write(0);


Is there no case in which you would want to write 0 other than in indicating the end? I was thinking like if you have an item with Include="...\0..."

No, since strings are interned, the item text is not written here. Also we write how many strings we expect for each item type, and we read that many. So at this position in the stream only two things are possible: either the string that indicates the item type of the next group (non-zero) or 0 which indicates the end.

Forgind · 2021-03-29T22:10:42Z

src/Build/Logging/SerialConsoleLogger.cs

-                ProjectEvaluationStartedEventArgs projectEvaluationStarted = e as ProjectEvaluationStartedEventArgs;
-
-                if (projectEvaluationStarted != null)
+                if (showPerfSummary)


Why did you swap the nesting here?

Because I only wanted to test for types once, and make showPerfSummary check secondary. showPerfSummary is no longer the only important thing in this method, so I wanted to downgrade that

src/Build/Utilities/Utilities.cs

src/Shared/LogMessagePacketBase.cs

Co-authored-by: Forgind <Forgind@users.noreply.github.com>

This reverts commit ad644b2.

KirillOsenkov · 2021-03-30T16:19:09Z

I had to rollback a change to use Any() because LinkedList doesn't have Any(). I had my reservations because Any() is a method call that does allocations etc. vs. Count == 0 is a simple check.

KirillOsenkov · 2021-03-30T16:33:13Z

And yes, protected internal means protected OR internal (union). So everyone in the same assembly + all assemblies that this assembly has InternalsVisibleTo to + all derived types.

C# recently introduced private protected to mean protected AND internal (intersection). Only types in the same assembly + IVT which derive from this one.

KirillOsenkov · 2021-03-31T20:49:43Z

How do I trigger RPS for this? I have an exp branch here: https://github.com/dotnet/msbuild/tree/exp/kirillo/logEval

Forgind · 2021-03-31T21:23:38Z

Happened automatically—here's your PR. Looks good!

KirillOsenkov · 2021-03-31T21:48:00Z

Zomg now all I want is the coveted Merge when branch open sticker

KirillOsenkov requested review from cdmihai, panopticoncentral and mmitche March 20, 2021 20:50

benvillalobos reviewed Mar 23, 2021

View reviewed changes

KirillOsenkov added 2 commits March 27, 2021 16:29

Add Debug.Fail to catch unexpected cases

0a517ea

Fix name.

ffb63ca

Forgind approved these changes Mar 30, 2021

View reviewed changes

KirillOsenkov and others added 3 commits March 30, 2021 09:05

Update src/Build/Collections/ItemDictionary.cs

ad644b2

Co-authored-by: Forgind <Forgind@users.noreply.github.com>

Update src/Build/Utilities/Utilities.cs

f953c0d

Co-authored-by: Forgind <Forgind@users.noreply.github.com>

Revert "Update src/Build/Collections/ItemDictionary.cs"

87d1d51

This reverts commit ad644b2.

Use switch expression.

b7f5607

cdmihai approved these changes Mar 30, 2021

View reviewed changes

Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Mar 31, 2021

Forgind merged commit 84ad007 into main Apr 2, 2021

KirillOsenkov deleted the dev/kirillo/logEval branch April 2, 2021 18:05

KirillOsenkov mentioned this pull request Apr 13, 2021

ProjectStarted.GlobalProperties is expected to be not-null #6341

Closed

KirillOsenkov mentioned this pull request May 2, 2021

Sometimes does not show properties under projects KirillOsenkov/MSBuildStructuredLog#488

Closed

rainersigwald mentioned this pull request May 28, 2021

Azure Devops pipelines crashing in MSBuild logger, as of 5/25 VS2019 image #6498

Closed

rainersigwald mentioned this pull request Jun 7, 2021

Don't move Properties and Items to ProjectEvaluationFinished if legacy loggers present #6520

Merged

panopticoncentral mentioned this pull request Jun 19, 2021

Fix design-time build name and viewer dotnet/project-system-tools#402

Merged

KirillOsenkov mentioned this pull request Aug 31, 2021

MSBuild eval items only appear on first of multi-targeting inner-builds KirillOsenkov/MSBuildStructuredLog#531

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log properties and items on ProjectEvaluationFinished #6287

Log properties and items on ProjectEvaluationFinished #6287

KirillOsenkov commented Mar 20, 2021

KirillOsenkov commented Mar 20, 2021

KirillOsenkov commented Mar 22, 2021

benvillalobos left a comment

benvillalobos Mar 23, 2021

KirillOsenkov Mar 23, 2021

Forgind Mar 29, 2021

KirillOsenkov commented Mar 23, 2021

cdmihai commented Mar 23, 2021 •

edited

KirillOsenkov commented Mar 23, 2021

KirillOsenkov commented Mar 24, 2021

cdmihai commented Mar 26, 2021

KirillOsenkov commented Mar 26, 2021 •

edited

Forgind left a comment

Forgind Mar 29, 2021

Forgind Mar 29, 2021

KirillOsenkov Mar 30, 2021

Forgind Mar 30, 2021

Forgind Mar 29, 2021

KirillOsenkov Mar 30, 2021

Forgind Mar 29, 2021

KirillOsenkov Mar 30, 2021

Forgind Mar 29, 2021

KirillOsenkov Mar 30, 2021

Forgind Mar 29, 2021

KirillOsenkov Mar 30, 2021

KirillOsenkov commented Mar 30, 2021

KirillOsenkov commented Mar 30, 2021

KirillOsenkov commented Mar 31, 2021

Forgind commented Mar 31, 2021

KirillOsenkov commented Mar 31, 2021

	get => _includeEvaluationPropertiesAndItems ??= _eventSinkDictionary.Values.OfType<EventSourceSink>().Any(sink => sink.IncludeEvaluationPropertiesAndItems);
	get => _includeEvaluationPropertiesAndItems ??= _eventSinkDictionary.Values.Any(sink => sink is EventSourceSink esSink && esSink.IncludeEvaluationPropertiesAndItems);

Log properties and items on ProjectEvaluationFinished #6287

Log properties and items on ProjectEvaluationFinished #6287

Conversation

KirillOsenkov commented Mar 20, 2021

KirillOsenkov commented Mar 20, 2021

KirillOsenkov commented Mar 22, 2021

benvillalobos left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KirillOsenkov commented Mar 23, 2021

cdmihai commented Mar 23, 2021 • edited

KirillOsenkov commented Mar 23, 2021

KirillOsenkov commented Mar 24, 2021

cdmihai commented Mar 26, 2021

KirillOsenkov commented Mar 26, 2021 • edited

Forgind left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KirillOsenkov commented Mar 30, 2021

KirillOsenkov commented Mar 30, 2021

KirillOsenkov commented Mar 31, 2021

Forgind commented Mar 31, 2021

KirillOsenkov commented Mar 31, 2021

cdmihai commented Mar 23, 2021 •

edited

KirillOsenkov commented Mar 26, 2021 •

edited