Build perf is negatively impacted #114
Comments
Consider caching the results across projects using this technique. |
Partial performance improvement for #114
What is the current state of this? I haven’t done any performance analysis on our build yet, but I am interested in current statistics and insight into whether anything still needs to be done here. |
I've spent some time looking. The problem with caching across tasks is that each expensive result is necessarily keyed by the path to the project directory. Since each project has a different project directory, the result could be different and thus needs to be recalculated. That leaves us with two options to improve perf:
|
It looks like there are actually 2 performance areas: one related to GetBuildVersion and one related to AssemblyVersionInfo. If I understand correctly, the assembly version info file is generated each time and only copied when it is different. |
Note that would be useful only to incremental builds of course. |
I have a branch where I have added 2 additional AssemblyInfoTests: So far it seems (probably as expected) that the CodeDomGenerator code path in AssemblyInfo is "very" expensive: CSharpCodeDomGenerator (net461) takes ~160ms. I have not looked into it much, but is the CodeDomGenerator code path still needed feature-wise? |
It was historically there because it meant NB.GV could support any .NET language for which a CodeDomProvider exists. Then we added support for a language or two that didn't have a CodeDomProvider, oh and .NET Core didn't offer them, so we had backup code paths for when they weren't available. Would you care to send a PR, @japj ? |
CodeDOM is *very* slow, it turns out. Addresses one of the perf issues reported in #114
Please check out v2.3.125 which has the code DOM provider replaced with our homemade code generator to address the bulk of the perf problem. Let me know how much of a problem remains. :) |
@AArnott Thanks for the great work, I got stuck in other stuff at work so did not yet have any time to work on it myself. Will definitely try out soon to verify impact |
@AArnott Any chance we can get your initial suggestion to "opt-in" all projects sharing the same result? The GetBuildVersion task is still a major build performance bottleneck for us (37 projects, most of them also generating a NuGet package): GetBuildVersion is invoked 65 times, taking ~400ms per invocation. Or would there be a solution/project setup where one gets this kind of caching? |
Just to add some observations to this issue, we are also seeing a large perf hit due to the
This work adds up over thousands of commits and was taking up a large proportion of runtime. I have trialed a fix in a fork where I cache the |
Thanks for your feedback and efforts, @djluck. MSBuild implicitly caches each target's execution, so Once you get each project to only execute the Sharing a cache between projects requires care because each project has a different path and thus may get different results in traversing the directory tree upward looking for version.json. Also a large set of projects (should) build with multiple processes, so even if we did share some data via a static variable such that it could avoid redundant work between projects, it would still only assist the projects that build in that particular msbuild process. The change I'm planning for the largest repos, where we can assume there is just one version.json file at the root, is to run a |
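To illustrate why a shared cache has to be keyed by directory, here is a minimal Python sketch of an upward search for version.json. It is not the Nerdbank.GitVersioning implementation; the function name `find_version_json` and the directory layout in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_version_json(project_dir: Path) -> Optional[Path]:
    """Return the nearest version.json at or above project_dir, or None.

    Hypothetical helper: it only illustrates the upward directory search
    described in the comment above, not the real NB.GV lookup rules.
    """
    current = project_dir.resolve()
    # Check the project directory first, then each ancestor up to the root.
    for candidate_dir in (current, *current.parents):
        candidate = candidate_dir / "version.json"
        if candidate.is_file():
            return candidate
    return None
```

Two projects in the same repo can resolve to different files (for example, one finds a version.json in its own folder while a sibling falls through to the repo root), so a result cached for one project directory is only reusable by another project if both resolve to the same version.json.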
Thanks for the hint, I'll look into this using the tool you linked. My MsBuild knowledge is fairly incomplete but I wondered if properties like
Unfortunately this wouldn't help us (sadface)- we do have a |
Then I think the best optimization that can apply for you would be to ensure the target only executes once per project as I mentioned. |
I just realized that in Visual Studio the GenerateAssemblyVersionInfo might actually run twice (once during the design-time build and once during a regular build). Maybe that is also something worthwhile to look into? As for an incremental version of GetBuildVersion: would it be enough to look at the timestamp of version.json and the current git commit (as inputs)? |
@AArnott thank you for the tip on using MSbuild structured log viewer- it makes understanding what MSBuild is doing so much easier. So, coming back to this point:
I found out that
So I was certainly wrong in stating that Also more thoughts on this comment:
I think perhaps configurable behaviour for how |
I'd be interested on your thoughts on this idea around caching the git height: As the majority of time during the
When |
@japj said:
Running in design-time is important so the language service realizes that the
That seems like a reasonable optimization. I guess we'd cache the version.json timestamp and commit in an intermediate file so we can read it later and compare with current values, right? I'd accept a PR for that if you're interested in preparing it. |
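A rough Python sketch of the idea discussed here: store the version.json timestamp and the commit id in an intermediate file, and only recompute when either one changes. The function name `cached_build_version`, the stamp file layout, and the `compute` callback are all hypothetical; a real implementation would live in the MSBuild target or task.

```python
import json
from pathlib import Path
from typing import Callable


def cached_build_version(version_json: Path, commit_id: str,
                         stamp_path: Path, compute: Callable[[], dict]) -> dict:
    """Reuse a previously computed result while the inputs are unchanged.

    The cache key is the tuple suggested in the thread: the version.json
    path, its timestamp, and the current commit id.
    """
    key = {
        "version_json": str(version_json),
        "mtime_ns": version_json.stat().st_mtime_ns,
        "commit": commit_id,
    }
    if stamp_path.is_file():
        try:
            cached = json.loads(stamp_path.read_text())
        except ValueError:
            cached = None  # A corrupt stamp file is treated as a cache miss.
        if isinstance(cached, dict) and cached.get("key") == key:
            return cached["result"]
    # Inputs changed (or no stamp yet): do the expensive work and record it.
    result = compute()
    stamp_path.write_text(json.dumps({"key": key, "result": result}))
    return result
```

The stamp file would sit in the project's intermediate output directory so that a clean build discards it. Anything else that influences the computed version would also have to become part of the key.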
@djluck Good investigation on multiple executions of the GetBuildVersion target and the global properties to explain it. And good idea about caching the git height in a way that still works even after HEAD moves on. I've collected all these ideas as a checklist in the issue description. I think we have enough good ideas to move forward on some very impactful improvements. What can I assign out to people? Are folks interested in taking any of these on? |
I'd definitely be interested in the caching of the height of commits, although I don't think I'll be able to put out a PR for at least a week or so. |
Great. Maybe write up a proposal for where the file would be, how many commits you would cache and how, etc. |
Just to weigh in a little bit. We are suffering with this pretty heavily on our incremental builds. We have around 35 projects in a solution that all use the same shared version.json file in the repository root. For each project the calculation of the version number takes something like 2 to 4 seconds (SSD, current gen AMD hardware). Even if the project builds run in parallel this adds up really quickly and makes the builds take several minutes. MSBuild log analyzer shows that the Out of the ideas listed in the top post these ones would benefit us the most:
There are some other fundamental Git concepts that could help with speeding up the actual height calculation. Notably Git now has a concept of commit graph files and, more recently, bloom filters for quick checking of changed paths on per-commit basis. This allows really efficient history lookup for the purpose of calculating distance between two commits. In the commit graph files it is even expressed as generation number and thus the lookup can be done in constant time (with no path filter), or with very little I/O (with path filter and using the bloom filters to avoid accessing object storage). I doubt that Libgit2 supports it at the moment and it could be well outside the scope of this project but it's not difficult to implement support for reading these commit graph files in C# if it proves to be beneficial. I've written an implementation in Go before and it was pretty straightforward. For anyone wanting to look into real-world data I am willing to privately share the MSBuild binary logs from our project. |
@filipnavara That's very interesting. I'd never heard of the commit graph files. I'll have to learn more about them. |
I believe the git commit graph was originally developed by people from Microsoft, but it’ll depend on having a supported client to generate the right files. For more background, see https://devblogs.microsoft.com/devops/supercharging-the-git-commit-graph/ |
@japj Correct, I can only recommend reading the blog post series; it gives quite an insight into the design. Derrick has moved to the GitHub team now and the bloom filter implementation was finished by someone else (although the theory in the last blog post of the series still holds for the official implementation). Commit graph support itself has been in the official Git client for a few versions now, and it can be set up to automatically update the commit graph file on various operations. It may require some initial setup but the potential gain in performance is often worth it. Things like commit height, or calculating It is an optimization that is orthogonal to any caching employed by GitVersioning but it's something that may be worth exploring for certain types of scenarios where simple caching is not doing the job. @AArnott Thanks for looking into it! |
@AArnott I spent a bit of time on Friday trying to implement the caching of git height + commit id tuples. My approach was to add caching behaviour into |
I may not understand what you tried. But caching the height keyed on the commit ID and project directory should always be safe because the height of a given commit within a given directory can never change as git history is immutable, right? |
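The argument above can be sketched in Python with a toy commit graph. This is a simplified model, not the actual Nerdbank.GitVersioning height algorithm: it assumes height is 1 at the commit that last changed the version (or at a root commit) and otherwise one more than the largest parent height. The names `git_height`, `parents`, and `version_changed` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Cache entries never need invalidation: a commit id pins its entire
# ancestry, so (commit id, project directory) always maps to one height.
HeightCache = Dict[Tuple[str, str], int]


def git_height(commit: str, project_dir: str,
               parents: Dict[str, List[str]],
               version_changed: Set[str],
               cache: HeightCache) -> int:
    """Compute the height of `commit`, filling `cache` for every commit walked."""
    stack = [commit]
    while stack:
        c = stack[-1]
        if (c, project_dir) in cache:
            stack.pop()
            continue
        if c in version_changed or not parents[c]:
            # The walk stops where the version was last set, or at a root.
            cache[(c, project_dir)] = 1
            stack.pop()
            continue
        missing = [p for p in parents[c] if (p, project_dir) not in cache]
        if missing:
            # Resolve uncached parents first, then revisit this commit.
            stack.extend(missing)
            continue
        cache[(c, project_dir)] = 1 + max(cache[(p, project_dir)] for p in parents[c])
        stack.pop()
    return cache[(commit, project_dir)]
```

If the cache is persisted between builds, then after HEAD moves forward by one commit only that new commit needs to be walked, because its parent's height is already stored.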
I think this was probably my mistake. I'll take a look down this avenue. |
OK, I'll hold off on any caching work for a while or till I hear back from you, @djluck so we don't solve the same problems or create bad merge conflicts. |
I'm preparing a change that should bring the number of |
As you can see on the checklist in the issue description, some substantial improvements have been made today. Most repos can get down to just one There is still value in the caching idea that @djluck has piloted in #499, but we should probably focus it on improving perf in the default scenario (not requiring any opt in or behavior changes) to maximize the benefit to the remainder of repos that don't want to take any opt-in effort to a faster approach. |
I tried version 3.3.22-alpha with the added |
We have made many fixes in this area over the last few releases, including the 3.4-beta. |
Running PerfView during the build of the PInvoke project, I see that almost a full second goes to the GetBuildVersion task, which seems to be doing some repeat work perhaps within the same task execution, and certainly across projects. Can we cache the results across projects?
Opportunities:
- `GetBuildVersion` target where the results are saved to a file with the cache key being the tuple: version.json timestamp (and path?) and commit id. (@japj's idea)
- (`GetBuildVersion` MSBuild task only once per project (or per repo) #508) Cut down on redundant `GetBuildVersion` executions per-project that @djluck identified, possibly by using the `MSBuild` task call to invoke another target but with a reduced set of global properties to improve target skipping around P2P and TargetFramework invocations.
- (`GetBuildVersion` MSBuild task only once per project (or per repo) #508) Build on the shared MSBuild target invocation within a single project to share it across projects based on a project property that specifies the directory path for which version.json will be searched for and used as the 'cache key' for reuse. This key can be a global property in the MSBuild task invocation to automatically cache at the right level, and this same property can serve as the input for the version.json search.