Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ReuseableStringBuilderFactory #2697

Closed
Tracked by #6940
davkean opened this issue Nov 3, 2017 · 6 comments · Fixed by #7093
Closed
Tracked by #6940

Improve ReuseableStringBuilderFactory #2697

davkean opened this issue Nov 3, 2017 · 6 comments · Fixed by #7093
Assignees
Labels
performance Performance-Scenario-General This issue affects performance in general. Priority:1 Work that is critical for the release, but we could probably ship without Priority:2 Work that is important, but not critical for the release size:3 triaged
Milestone

Comments

@davkean
Copy link
Member

davkean commented Nov 3, 2017

There's a bunch of improvements that I've noticed will help a lot:

@AndyGerlicher AndyGerlicher added this to the MSBuild 15.6 milestone Nov 6, 2017
@danmoseley
Copy link
Member

danmoseley commented Dec 8, 2017

There is already some diagnostic logging in there in debug -- improving that and running some scenarios would be a great way to tune it to first order. Analogous to how the OpportunisticIntern class can dump stats to help adjust its thresholds. So much easier than iterating with a profiler right away.

In particular it would be interesting to log how large (both length and capacity) the builders are when they are returned (which indicates how large they needed to be, and how often they are discarded). My guess is that they are often getting discarded as too large (>1024 chars). If the threshold was higher, the default size wouldn't matter much as they would rarely be recreated. It could even be tuned, ie., if there's a lot of discards, then increase the size you accept; if not many, then there's likely less value in keeping it around.

@rainersigwald rainersigwald removed this from the MSBuild 15.6 milestone Jun 7, 2019
@panopticoncentral panopticoncentral added the Performance-Scenario-General This issue affects performance in general. label Mar 31, 2020
@panopticoncentral panopticoncentral added the Priority:2 Work that is important, but not critical for the release label Mar 23, 2021
@ladipro ladipro added the size:1 label Oct 12, 2021
@ladipro
Copy link
Member

ladipro commented Oct 12, 2021

Labeling with size:1 as the cost of the initial investigation. Will open a follow-up issue if more work is identified.
Can we deprecate ReuseableStringBuilder in favor of SpanBasedStringBuilder?

@ladipro ladipro added size:3 and removed size:1 labels Nov 8, 2021
@rokonec rokonec moved this from To Do to In Progress in Perf sprint Nov 8th - Nov 21st 2021 Nov 14, 2021
@rokonec rokonec added the Priority:1 Work that is critical for the release, but we could probably ship without label Nov 14, 2021
@ladipro ladipro removed this from In Progress in Perf sprint Nov 8th - Nov 21st 2021 Nov 22, 2021
@rokonec
Copy link
Contributor

rokonec commented Nov 28, 2021

MSBuild currently uses two string builder reusing caches:

  1. ReuseableStringBuilder/Factory - it is static process wide cache of one StringBuilder wrapper with think façade of ReuseableStringBuilder. When returning SB capacity exceed max limit of 1024 chars, returning SB is abandoned and eventually collected. If capacity is bellow that, SB is Cleared in preparation for next reuse and rooted by stored into static variable. Although ReuseableStringBuilderFactory is designed to thread safe, using it heavily concurrently by multiple threads would invalidate its purpose.
  2. StringBuilderCache.Acquire/Release - this is clone of internal StringBuilderCache used at many places in runtime. It holds one SB per thread and its returning capacity limit is 360 chars.

I have instrumented code and captured returning length of both in various scenarios.
It turned out SB are mostly used for logging purposes.

In default verbosity both SB behaves acceptably although I'd recommend to increase size limit of StringBuilderCache to 1024 to get into 66% percentile of data reuse. Also MSBuild does not need to be as memory careful as Runtime.

Bellow is "histogram" of ReuseableStringBuilder returning length. Y axis is sum of returning length for given bucket.
image

With diagnostics logging verbosity, the situation for ReuseableStringBuilder is very different and much less optimal.

Only 47% percent of SB is reused. In other words comparing to create SB instance per call we safe only 47% of SB allocations. If we talk about allocated bytes we safes only 1.2% of allocated bytes.

Histogram looks like this:
image

Conclusion

It is worth to devise a better way to invalidate/reuse ReuseableStringBuilder. The solution should take into consideration different usage patterns during normal vs diagnostics.

@danmoseley
Copy link
Member

Would it help if StringBuilder could get (and return) its backing arrays through an array pool? This would just move the caching down of course but it could be tuned to avoid any that go on LOH. It might be easier to reason about the caching strategy.
It would have to be a copy of the existing SB for now since SB is not IDisposable unfortunately. If it was generally valuable (which is not clear eg it might well make it slower) one could imagine changing SB proper eg by having Trim() return to the pool.

@rokonec
Copy link
Contributor

rokonec commented Nov 28, 2021

Having SB which is backed up by array pool would be ideal, IMHO. I asked Stephen that question about half a year ago.
InterpolatedStringHandler of C# 10 uses array pool, but having ArrayPoolBackedStringBuilder which would have most of its API identical to StringBuilder would make it easy for people to migrate to it in memory heavy applications.
Although I already have some solution for our ReuseableStringBuilder, PR in day or two, having one well tuned shared ArrayPool looks appealing to me. Another interesting idea is to use @ladipro SpanBasedStringBuilder

/// <summary>
/// A StringBuilder replacement that keeps a list of <see cref="ReadOnlyMemory{T}"/> spans making up the intermediate string rather
/// than a copy of its characters. This has positive impact on both memory (no need to allocate space for the intermediate string)
/// and time (no need to copy characters to the intermediate string).
/// </summary>
/// <remarks>
/// The <see cref="ToString"/> method tries to intern the resulting string without even allocating it if it's already interned.
/// Use <see cref="Strings.GetSpanBasedStringBuilder"/> to take advantage of pooling to eliminate allocation overhead of this class.
/// </remarks>
public class SpanBasedStringBuilder : IDisposable

As this is even better, in certain uses cases, than theoretical ArrayPoolBackedStringBuilder. SpanBasedStringBuilder has one less arraycopy than ArrayPoolBackedStringBuilder which has to arraycopy from source to buffer and then from buffer to final string char[len] while SpanBasedStringBuilder just arraycopy from spans to final string char[len]. It might make difference with some huge strings. SpanBasedStringBuilder have some missing API (AppendFormat, Append(char), ...) to be used in some niche use cases though.

@danmoseley
Copy link
Member

cc @stephentoub in case he is interested in this discussion.

I am mainly interested in case anything is learned that would be of general interest in the core libraries. (I doubt SpanBasedStringBuilder would be)

it could be tuned to avoid any that go on LOH

Never mind that part. I remembered that SB already has a max chunk size of 8KB, unless you initialize it with something smaller. So it would be necessary to establish that backing with an ArrayPool actually is helpful or is merely moving the caching around.

@ladipro ladipro moved this from In Progress to Done in Perf sprint Nov 22nd - Dec 5th 2021 Dec 3, 2021
@ladipro ladipro added this to the VS 17.1 milestone Dec 8, 2021
rainersigwald added a commit to rainersigwald/msbuild that referenced this issue Jan 18, 2022
While I was editing in this area I noticed that the calling pattern
around StringBuilders in FormatEventMessage looked allocatey.

Instead of creating two throwaway StringBuilders to format a single
message,

1. Grab a ReusableStringBuilder
2. Reuse the builder between "construct format string" and "get final
   message".

We chose ReusableStringBuilder over StringBuilderCache because logging
sometimes creates strings that are _much_ larger than the 512 character
limit of SBC. That also reduces the need to prereserve a size: the
process-wide pool's elements should be pretty big already.

See dotnet#2697 (comment)
for stats on string length.
@AR-May AR-May added the triaged label Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance-Scenario-General This issue affects performance in general. Priority:1 Work that is critical for the release, but we could probably ship without Priority:2 Work that is important, but not critical for the release size:3 triaged
Development

Successfully merging a pull request may close this issue.

8 participants