Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate IInternable in favor of ReadOnlySpan<char> for OpportunisticIntern inputs #5438

Closed
ladipro opened this issue Jun 17, 2020 · 1 comment · Fixed by #5663
Closed

Comments

@ladipro
Copy link
Member

ladipro commented Jun 17, 2020

Most IInternable implementations can already be represented as ReadOnlySpan<char> and are thus redundant. The StringBuilder based implementations are problematic from performance point of view in that they're indexing into the StringBuilder as iterating over its characters could be an O(N^2) operation.

We should:

  • Delete IInternable.
  • Convert the non-StringBuilder uses to ReadOnlySpan<char>.
  • Figure out how to address StringBuilder implementations.
    • On .NET Core we could use the new GetChunks() method.
    • Or we could simply convert the StringBuilder to string and verify that the perf win outweighs the cost of the string allocation.
@ladipro
Copy link
Member Author

ladipro commented Aug 14, 2020

Picking this one up. Here's my StringBuilder proposal:

Since most uses of StringBuilder in MSBuild are "append-only" and simply compose the resulting string from smaller (sub-)strings, I'll implement a new data structure tentatively named CharacterSpanBuilder. Instead of copying the string contents around it will be keeping track of just references to the constituent strings. When done building, CharacterSpanBuilder will trivially return a collection of ReadOnlySpan<char> to be consumed by the interner.

The ToString() method will be provided as well and the new data structure should replace most uses of StringBuilder in the code base, with positive impact on ephemeral allocations.

Before:

StringBuilder sb = new StringBuilder(256);
sb.Append(str1);
sb.Append(str2, 0, someLength);
return sb.ToString();

After:

CharacterSpanBuilder sb = new CharacterSpanBuilder(2);
sb.Append(str1);
sb.Append(str2, 0, someLength);
return sb.ToString();

Instead of capacity in characters, the caller is supplying the capacity of the internal array of { string, int, int } records, so typically orders of magnitude less bytes allocated.

CharacterSpanBuilder may even be a struct passed around by "ref" and satisfy a reasonable number of records without allocating anything at all.

@ladipro ladipro linked a pull request Aug 19, 2020 that will close this issue
2 tasks
@ladipro ladipro modified the milestones: Backlog, MSBuild 16.10 Dec 9, 2021
@AR-May AR-May added the triaged label Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants