Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.NET8 Crash on use HttpClientHandler.Credentials #97966

Open
dumkin opened this issue Feb 5, 2024 · 32 comments
Open

.NET8 Crash on use HttpClientHandler.Credentials #97966

dumkin opened this issue Feb 5, 2024 · 32 comments
Labels
area-System.Net.Security os-mac-os-x macOS aka OSX tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Milestone

Comments

@dumkin
Copy link

dumkin commented Feb 5, 2024

Description

Our company uses Microsoft Exchange email, to communicate with it we have a separate service that is authenticated using HttpClientHandler.Credentials. After updating .NET to version 8, I received application crashes on any request. After finding out the reasons, I was able to get to the minimal environment that repeats this error. The application almost always crashes. And it seems this only happens on MacOS runtime.

Error:

dotnet(15122,0x16c263000) malloc: *** error for object 0x600000b518c0: pointer being freed was not allocated
dotnet(15122,0x16c263000) malloc: *** set a breakpoint in malloc_error_break to debug

Reproduction Steps

I have prepared a repository that allows you to reproduce the error

https://github.com/dumkin/net8-credentials-crash

Also you may run code

var builder = WebApplication.CreateBuilder(args);
builder.WebHost.UseUrls("http://localhost:2000");
var app = builder.Build();

app.MapGet("/", context =>
{
    context.Response.Headers.Append("X-WSSecurity-For", "None");
    context.Response.Headers.Append("WWW-Authenticate", "Negotiate");
    context.Response.StatusCode = 401;
    return Task.CompletedTask;
});

var cts = new CancellationTokenSource();
cts.CancelAfter(TimeSpan.FromSeconds(10));
await Task.Factory.StartNew(async () => await app.RunAsync(cts.Token));

var client = new HttpClient(new HttpClientHandler
{
    ServerCertificateCustomValidationCallback = (_, _, _, _) => true,
    Credentials = new NetworkCredential("test", "test"),
});
client.BaseAddress = new Uri("http://127.0.0.1:2000");
await client.GetStringAsync("");

Expected behavior

HttpClient auth and worked correctly

Actual behavior

Application crashed

Regression?

Yes, it worked on .NET 7

Known Workarounds

No

Configuration

.NET SDK:
 Version:           8.0.101
 Commit:            6eceda187b
 Workload version:  8.0.100-manifests.69afb982

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  14.2
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /usr/local/share/dotnet/sdk/8.0.101/

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 5, 2024
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 5, 2024
@dumkin
Copy link
Author

dumkin commented Feb 5, 2024

was tested on x86 and Arm MacOS, everywhere it crash when using .net 8

@dumkin
Copy link
Author

dumkin commented Feb 5, 2024

Please take a look, thank you!
@dotnet/ncl , @kotlarmilos

@vcsjones
Copy link
Member

vcsjones commented Feb 5, 2024

I can reproduce this against net8.0, but not main.

Native stack:

frame #0: 0x000000018c3020dc libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000018c339cc0 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000018c245a40 libsystem_c.dylib`abort + 180
    frame #3: 0x000000018c15cb08 libsystem_malloc.dylib`malloc_vreport + 908
    frame #4: 0x000000018c17c24c libsystem_malloc.dylib`malloc_zone_error + 104
    frame #5: 0x000000018c17b0a8 libsystem_malloc.dylib`free_tiny_botch + 40
    frame #6: 0x000000019cc33b1c GSS`_gss_scram_release_cred + 176
    frame #7: 0x000000019cbf0dec GSS`_gss_mg_release_cred + 124
    frame #8: 0x000000018c4e9b90 CoreFoundation`_CFRelease + 292
    frame #9: 0x000000019cbf0d48 GSS`gss_release_cred + 76

The managed stack

0000000171005C10 0000000105b0cd28 (MethodDesc 0000000105f3cb50 + 0x98 Microsoft.Win32.SafeHandles.SafeGssCredHandle.ReleaseHandle())
0000000171005CB0 051880019cbf0d48 051880019cbf0d48, calling 051880019cc37d00
0000000171005CE0 703a000105b0cd28 703a000105b0cd28
0000000171005CF0 0000000101f2bfc4 (MethodDesc 0000000102ba3038 + 0x114 System.Runtime.InteropServices.SafeHandle.InternalRelease(Boolean))
0000000171005D30 0000000105b0cd0c (MethodDesc 0000000105f3cb50 + 0x7c Microsoft.Win32.SafeHandles.SafeGssCredHandle.ReleaseHandle())
0000000171005DE0 0000000101f2bdbc (MethodDesc 0000000102ba2fc0 + 0x2c System.Runtime.InteropServices.SafeHandle.Dispose())
0000000171005E20 0000000105b103f8 (MethodDesc 0000000105f3b1e8 + 0x28 System.Net.NegotiateAuthenticationPal+UnixNegotiateAuthenticationPal.Dispose())
0000000171005E40 0000000105b1069c (MethodDesc 0000000105f3b1f8 + 0x25c System.Net.NegotiateAuthenticationPal+UnixNegotiateAuthenticationPal.GetOutgoingBlob(System.ReadOnlySpan`1<Byte>, System.Net.Security.NegotiateAuthenticationStatusCode ByRef))
0000000171005E80 0000000105b1bf4c (MethodDesc 0000000105cfac30 + 0x3c System.Net.Security.NegotiateAuthentication.GetOutgoingBlob(System.ReadOnlySpan`1<Byte>, System.Net.Security.NegotiateAuthenticationStatusCode ByRef))
0000000171005ED0 0000000105b1c0e4 (MethodDesc 0000000105cfac48 + 0x84 System.Net.Security.NegotiateAuthentication.GetOutgoingBlob(System.String, System.Net.Security.NegotiateAuthenticationStatusCode ByRef))

@vcsjones vcsjones added area-System.Net.Security and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Feb 5, 2024
@ghost
Copy link

ghost commented Feb 5, 2024

Tagging subscribers to this area: @dotnet/ncl, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Our company uses Microsoft Exchange email, to communicate with it we have a separate service that is authenticated using HttpClientHandler.Credentials. After updating .NET to version 8, I received application crashes on any request. After finding out the reasons, I was able to get to the minimal environment that repeats this error. The application almost always crashes. And it seems this only happens on MacOS runtime.

Error:

dotnet(15122,0x16c263000) malloc: *** error for object 0x600000b518c0: pointer being freed was not allocated
dotnet(15122,0x16c263000) malloc: *** set a breakpoint in malloc_error_break to debug

Reproduction Steps

I have prepared a repository that allows you to reproduce the error

https://github.com/dumkin/net8-credentials-crash

Also you may run code

var builder = WebApplication.CreateBuilder(args);
builder.WebHost.UseUrls("http://localhost:2000");
var app = builder.Build();

app.MapGet("/", context =>
{
    context.Response.Headers.Append("X-WSSecurity-For", "None");
    context.Response.Headers.Append("WWW-Authenticate", "Negotiate");
    context.Response.StatusCode = 401;
    return Task.CompletedTask;
});

var cts = new CancellationTokenSource();
cts.CancelAfter(TimeSpan.FromSeconds(10));
await Task.Factory.StartNew(async () => await app.RunAsync(cts.Token));

var client = new HttpClient(new HttpClientHandler
{
    ServerCertificateCustomValidationCallback = (_, _, _, _) => true,
    Credentials = new NetworkCredential("test", "test"),
});
client.BaseAddress = new Uri("http://127.0.0.1:2000");
await client.GetStringAsync("");

Expected behavior

HttpClient auth and worked correctly

Actual behavior

Application crashed

Regression?

Yes, it worked on .NET 7

Known Workarounds

No

Configuration

.NET SDK:
 Version:           8.0.101
 Commit:            6eceda187b
 Workload version:  8.0.100-manifests.69afb982

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  14.2
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /usr/local/share/dotnet/sdk/8.0.101/

Other information

No response

Author: dumkin
Assignees: -
Labels:

area-System.Net.Security, untriaged, needs-area-label

Milestone: -

@vcsjones
Copy link
Member

vcsjones commented Feb 5, 2024

/cc @filipnavara in case something jumps out at you.

@filipnavara
Copy link
Member

I can reproduce this against net8.0, but not main.

main switched the implementation of NegotiateAuthentication from full GSSAPI to managed SPNEGO + managed NTLM + Kerberos through GSSAPI for all Apple platforms. That radically reduces the surface where we depend on the system's GSSAPI implementation. While this particular memory corruption is not something I recognize I have a number of unresolved Apple feedback items for different buffer overruns in the GSSAPI implementation. Obviously I cannot rule out an API misuse on our side.

On .NET 8 it's possible to opt-in to the managed NTLM/SPNEGO implementation by adding this property to the .csproj file: <RuntimeHostConfigurationOption Include="System.Net.Security.UseManagedNtlm" Value="true" />.

@wfurt
Copy link
Member

wfurt commented Feb 8, 2024

did the repro work for you @filipnavara? I could not reproduce it so far on my (Intel) MacBook. I simply get 401 back.

@wfurt
Copy link
Member

wfurt commented Feb 8, 2024

And perhaps if you do have repro @vcsjones, could you try it with 6.0 or 7.0? I'm wondering if this is 8.0 regression or of the issue always existed.

@dumkin
Copy link
Author

dumkin commented Feb 8, 2024

@wfurt
hello, have you tried running it several times? I wrote in the readme repo that the bug seems to be not stable, it almost always ends in an error, but sometimes it really just returns 401. And on net 7 it works stably, I checked it specifically so this is a regression. in my company the bug was reproduced by at least 3 people and there were both arm and intel processors

@wfurt
Copy link
Member

wfurt commented Feb 8, 2024

yes, I did run it several times @dumkin. I can let it sit in loop .... and I can possibly also get hands on arm Mac.

@vcsjones
Copy link
Member

vcsjones commented Feb 8, 2024

And perhaps if you do have repro @vcsjones, could you try it with 6.0 or 7.0? I'm wondering if this is 8.0 regression or of the issue always existed.

I have similar results as @dumkin.

  • .NET 9 nightly:
    • No value for System.Net.Security.UseManagedNtlm: No crashes in 10 runs
    • false for System.Net.Security.UseManagedNtlm: Crashes consistently under LLDB
  • .NET 8: crashes consistently under LLDB
  • .NET 7: No crashes in 10 runs
  • .NET 6: No crashes in 10 runs

Since it may be sensitive to macOS environment:

❯ sw_vers
ProductName: macOS
ProductVersion: 14.3
BuildVersion: 23D56

This is on an M1.

I'm wondering if this is 8.0 regression

It seems that way. main "fixes" the issue by using the managed implementation, but if you turn it off, the issue comes back.

@filipnavara
Copy link
Member

filipnavara commented Feb 8, 2024

@vcsjones Thanks for the summary. I will investigate tomorrow morning.

Update: I can reproduce this.

@filipnavara
Copy link
Member

filipnavara commented Feb 9, 2024

This is starting to ring some bells. macOS implementation of gss_init_sec_context has the habbit of destroying the context handle on error and replacing it with an invalid one. One must absolutely not touch the old handle afterwards. I originally fixed this in #71484 but apparently it regressed during the rewrite somewhere here:

if (contextHandle.IsInvalid)
{
targetNameHandle?.Dispose();
}

FWIW Apple updated the code last September and added more of the early frees here - apple-oss-distributions/Heimdal@07a3113#diff-c60d25db88cb547db736ca8f6a23712a4ce2bf2be392eb265878d5b7b2f020f7 - which could explain why it's only happening on some versions of macOS.

@filipnavara
Copy link
Member

filipnavara commented Feb 9, 2024

We have a new Apple bug here. Yay!

In the call to gss_init_sec_context we pass our cred_handle to GSSAPI. Since we are doing SPNEGO protocol it's distributed to the underlying mechanisms, one of which happens to be the "DIGEST" one. The implementation of _gss_scram_init_sec_context proceeds to save the cred_handle without making a copy:

https://github.com/apple-oss-distributions/Heimdal/blob/48f86d0ceef220f75b16f0fc8266b53d50129c38/lib/gssapi/digest/init_sec_context.c#L324
https://github.com/apple-oss-distributions/Heimdal/blob/48f86d0ceef220f75b16f0fc8266b53d50129c38/lib/gssapi/digest/init_sec_context.c#L344

When the authentication inevitably fails, the gss_init_sec_context API goes on to delete the whole security context:

  * frame #0: 0x0000000192090a94 GSS`_gss_scram_release_cred
    frame #1: 0x000000019208f1c0 GSS`_gss_scram_delete_sec_context + 124
    frame #2: 0x000000019204e5a0 GSS`gss_delete_sec_context + 256
    frame #3: 0x0000000192055448 GSS`_gss_spnego_internal_delete_sec_context + 308
    frame #4: 0x000000019204f0c8 GSS`_gss_spnego_init_sec_context + 488
    frame #5: 0x0000000192047ca8 GSS`gss_init_sec_context + 1236

and _gss_scram_delete_sec_context deletes the cred_handle that is in our ownership:

https://github.com/apple-oss-distributions/Heimdal/blob/48f86d0ceef220f75b16f0fc8266b53d50129c38/lib/gssapi/digest/delete_sec_context.c#L50-L51

When we later call gss_release_cred, like a good citizen, it inevitably crashes with double free. This bug was introduced with Heimdal-682 (committed to the OSS repository around 5 months ago).

Submitted to Apple as FB13600619

@filipnavara
Copy link
Member

filipnavara commented Feb 9, 2024

Finally, to answer why it doesn't crash on .NET 6/7... It actually does - if you run it in a loop and force handles to be finalized with GC.Collect():

for (;;)
{
    try { await client.GetStringAsync(""); }
    catch (System.Net.Http.HttpRequestException) { GC.Collect(); }
}

We were just leaking the handles and depending on finalization instead of releasing them deterministically.

@Neustradamus
Copy link

To follow this ticket

@theissto
Copy link

I'm having the same problem when using the Microsoft.Exchange.WebServices.Data.EmailMessage.SendAndSaveCopy() Method on Mac OS Sonoma 14.2.1 and .NET 8.

@rzikm rzikm added the os-mac-os-x macOS aka OSX label Feb 20, 2024
@rzikm rzikm added this to the 9.0.0 milestone Feb 20, 2024
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Feb 20, 2024
@rzikm
Copy link
Member

rzikm commented Feb 20, 2024

Unless I am mistaken, it seems we cannot do anything about this bug from .NET and it needs to be fixed in the GSSAPI codebase.

@filipnavara am I right?, Also, is there a link on which the status of the issue you filed can be tracked?

@rzikm rzikm added the tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly label Feb 20, 2024
@rzikm rzikm modified the milestones: 9.0.0, Future Feb 20, 2024
@filipnavara
Copy link
Member

Unfortunately Apple Feedback is private. I will share any update as soon as I receive it but there has been no response from Apple so far.

I don’t think a workaround is possible on .NET side aside from the aforementioned UseManagedNtlm switch which bypasses the whole Apple SPNEGO implementation.

@wfurt
Copy link
Member

wfurt commented Feb 21, 2024

could we simply duplicate the credentials @filipnavara? Even if we play some weird games, could we prevent the crash?

@filipnavara
Copy link
Member

filipnavara commented Feb 21, 2024

could we simply duplicate the credentials @filipnavara?

You can leak the memory but you cannot reliably fix it. The SPNEGO doesn’t always end up in the Digest code path. That depends on how far you get with the authentication and possibly the negotiated algorithms. There’s no easy way to detect it and even if we somehow manage to detect it then it would start leaking once Apple fixes it.

@SteveSyfuhs
Copy link

Is there a way to tell their stack to skip including DIGEST in the negotiate mechs? That would certainly simplify the problem.

@filipnavara
Copy link
Member

Is there a way to tell their stack to skip including DIGEST in the negotiate mechs?

I checked the code (https://github.com/apple-oss-distributions/Heimdal/blob/48f86d0ceef220f75b16f0fc8266b53d50129c38/lib/gssapi/spnego/compat.c#L247-L351; callers and callees) and I didn't find any public API to do so.

@wfurt
Copy link
Member

wfurt commented Mar 19, 2024

Since there is no easy workaround and the crash is pretty nasty, I'm wondering if we should port the System.Net.Security.UseManagedNtlm to 8.0 .... and perhaps even flip the switch.
I was hesitant to take in late in 8.0 but we did not have crashes and this may be out long (LTS) without mitigation.

any thoughts on this @karelz ???

@JeroenBer
Copy link

This currently causes issues on macOS and iOS (#99892).

Please make UseManagedNtlm = True the default in 8.0 LTS.

It took us ages to trace this back to Ntlm credentials, most people might not even be able to trace this back since it causes random crashes.

@karelz
Copy link
Member

karelz commented Mar 26, 2024

Folks, if you are impacted by this issue, can you please add upvote on top post? It will be easier to track number of people impacted. Thanks!

@filipnavara
Copy link
Member

We have seen the crashes on .NET 7 and .NET 8 in our app. The problem is that it's not easily traceable to the root cause. I expect the number of the people to be affected to be high. Just my $.02.

@wfurt
Copy link
Member

wfurt commented Mar 26, 2024

How big would be the change to introduce the switch ion 8.0 @filipnavara? I assume not that big as we already do it for Android???? We discussed the possibility of servicing with @karelz. Changing the default may be difficult to push through. (but we can try)

@filipnavara
Copy link
Member

The runtime part of the switch is already there, we just miss the SDK MSBuild part (dotnet/sdk#34903). Possibly we would need to backport some of the Managed NTLM fixes in dotnet/runtime which were usually small and targeted.

@karelz
Copy link
Member

karelz commented Apr 2, 2024

@tobyperplex

Add upvote on top post.. Do you mean the post with the most upvotes (by wfurt) or the one on the top of the page (by dumkin)?

I meant original post at the very top by dumkin

@filipnavara
Copy link
Member

FWIW I tried to burn one of my paid support requests for code-level support. Today I received a reply that there's no workaround, the feedback is still under investigation, and I got the support request credited back to my account.

@t0byman
Copy link

t0byman commented May 13, 2024

Folks, if you are impacted by this issue, can you please add upvote on top post? It will be easier to track number of people impacted. Thanks!

We've reached 20 upvotes! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Net.Security os-mac-os-x macOS aka OSX tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Projects
None yet
Development

No branches or pull requests

12 participants