OSX machines are de-provisioned during CI / PR runs leading to failures #34472

Closed
jaredpar opened this issue Apr 2, 2020 · 16 comments
Labels
area-Infrastructure, blocking-clean-ci (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'), blocking-official-build

Comments

@jaredpar
Member

jaredpar commented Apr 2, 2020

Runfo Tracking Issue: OSX deprovision in runtime

| Definition | Build | Kind | Job Name |
|---|---|---|---|
| runtime | 1288514 | PR 52695 | CoreCLR Product Build OSX arm64 release |
| runtime | 1284973 | PR 57116 | Build iOSSimulator x86 Debug AllSubsets_Mono |
| runtime | 1284614 | PR 57076 | Build MacCatalyst x64 Debug AllSubsets_Mono |
| runtime | 1266365 | PR 56406 | Libraries Test Run release mono Linux x64 Debug |
| runtime | 1266272 | PR 56502 | CoreCLR Product Build OSX x64 release |
| runtime | 1266203 | PR 56398 | Mono Product Build OSX x64 release |
| runtime | 1265838 | PR 56537 | CoreCLR Product Build OSX x64 checked |
| runtime | 1265709 | PR 56531 | Mono crossaot Product Build OSX x64 release |
| runtime | 1264483 | PR 56431 | Mono iOS AOT offsets |
| runtime | 1264469 | PR 56411 | Build OSX x64 Release AllSubsets_Mono_LLVMJIT |
| runtime | 1264447 | PR 56490 | Build iOSSimulator x86 Debug AllSubsets_Mono |
| runtime | 1264431 | PR 56486 | Build tvOSSimulator x64 Debug AllSubsets_Mono |
| runtime | 1264431 | PR 56486 | Build MacCatalyst arm64 Debug AllSubsets_Mono |
| runtime | 1263889 | PR 56471 | CoreCLR Product Build OSX arm64 checked |
| runtime | 1263861 | PR 56411 | Build MacCatalyst arm64 Debug AllSubsets_Mono |
| runtime | 1263853 | PR 56316 | Build iOSSimulator x86 Debug AllSubsets_Mono |
| runtime | 1262979 | PR 56434 | Installer Build and Test coreclr OSX_x64 Release |
| runtime | 1246278 | PR 55939 | Build iOSSimulator x86 Debug AllSubsets_Mono |
| runtime | 1246251 | PR 55925 | Mono crossaot Product Build OSX x64 release |

Build Result Summary

| Day Hit Count | Week Hit Count | Month Hit Count |
|---|---|---|
| 0 | 2 | 18 |

@jaredpar added the blocking-clean-ci (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms') label Apr 2, 2020
@Dotnet-GitSync-Bot added the untriaged (New issue has not been triaged by the area owner) label Apr 2, 2020
@Dotnet-GitSync-Bot
Collaborator

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

@jaredpar
Member Author

jaredpar commented Apr 2, 2020

FYI, this is not runtime-specific. It is hitting other repos as well.

| Definition | Build | Kind | Timeline Record |
|---|---|---|---|
| runtime | 585548 | Rolling | Mono Product Build iOS arm64 debug |
| runtime | 585548 | Rolling | Mono LLVM Product Build OSX x64 debug |
| runtime | 585497 | PR #33701 | Installer Build and Test OSX_x64 Debug |
| runtime | 585541 | PR #34229 | Mono LLVM Product Build OSX x64 debug |
| coreclr | 585166 | Rolling | CoreCLR Pri1 Test Run OSX x64 checked |
| MachineLearning-CI | 584884 | PR dotnet/machinelearning#4925 | MacOS_x64_NetCoreApp21 Release_Build |

Evaluated 500 builds
Impacted 5 builds
Impacted 6 jobs

@dougbu
Member

dougbu commented Apr 21, 2020

This is also tracked at dotnet/core-eng#9448. Correct?

@MattGal
Member

MattGal commented Apr 21, 2020

> This is also tracked at dotnet/core-eng#9448. Correct?

Correct.

@AraHaan
Member

AraHaan commented May 19, 2020

aww that link 404d for me.

@MattGal
Member

MattGal commented May 19, 2020

> aww that link 404d for me.

Private repo. I can say publicly, however, that the problem continues to be investigated and seems to be at the macOS VM provider level.

@AraHaan
Member

AraHaan commented May 19, 2020

Seems on my pr 36694 that this might be a cause when I read this: https://helix.dot.net/api/2019-06-17/jobs/b0b1c468-c4c5-4f2d-a2a6-3c831ed6ef20/workitems/System.Net.Sockets.Tests/console

@MattGal
Member

MattGal commented May 19, 2020

> Seems on my pr 36694 that this might be a cause when I read this: https://helix.dot.net/api/2019-06-17/jobs/b0b1c468-c4c5-4f2d-a2a6-3c831ed6ef20/workitems/System.Net.Sockets.Tests/console

No, that log is from a plain old crash on a Helix machine. This issue is specific to the hosted Azure DevOps build Macs, which are in no way connected to Helix.

@CoffeeFlux
Contributor

Because of the AzDO limitation where you can't rerun individual lanes until all of them are completed, this issue is particularly painful when trying to land a PR :(

@myblindy

Any idea why my https://dev.azure.com/dnceng/public/_build/results?buildId=726102 build was included in this? It didn't fail any tests or build steps as far as I can tell.

@MattGal
Member

MattGal commented Jul 13, 2020

> Any idea why my https://dev.azure.com/dnceng/public/_build/results?buildId=726102 build was included in this? It didn't fail any tests or build steps as far as I can tell.

It does on Attempt #1:
[image]

So, if you didn't click retry, some nice person did for you :)

@MattGal
Member

MattGal commented Jul 13, 2020

Update from my end: I chatted with the AzDO folks in charge of the hosted Macs today, and the previous ETA of 7/15 to migrate the hosted agents seems not to have been accurate. I am politely asking for an updated time.

@directhex
Member

@MattGal
Member

MattGal commented Sep 11, 2020

> Seemingly still a thing https://dev.azure.com/dnceng/internal/_build/results?buildId=810664&view=logs&j=04b1677b-aa91-548a-e789-89c3d998d0f7

Yes, definitely still a thing (which is why the issue is still open). I checked this instance, and it is definitely on the v2 pool, so we can no longer hope that the migration will help us here. I did share this specific instance with the MMS team, though they are also tracking these failures en masse via telemetry.

@ViktorHofer
Member

@MattGal we are now live-updating this issue's top post. Maybe that helps?

@ViktorHofer
Member

Got fixed by AzDO. Closing.

@dotnet locked as resolved and limited conversation to collaborators May 12, 2021