extremely slow Network and Disk IO on Windows agent compared to Ubuntu/Mac #3577

Closed
3 of 8 tasks
jetersen opened this issue Jun 13, 2021 · 46 comments
Labels: external, investigate, OS: Windows

Comments

@jetersen

jetersen commented Jun 13, 2021

Description
Actually, we are seeing the same behavior on GitHub Actions: running a shell command for dotnet restore takes a very long time on Windows, even when using actions/cache 😅

Originally posted by @jetersen in #1733 (comment)

I cannot replicate this locally, where a restore with a full NuGet cache takes less than a second.

Area for Triage:
.NET Core

Question, Bug, or Feature?:
Bug

Virtual environments affected

  • Ubuntu 16.04
  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11
  • Windows Server 2016
  • Windows Server 2019
  • Windows Server 2022

Image version

Current runner version: '2.278.0'
Operating System
  Microsoft Windows Server 2019
  10.0.17763
  Datacenter
Virtual Environment
  Environment: windows-2019
  Version: 20210531.1
  Included Software: actions/virtual-environments@win19/20210531.1/images/win/Windows2019-Readme.md
  Image Release: actions/virtual-environments@win19%2F20210531.1 (release)

Expected behavior
actions/cache should speed up dotnet restore to take a few seconds

Actual behavior
Dotnet restore even with actions/cache takes well over 30 seconds.

Repro steps
https://github.com/jetersen/dotnet.restore.slow.github.action

@al-cheb al-cheb added the external, investigate, and OS: Windows labels and removed the needs triage label Jun 15, 2021
@al-cheb
Contributor

al-cheb commented Jun 15, 2021

Hello, @jetersen.
Please provide a small repo that reproduces the issue.

@jetersen
Author

Well, the repro was super easy, actually.

https://github.com/jetersen/dotnet.restore.slow.github.action

No cache 41s
Cached 33s

Locally on my machine, the cache speeds up the restore to under 1 second. Starting dotnet restore also seems relatively slow when called through the pwsh shell 😰

@jetersen
Author

jetersen commented Jun 15, 2021

The scary part is that the slowness is consistent at least according to the dotnet restore timings.

Third run, using cmd, cached: 37s

On bigger projects this time scale goes into minutes and we would expect that the cache would speed up the restore significantly.

@jetersen
Author

jetersen commented Jun 15, 2021

Well, Linux and macOS do not have this issue.

Ubuntu-20.04 no cache 14s
Ubuntu-20.04 with cache 7s
macOS also seems to benefit from the cache: 6s with the cache, versus 57s before.

This is more along the lines of what I would expect.

No improvement between runs on Windows with the cache...

@jetersen
Author

In the log you can clearly see that the timings on Windows do not make sense, but they are consistent.

  Determining projects to restore...
  Restored D:\a\dotnet.restore.slow.github.action\dotnet.restore.slow.github.action\tests\Library.UnitTests\Library.UnitTests.csproj (in 20.78 sec).
  Restored D:\a\dotnet.restore.slow.github.action\dotnet.restore.slow.github.action\tests\Api.UnitTests\Api.UnitTests.csproj (in 20.78 sec).
  Restored D:\a\dotnet.restore.slow.github.action\dotnet.restore.slow.github.action\src\Library\Library.csproj (in 2 ms).
  Restored D:\a\dotnet.restore.slow.github.action\dotnet.restore.slow.github.action\src\Api\Api.csproj (in 9 ms).

@jetersen
Author

jetersen commented Jun 15, 2021

@al-cheb Hope the repro is good enough. We were seeing similar behavior in our private repo on our unit test project, but also on some of our projects with more dependencies. We do restore our internal packages from an internal NuGet source, but that would not affect caching.

@al-cheb
Contributor

al-cheb commented Jun 15, 2021

@jetersen, it looks like the problem occurs only with .NET 5.0.301. We recommend using the actions/setup-dotnet@v1 task to pick a version:

E.g. - https://github.com/vsafonkin-test-organization/dotnet.restore.slow.github.action/runs/2829395627?check_suite_focus=true

      - name: Setup dotnet
        uses: actions/setup-dotnet@v1
        with:
          dotnet-version: '5.0.300'

      - uses: actions/cache@v2
        with:
          path: ~/.nuget/packages
          key: ${{ runner.os }}-nuget-5.0.300-${{ hashFiles('**/packages.lock.json') }}
          restore-keys: |
            ${{ runner.os }}-nuget-

      - name: dotnet restore
        run: |
          dotnet restore --verbosity n


@jetersen
Author

The good thing is that you now have a good repro case to fix it, and hopefully it won't reappear in the next dotnet release 😅

@vsafonkin
Contributor

Hi @jetersen, dotnet 6.0 (preview version) also works properly. It looks like this bug affects only the latest patch version of 5.0.

@jetersen
Author

@vsafonkin odd, I guess someone else already caught the regression 😅

@jetersen
Author

jetersen commented Jun 15, 2021

@vsafonkin @al-cheb I think we can close it, since as you mentioned it only affects the latest 5.0.301 patch. I would still like to know what went wrong in the latest patch, but I'm not sure it is something we can fix here 😊

Thank you for investigating the issue.

@al-cheb al-cheb closed this as completed Jun 16, 2021
@SteveDesmond-ca

This is still happening with the latest version of .NET (6.0.x) -- I have projects consistently taking multiple minutes to restore a minimal set of dependencies.

@jetersen
Author

jetersen commented Dec 9, 2021

I have noticed the same thing recently.

@jetersen
Author

jetersen commented Dec 10, 2021

Just saw a 1 minute restore for very basic test packages:
https://github.com/specshell/specshell.software.ndde/runs/4481718755?check_suite_focus=true
@al-cheb @vsafonkin should we open the issue elsewhere? Is this better suited for the dotnet org?

@vsafonkin
Contributor

Hi @jetersen, I cannot reproduce it with version 6.0.100:
restore without caching takes about 2 min: run
restore with caching takes about 3 sec: run

It looks like it works properly, or am I missing something?

@SteveDesmond-ca

My latest test shows that it's not just dotnet restore, but almost the entire windows-latest environment is at least an order of magnitude slower than the exact same workflow running on ubuntu-latest (diff).

Step          Ubuntu run   Windows run
Checkout      1 s          12 s
Install .NET  13 s         64 s
Restore       8 s          120 s
Build         4 s          8 s
Test          6 s          7 s
Total         34 s         216 s

The build and test steps are close enough, but all of the actions that require some sort of network transfer are extremely slow. Is this a known issue with Windows environments, and if so, can something be added to documentation somewhere to indicate that the huge performance hit is expected?
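
A comparison like the one in the table can be set up by running identical steps on both runner images through a job matrix. The following is only a minimal sketch: the workflow name, trigger, action versions, and the '6.0.x' SDK version are assumptions for illustration, not the workflow that produced the numbers above.

```yaml
# Hypothetical workflow; names and versions are illustrative assumptions.
name: os-timing-comparison
on: [push]

jobs:
  build:
    strategy:
      fail-fast: false          # let both OS jobs finish so timings can be compared
      matrix:
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Install .NET
        uses: actions/setup-dotnet@v1
        with:
          dotnet-version: '6.0.x'
      - name: Restore
        run: dotnet restore
      - name: Build
        run: dotnet build --no-restore
      - name: Test
        run: dotnet test --no-build
```

The per-step durations shown in the Actions UI for each matrix job can then be read side by side.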

@jetersen
Author

@SteveDesmond-ca nice comparison, I can definitely attest to this based on our private repos.

@vsafonkin vsafonkin reopened this Dec 10, 2021
@jetersen
Author

jetersen commented Dec 12, 2021

Yeah, okay, now the issue is no longer dotnet restore; networking is a significant issue in the test case I built:

action: skip telemetry and other slowdowns
[2 images]

action: update dependencies
[2 images]

Even in the update dependencies action, restoring new dependencies is slow on Windows:
the Ubuntu VM spends 5 seconds,
the Windows VM spends 14 seconds.

So the network seems to be an issue.

Checking the output of the cache action, which includes download speeds, something is off.
zstd is definitely also a benefit, as you can see from the cache size.

Ubuntu

Received 50331648 of 138545810 (36.3%), 48.0 MBs/sec
Received 134351506 of 138545810 (97.0%), 64.0 MBs/sec
Received 138545810 of 138545810 (100.0%), 59.0 MBs/sec
Cache Size: ~132 MB (138545810 B)

Windows

Received 0 of 190559914 (0.0%), 0.0 MBs/sec
Received 96468992 of 190559914 (50.6%), 45.6 MBs/sec
Received 186365610 of 190559914 (97.8%), 58.9 MBs/sec
Received 190559914 of 190559914 (100.0%), 44.9 MBs/sec
Cache Size: ~182 MB (190559914 B)

Sometimes Ubuntu seems a lot faster at downloading:

Received 138545810 of 138545810 (100.0%), 120.4 MBs/sec
Cache Size: ~132 MB (138545810 B)

Could this be fixed by tweaking TcpAckFrequency and TcpNoDelay on the Windows VM?

Perhaps update the issue title: slow network transfer on Windows agent

Perhaps worth reconsidering #4424 to have DotNet 6.0 installed as it is the latest LTS.

@zcsizmadia

zcsizmadia commented Dec 13, 2021

I had cases when restore took a long time when nuget.org was not available to check the signatures. Setting env var NUGET_CERT_REVOCATION_MODE to offline solved that particular problem. Maybe it is the same issue?
It will force NuGet to check the revocation status of the certificate only against the cached certificate revocation list, and NuGet will not attempt to reach revocation servers.

https://docs.microsoft.com/en-us/nuget/reference/errors-and-warnings/nu3028
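
For anyone who wants to try this in a workflow, the variable can be set on the restore step itself. This is a small sketch, assuming that skipping online revocation checks is acceptable for the pipeline in question; the step name is illustrative.

```yaml
      - name: dotnet restore
        env:
          # Assumption: offline revocation checking is acceptable for this build
          NUGET_CERT_REVOCATION_MODE: offline
        run: dotnet restore --verbosity n
```

Setting it under `env` at the step level keeps the change scoped to the restore rather than the whole job.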

@jetersen
Author

Timings from yesterday around 14:00 CET show that this is still an issue.

[image: timings]

@vsafonkin
Contributor

@jetersen, we are still investigating it. Actually, I'm confused that downloading via curl shows very similar performance. For example, this PowerShell script (download ~200 MB 200 times):

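# Download the SDK archive 200 times, deleting each copy after it is saved
For ($i = 0; $i -lt 200; $i++) {
  $outfile = "dotnet-$i.zip"
  # In pwsh, curl resolves to curl.exe; in Windows PowerShell 5.1 it is an alias for Invoke-WebRequest
  curl https://dotnetcli.azureedge.net/dotnet/Sdk/5.0.403/dotnet-sdk-5.0.403-win-x64.zip -o $outfile
  rm $outfile
}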

and result:
[image: screenshot, 2021-12-28]

@al-cheb al-cheb self-assigned this Dec 30, 2021
@al-cheb
Contributor

al-cheb commented Jan 6, 2022

@jetersen, @SteveDesmond-ca, could you try a restore step with these parameters?

- name: Restore deps
  run: |
        dotnet new nugetconfig
        dotnet restore -v n --packages D:\pkgs

@jetersen
Author

jetersen commented Jan 7, 2022

@al-cheb that does not resolve the fact that downloading .NET 6 is also slower.
The issue title has been changed to be more generic: network and disk IO in general seem slower on the Windows agent.
curl on Windows, for some reason, does not have this issue.

@al-cheb
Contributor

al-cheb commented Jan 10, 2022

> @al-cheb that does not resolve the fact that downloading .NET 6 is also slower. The issue title has changed to be generic: Network and Disk IO in general seems slower on the windows agent. Where curl on windows for some reason does not have this issue.

Ubuntu agents have a slightly higher IOPS disk performance configuration. For installation we use the dotnet-install.ps1 script provided by the .NET team. Its DownloadFile and Extract-Dotnet-Package functions are slow. We will investigate whether performance improves if we replace DownloadFile with WebClient and Extract-Dotnet-Package with 7zip.


@jetersen
Author

jetersen commented Jan 10, 2022

@al-cheb jetersen/dotnet.restore.slow.github.action@f342429 does indeed help with NuGet restore. I used a different approach from the one you mentioned.

@vsafonkin
Contributor

@jetersen, as far as we can see, this issue is not related to the network configuration of the Windows runner. We tested a Windows image from Azure Marketplace and got the same network performance. The problem appears to come from the .NET side, because we see performance degradation with newer versions of .NET (for example, restore on .NET 5.0 is slower than restore on .NET 3.1). There is also an issue with the precached NuGet packages placed in the C:\Program Files (x86)\Microsoft SDKs\NuGetPackages\ directory on the runner: restoring packages from this cache is slower than downloading them from nuget.org, so the default NuGet config (created by the dotnet new nugetconfig command), which contains only the nuget.org feed, performs better. This is probably related to disk IO operations, and unfortunately it looks like there is nothing we can do from our side.

As @al-cheb mentions above, Ubuntu agents have a slightly higher IOPS disk performance configuration.
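
Put together, the workaround discussed in this thread might look roughly like the steps below. This is a sketch only: caching the D:\pkgs folder and the cache key are assumptions that were not tested here, and it assumes the repository has no nuget.config of its own, since dotnet new nugetconfig creates one in the working directory.

```yaml
      # Assumption: cache the custom packages folder used by the restore step below
      - uses: actions/cache@v2
        with:
          path: D:\pkgs
          key: ${{ runner.os }}-nuget-${{ hashFiles('**/packages.lock.json') }}
          restore-keys: |
            ${{ runner.os }}-nuget-

      - name: Restore deps
        run: |
          # Generate a default nuget.config that contains only the nuget.org feed
          dotnet new nugetconfig
          # Restore into a packages folder on the D: drive
          dotnet restore -v n --packages D:\pkgs
```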

@jetersen
Author

I think dotnet new nugetconfig is a fine solution. Perhaps some of these things should be documented in the setup-dotnet action, or wherever else it makes sense.

Of course, there is still the issue with setup-dotnet being slow to install the .NET SDK.

@vsafonkin
Contributor

> Of course there is still the issue with setup-dotnet being slow to install .NET SDK

Yes, but to install the .NET SDK, the setup-dotnet action uses scripts maintained by the .NET team as the recommended approach, so we cannot do anything about that either.

https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-install-script

@vsafonkin
Contributor

Closed, because there is nothing we can do from our side.

@jetersen
Author

Created actions/setup-dotnet#260 😓

@ItalyPaleAle

We have I/O perf issues as well, and it's not related to .NET.

One example is using the actions/cache@v3 Action to restore something from cache. The cache contains a lot of small files that are then unpacked.

Windows:


It took Windows 428 seconds to restore 1337MB.

Linux:


Although Linux had to restore only 1074MB, it took only 34 seconds.

As you can see, Windows was actually downloading files from the cache faster than Linux. However, I suspect that what took the longest was extracting them: to us, it seems to be more of a disk I/O issue than a network one.

This is one big example, but we see Windows agents being much slower in other things that perform disk I/O, such as pulling/pushing Docker images.

@miketimofeev
Contributor

@ItalyPaleAle Windows and Ubuntu runners are different in terms of disk allocation, and that may explain the issue.

@ItalyPaleAle

@miketimofeev the performance difference is very significant, however. I don't know what disks are used, but Linux agents processed 1000MB in 32s and Windows ones in 320s in our tests above, so 10x slower. (I am aware that it's not apples-to-apples, as the files in the caches aren't exactly the same, but this difference should be significant regardless.)
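
For reference, the 10x figure follows from the restore numbers quoted earlier (1337MB in 428 seconds on Windows, 1074MB in 34 seconds on Linux), treating download and extraction together as one end-to-end rate. The rounding is approximate:

```latex
\begin{align*}
\text{Windows:}\quad & \frac{1337\ \text{MB}}{428\ \text{s}} \approx 3.1\ \text{MB/s}
  \quad\Rightarrow\quad 1000\ \text{MB in} \approx 320\ \text{s} \\
\text{Linux:}\quad & \frac{1074\ \text{MB}}{34\ \text{s}} \approx 31.6\ \text{MB/s}
  \quad\Rightarrow\quad 1000\ \text{MB in} \approx 32\ \text{s} \\
\text{Ratio:}\quad & \frac{31.6\ \text{MB/s}}{3.1\ \text{MB/s}} \approx 10
\end{align*}
```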

@wrexbe

wrexbe commented Jun 23, 2022

This is very frustrating. NuGet downloading was slow on Windows, even timing out sometimes, so to make it faster and more reliable I wanted to use the cache, only to find out that the cache action is also very slow on Windows.
https://github.com/space-wizards/space-station-14/runs/7017334343?check_suite_focus=true
The Windows server took 2 minutes and 29 seconds to restore the compressed file, and the Ubuntu server took 16 seconds. Is there a way to work around this?

@ericmutta

Unfortunately, this is still a problem as of July 2023. The dotnet restore command on ubuntu-latest runs in seconds, but on both windows-2019 and windows-latest it takes a minute or more. Is there a fix for this? The problem actually costs us money because of per-minute billing 😞

@thomhurst

I've got a pipeline that I run against Linux, Mac and Windows to test cross platform-ness. The Linux and Mac ones take a few minutes. The Windows one can take 7, 8 or 9 minutes.

@wrexbe

wrexbe commented Sep 22, 2023

Yeah, Windows is still like 11 minutes slower than Linux for us.
https://github.com/space-wizards/space-station-14/actions/runs/6272151167/job/17033126765

@Danielku15

I also see similar timings:
https://github.com/CoderLine/alphaSkia/actions/runs/6441066466

My build times are overall significantly slower on Windows. A dotnet build (with restore) of my Nuke build project takes 15 s on ubuntu-latest and >1 min on windows-latest. Other compilations also take significantly more time.
