[System.Convert]::FromBase64String causes memory leak with large strings #101061

Closed
chopinrlz opened this issue Apr 15, 2024 · 8 comments
@chopinrlz

Description

Calling [System.Convert]::FromBase64String on a large string, such as one that is over 300 million characters long, appears to leak memory and takes significantly longer than the same call on the .NET Framework.

Reproduction Steps

This was tested on PowerShell 7.4.2

  1. Download the .NET 8.0 installer for Windows x64 to use as the test file, the direct link to this is here: https://dotnet.microsoft.com/en-us/download/dotnet/thank-you/sdk-8.0.204-windows-x64-installer
  2. Save this file into a folder on your computer somewhere
  3. Create a PowerShell script with the following contents in the same folder as the .NET 8.0 installer:
Get-Date
Write-Host "Creating Path to dotnet.exe test file"
$file = Join-Path -Path $PSScriptRoot -ChildPath "dotnet.exe"
Write-Host "Reading all file bytes into memory"
$bytes = [System.IO.File]::ReadAllBytes( $file )
Write-Host "Converting file bytes to base64 string"
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host "Converting base64 string back to file bytes"
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host "Test complete"
Get-Date

NOTE: If you test this with a newer version of the .NET 8.0 installer, you may have to modify the test script to pick the correct file, since the filename is hard-coded on line 3.

  4. Open a PowerShell window in the folder with the script and test file
  5. Run the PowerShell script
  6. Open Task Manager and observe the memory usage of PowerShell
  7. Note the time required to complete the conversion from Base64 and the use of up to 3.5 GB of RAM to do so

Expected behavior

The .NET 8.0 installer for Windows x64 is approximately 222 MB in size. Reading it into memory, converting it to Base64, and then converting it back should require about 790 MB of RAM, assuming all variables remain in scope and no garbage collection or object disposal takes place. The observed behavior looks like a memory leak: once the conversion eventually completes, about 3.4 GB of RAM is in use. These data points can be seen in the attached screenshots.
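
As a rough cross-check of the expected footprint, the arithmetic can be sketched in PowerShell. This is only an estimate under stated assumptions: the file is taken as exactly 222 MB, the Base64 string is counted at two bytes per character because .NET strings are UTF-16, and object headers and any temporary buffers are ignored. The variable names are illustrative.

```powershell
# Assumed input size: the "approximately 222 MB" figure, taken as exactly 222 MB
$fileBytes = 222MB

# Base64 emits 4 characters for every 3 input bytes, rounded up to a full group
$base64Chars = 4 * [long][math]::Ceiling( [double]$fileBytes / 3 )

# .NET strings are UTF-16, so each character occupies 2 bytes
$stringBytes = 2 * $base64Chars

# Original byte array + Base64 string + decoded byte array, all kept alive
$totalBytes = $fileBytes + $stringBytes + $fileBytes

'{0:N0} Base64 characters' -f $base64Chars
'{0:N0} MB for all three objects' -f ( $totalBytes / 1MB )
```

Under these assumptions the total comes to about 1,036 MB, somewhat above the 790 MB figure and close to the 1.0 GB measured in Windows PowerShell 5.1.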

Actual behavior

When you run the same script in PowerShell 7 and Windows PowerShell 5.1, you see two very different behaviors:

In PowerShell 7.4.2, the time to complete is 82 seconds and memory used is 3.4 GB
In PowerShell 5.1, the time to complete is 7 seconds and memory used is 1.0 GB

This suggests there is an error in the PowerShell 7.4.2 / .NET 8.0 implementation. This bug has been reported in both the PowerShell 7 issues list and the dotnet/runtime issues list.

Regression?

No response

Known Workarounds

No response

Configuration

.NET version is .NET 8.0.204 Windows x64
Test environment is Windows 11 Professional 64-bit edition

Output from $PSVersionTable
Name                           Value
----                           -----
PSVersion                      7.4.2
PSEdition                      Core
GitCommitId                    7.4.2
OS                             Microsoft Windows 10.0.22631
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Other information

Testing in PowerShell 7.4.2

[Screenshot: ps7-test]

Testing in Windows PowerShell 5.1

[Screenshot: ps5-test]

PowerShell 7.4.2 Memory Usage

[Screenshot: ps7-mem]

PowerShell 5.1 Memory Usage

[Screenshot: ps5-mem]

@KalleOlaviNiemitalo

I see this taking between 3 and 6 seconds with dotnet-sdk-8.0.204-win-x64.exe and PowerShell 7.4.2 on Windows 10. I wonder if you have some kind of antivirus software interfering with the operation. Is it equally slow with a random-data file of the same size?

@KalleOlaviNiemitalo

> This bug has been reported in both the PowerShell 7 issues list and the dotnet/runtime issues list.

The other copy is PowerShell/PowerShell#21473. Linking it here so that people can more easily check what has been investigated already.

@chopinrlz
Author

I created the following parameterized script, which writes a binary file of any requested size filled with random bytes, and used it to create a test file called random.bin of exactly 233,420,544 bytes, which is 104 bytes larger than the executable file I first tested with.

param(
	[int]
	$Size
)
$blockSize = 256
$rand = [System.Random]::new()
$total = 0
# Allocate a reusable buffer of $blockSize bytes (the element type is [byte], not [byte[]])
[byte[]]$data = [System.Array]::CreateInstance( [byte], $blockSize )
$path = Join-Path -Path $PSScriptRoot -ChildPath "random.bin"
if( Test-Path $path ) {
	Remove-Item -Path $path -Force
}
$file = [System.IO.File]::OpenWrite( $path )
while( $total -lt $Size ) {
	$rand.NextBytes( $data )
	$file.Write( $data, 0, $data.Length )
	$total += $blockSize
}
$file.Flush()
$file.Close()

I then reran the same test script, modified to use the random.bin file instead of the .NET executable file, and the results were similar to before. In Windows PowerShell 5.1 the test executes in 7 seconds, while in PowerShell 7.4.2 it executes in 74 seconds, with the majority of the time spent on the call to [System.Convert]::FromBase64String and similar total memory usage of around 3.4 GB of RAM.

It is also interesting to note that after running the test, if you type [GC]::Collect() into the PowerShell 7.4.2 terminal window, only about 200 MB of the 3.4 GB RAM used during the test is released back to the operating system whereas in Windows PowerShell 5.1 doing the same releases nearly all of the used memory back to the operating system.
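
A minimal sketch of how the before/after numbers could be captured from the same session, rather than read off Task Manager. It assumes the test variables are still in scope (for example because the script was dot-sourced); if they are not, the two assignments are harmless. The managed heap size and the process working set are reported separately because they can differ considerably.

```powershell
# Drop the references so the arrays and the string become collectable
$bytes = $null
$base64 = $null

# Force a full collection, let finalizers run, then collect again
[GC]::Collect()
[GC]::WaitForPendingFinalizers()
[GC]::Collect()

# Managed heap size as seen by the runtime after the collection
$managed = [GC]::GetTotalMemory( $true )

# Memory as seen by the operating system for the current process
$proc = Get-Process -Id $PID

'Managed heap : {0:N0} MB' -f ( $managed / 1MB )
'Working set  : {0:N0} MB' -f ( $proc.WorkingSet64 / 1MB )
'Private bytes: {0:N0} MB' -f ( $proc.PrivateMemorySize64 / 1MB )
```

If the managed heap is small while the working set stays large, the memory is being held outside the managed objects created by the script, which is worth knowing before attributing the usage to the conversion call itself.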

Here are the screenshots from the latest tests:

Windows PowerShell 5.1

[Screenshot: ps5-rand-test]

PowerShell 7.4.2

[Screenshot: ps7-rand-test]

I do not have any third-party anti-virus software installed. I am using the default profile for Windows Defender on an M365-managed device. This is a laptop connected to my own company tenant.

The majority of the test time, and noticeable memory usage, is taken on the call to [System.Convert]::FromBase64String during the test. If you leave the Windows Task Manager open while the script is running, you can see the memory usage climb dramatically when the test script reaches that point.

@KalleOlaviNiemitalo

How much RAM does the computer have?

From my reading of the code, the only heap allocation in [System.Convert]::FromBase64String should be for the byte array that it returns. Is it possible that the [System.Convert]::FromBase64String method is somehow replaced (perhaps in PowerShell startup scripts) and the call goes to an implementation different from what is in .NET Runtime?
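
One quick way to see which assembly the call resolves to is to ask the type itself. This is a sanity check only, not proof that nothing else intercepts the call. The file names mentioned in the comments are assumptions about a default installation.

```powershell
# Show which assembly provides System.Convert in the current session.
# Assumption: on PowerShell 7 this is typically System.Private.CoreLib.dll,
# on Windows PowerShell 5.1 typically mscorlib.dll.
$type = [System.Convert]
$type.Assembly.Location

# List the FromBase64String overloads that the type exposes
$type.GetMethods() |
    Where-Object Name -eq 'FromBase64String' |
    ForEach-Object { $_.ToString() }
```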

@chopinrlz
Author

chopinrlz commented Apr 16, 2024

I retested the following updated script on my desktop PC, a Ryzen 5800X with 128 GB of RAM and PCIe Gen4 NVMe storage. The test ran much faster, as expected, but the memory usage remains high.

Even after invoking [GC]::Collect(), around 3.2 GB of RAM is still in use by the pwsh process.

Reading the 222 MB file into memory takes 0.06 seconds, and converting it to Base64 takes 0.22 seconds; the two operations together use 845 MB of RAM, as expected. The last operation, [System.Convert]::FromBase64String, uses 2.6 GB of RAM on its own and takes 17 seconds. That memory is also not released after calling [GC]::Collect().

[Screenshot: ps7-retest-timings]

Updated test script:

$name = "random.bin"

$start = Get-Date

Write-Host "Creating Path to $name test file: " -NoNewline
$now = Get-Date
$file = Join-Path -Path $PSScriptRoot -ChildPath $name
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Reading all file bytes into memory: " -NoNewline
$now = Get-Date
$bytes = [System.IO.File]::ReadAllBytes( $file )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Converting file bytes to base64 string: " -NoNewline
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Converting base64 string back to file bytes: " -NoNewline
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Test complete"

Write-Host "Total duration: $(((Get-Date) - $start).TotalMilliseconds) ms"
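
The same kind of timing can also be taken with System.Diagnostics.Stopwatch, which avoids the subtraction of two Get-Date values. The sketch below is a scaled-down variant, not the script used for the measurements in this thread: it uses a 1 MB in-memory buffer (an arbitrary size chosen so it runs quickly) instead of random.bin.

```powershell
# Small in-memory payload so the sketch runs quickly (size is an arbitrary choice)
$bytes = [byte[]]::new( 1MB )
[System.Random]::new().NextBytes( $bytes )

# Time the encode step
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$base64 = [System.Convert]::ToBase64String( $bytes )
$encodeMs = $sw.Elapsed.TotalMilliseconds

# Reset the stopwatch and time the decode step
$sw.Restart()
$decoded = [System.Convert]::FromBase64String( $base64 )
$decodeMs = $sw.Elapsed.TotalMilliseconds

Write-Host "ToBase64String:   $encodeMs ms"
Write-Host "FromBase64String: $decodeMs ms"
Write-Host "Round trip length matches: $($decoded.Length -eq $bytes.Length)"
```

Increasing the buffer size step by step shows at what input size the decode time starts to diverge between PowerShell versions.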

@chopinrlz
Author

chopinrlz commented Apr 16, 2024

So it appears that this is actually a PowerShell issue, not a dotnet/runtime issue. When I repeat the exact same test using a console application written in C#, it runs normally and there is no memory leak. Below are the timings from the C# console application. Total memory usage is around 1.0 GB, which is expected for creating four copies of a 222 MB file in memory, two of which are Base64 encoded. This was tested using the .NET 8.0 runtime for Windows x64, the latest version available for download on the dotnet website as of today.

[Screenshot: console-app-test]

using System;

namespace Testing {
    public static class Program {
        public static void Main(string[] args) {
            var start = DateTime.Now;

            Console.Write( "ReadAllBytes: " );
            var now = DateTime.Now;
            var file = System.IO.File.ReadAllBytes( @"C:\Github\memleak\random.bin" );
            var duration = DateTime.Now - now;
            Console.WriteLine( "{0:#,000.00} ms", duration.TotalMilliseconds );

            Console.Write( "ToBase64String: " );
            now = DateTime.Now;
            var text = System.Convert.ToBase64String( file );
            duration = DateTime.Now - now;
            Console.WriteLine( "{0:#,000.00} ms", duration.TotalMilliseconds );

            Console.Write( "FromBase64String: " );
            now = DateTime.Now;
            var bin = System.Convert.FromBase64String( text );
            duration = DateTime.Now - now;
            Console.WriteLine( "{0:#,000.00} ms", duration.TotalMilliseconds );

            Console.WriteLine( "Press Enter to call GC.Collect()" );
            Console.ReadLine();
            file = null;
            text = null;
            bin = null;
            GC.Collect();
            Console.WriteLine( "Press Enter to close process" );
            Console.ReadLine();
        }
    }
}

@KalleOlaviNiemitalo

KalleOlaviNiemitalo commented Apr 17, 2024

I suggest closing this dotnet/runtime issue, because the slowdown is not caused by code in this repository and cannot be fixed here; PowerShell/PowerShell#21473 (comment) shows it is caused by PowerShell calling the AMSI provider of Windows Defender.

@jkotas jkotas closed this as completed Apr 17, 2024