"Out of memory" error when running GC benchmarks #3382

Open
alexcovington opened this issue Oct 4, 2023 · 1 comment
@alexcovington

I'm getting an Out of memory error when I try to run the HighMemory Server scenario with .NET 6.0 using GC.Infrastructure.exe.

Here is the .yaml file I'm using:

HighMemory.yaml
runs:

  server:
    override_parameters:
      tlgb: 2
      sohsi: 50

  workstation:
    override_parameters:
      tlgb: 2
      sohsi: 50
    environment_variables:
      COMPlus_GCServer: 0

# Top level microbenchmark configuration.
gcperfsim_configurations:
  parameters:
    tc: 36
    tagb: 540
    tlgb: 2
    lohar: 0
    pohar: 0
    sohsr: 100-4000
    lohsr: 102400-204800
    pohsr: 100-204800
    sohsi: 0
    lohsi: 0
    pohsi: 0
    sohpi: 0
    lohpi: 0
    sohfi: 0
    lohfi: 0
    pohfi: 0
    allocType: reference
    testKind: time
  gcperfsim_path: D:\ac\20230927\performance\artifacts\bin\GCPerfSim\Release\net6.0\GCPerfSim.dll

coreruns:
  baseline:
    path: D:\ac\20230927\runtime-6.0\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe
    environment_variables:
      COMPlus_GCName: clrgc.dll
  run:
    path: D:\ac\20230927\runtime-8.0\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe
    environment_variables:
      COMPlus_GCName: clrgc.dll

environment:
  environment_variables:
    COMPlus_GCServer: 1
    COMPlus_GCHeapCount: 18
    COMPlus_GCName: clrgc.dll 
    COMPlus_GCHeapHardLimit: "0x100000000"
    COMPlus_GCTotalPhysicalMemory: "0x100000000"

  default_max_seconds: 300 
  iterations: 3

# Configurations that involve capturing a trace.
trace_configurations:
  type: gc # Choices: gc, verbose, cpu, threadtime, none

output:
  path: D:\ac\20230927\GCPerfSim\HighMemory_NormalServer_debug
  columns:
  - Count
  - total allocated (mb)
  - total pause time (msec)
  - PctTimePausedInGC
  - FirstToLastGCSeconds
  - HeapSizeAfter_Mean
  - HeapSizeBeforeMB_Mean
  - PauseDurationMSec_95PWhereIsGen0
  - PauseDurationMSec_95PWhereIsGen1
  - PauseDurationMSec_95PWhereIsBackground
  - PauseDurationMSec_95PWhereIsBlockingGen2
  - CountIsBlockingGen2
  - HeapCount
  - TotalNumberGCs
  - TotalAllocatedMB
  - Speed
  - PauseDurationMSec_MeanWhereIsEphemeral
  - PauseDurationMSec_MeanWhereIsBackground
  - PauseDurationMSec_MeanWhereIsBlockingGen2
  - PauseDurationSeconds_SumWhereIsGen1
  - PauseDurationSeconds_Sum
  - CountIsGen1
  - ExecutionTimeMSec
  percentage_disk_remaining_to_stop_per_run: 0
  all_columns:
  - Count
  - total allocated (mb)
  - total pause time (msec)
  - PctTimePausedInGC
  - FirstToLastGCSeconds
  - HeapSizeAfter_Mean
  - HeapSizeBeforeMB_Mean
  - PauseDurationMSec_95PWhereIsGen0
  - PauseDurationMSec_95PWhereIsGen1
  - PauseDurationMSec_95PWhereIsBackground
  - PauseDurationMSec_95PWhereIsBlockingGen2
  - CountIsBlockingGen2
  - HeapCount
  - TotalNumberGCs
  - TotalAllocatedMB
  - Speed
  - PauseDurationMSec_MeanWhereIsEphemeral
  - PauseDurationMSec_MeanWhereIsBackground
  - PauseDurationMSec_MeanWhereIsBlockingGen2
  - PauseDurationSeconds_SumWhereIsGen1
  - PauseDurationSeconds_Sum
  - CountIsGen1
  - ExecutionTimeMSec
  - Count
  - PctTimePausedInGC
  - FirstToLastGCSeconds
  - HeapSizeAfter_Mean
  - HeapSizeBeforeMB_Mean
  - PauseDurationMSec_95PWhereIsGen0
  - PauseDurationMSec_95PWhereIsGen1
  - PauseDurationMSec_95PWhereIsBackground
  - PauseDurationMSec_95PWhereIsBlockingGen2
  - CountIsBlockingGen2
  - HeapCount
  - TotalNumberGCs
  - TotalAllocatedMB
  - Speed
  - PauseDurationMSec_MeanWhereIsEphemeral
  - PauseDurationSeconds_SumWhereIsGen1
  - PauseDurationSeconds_Sum
  - CountIsGen1
  - ExecutionTimeMSec
  formats:
  - markdown
  - json
name: HighMemory_NormalServer 
trace_configurations:
  type: gc

I'm running GC.Infrastructure from the build directory:

D:\ac\20230927\performance\artifacts\bin\GC.Infrastructure\Release\net7.0>.\GC.Infrastructure.exe gcperfsim --configuration D:\ac\performance\src\benchmarks\gc\GC.Infrastructure\Configurations\GCPerfSim\HighMemory.yaml

This does not happen every time, but the majority of the time I run that command, I get the OoM error:

 (10/4/2023 2:20:09 PM) Running HighMemory_NormalServer: baseline for server - Iteration: 0
 HighMemory_NormalServer: baseline for server failed with:
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.
Out of memory.

I attempted to troubleshoot/debug this, but I'm unable to reproduce the OoM if I run GCPerfSim directly:

PS D:\ac\20230927\runtime-6.0\artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root> .\corerun.exe D:\ac\20230927\performance\artifacts\bin\GCPerfSim\Release\net6.0\GCPerfSim.dll -tc 36 -tagb 540 -tlgb 2 -lohar 0 -pohar 0 -sohsr 100-4000 -lohsr 102400-204800 -pohsr 100-204800 -sohsi 50 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time
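For reference, the arguments in the command above correspond to the `gcperfsim_configurations` parameters from HighMemory.yaml with the `server` run's `override_parameters` applied. The sketch below is only an illustration of that mapping: `build_args` is a hypothetical helper, and it assumes that overrides simply replace the base values and that each parameter becomes a `-name value` pair. It is not taken from the GC.Infrastructure source.

```python
# Base values copied from gcperfsim_configurations.parameters in HighMemory.yaml.
base_parameters = {
    "tc": 36, "tagb": 540, "tlgb": 2,
    "lohar": 0, "pohar": 0,
    "sohsr": "100-4000", "lohsr": "102400-204800", "pohsr": "100-204800",
    "sohsi": 0, "lohsi": 0, "pohsi": 0,
    "sohpi": 0, "lohpi": 0,
    "sohfi": 0, "lohfi": 0, "pohfi": 0,
    "allocType": "reference", "testKind": "time",
}

# Values copied from runs.server.override_parameters.
server_overrides = {"tlgb": 2, "sohsi": 50}


def build_args(base, overrides):
    """Hypothetical helper: apply overrides, then emit '-name value' pairs."""
    merged = {**base, **overrides}  # assumption: an override replaces the base value
    args = []
    for name, value in merged.items():
        args += [f"-{name}", str(value)]
    return args


command_line = " ".join(build_args(base_parameters, server_overrides))
print(command_line)
```

Under those assumptions the printed string matches the GCPerfSim arguments used above, including `-sohsi 50` from the override. Note that this covers only the benchmark arguments; the `environment_variables` from the YAML are a separate input.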

Since I cannot reproduce the OoM with a standalone .NET 6.0 corerun.exe, it seems like this could be an issue with GC.Infrastructure.exe or PerfView, but that's just a guess.

Please let me know if I can clarify anything above. Thanks!

@alexcovington
Author

I was able to reproduce the issue outside of GC.Infrastructure.exe. I did not have my environment variables set up correctly initially.

If I set up my environment correctly before running with just corerun.exe, I am able to reproduce the OoM error:

set "COMPlus_GCServer=1"
set "COMPlus_GCHeapCount=18"
set "COMPlus_GCName=clrgc.dll"
set "COMPlus_GCHeapHardLimit=0x100000000" 
set "COMPlus_GCTotalPhysicalMemory=0x100000000"
corerun.exe D:\ac\20230929\performance\artifacts\bin\GCPerfSim\Debug\net6.0\GCPerfSim.dll -tc 36 -tagb 540 -tlgb 2 -lohar 0 -pohar 0 -sohsr 100-4000 -lohsr 102400-204800 -pohsr 100-204800 -sohsi 50 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time

I get this output:

corerun D:\ac\20230929\performance\artifacts\bin\GCPerfSim\Release\net6.0\GCPerfSim.dll -tc 36 -tagb 540 -tlgb 2 -lohar 0 -pohar 0 -sohsr 100-4000 -lohsr 102400-204800 -pohsr 100-204800 -sohsi 50 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time
allocating 16,106,127,360 per thread
Running 64-bit? True
PID: 15992
Running 36 threads.
time, ReferenceItem, tlgb 0.055555555038154125, tagb 15, totalMins 0, buckets:
    100-4000; surv every 50; pin every 0; weight 1000; isPoh False
Thread 24 stopping phase after 15360MB
Thread 11 stopping phase after 15360MB
Thread 30 stopping phase after 15360MB
Thread 9 stopping phase after 15360MB
Thread 12 stopping phase after 15360MB
Thread 31 stopping phase after 15360MB
Thread 10 stopping phase after 15360MB
Thread 8 stopping phase after 15360MB
Thread 22 stopping phase after 15360MB
Thread 7 stopping phase after 15360MB
Thread 34 stopping phase after 15360MB
Thread 0 stopping phase after 15360MB
Thread 26 stopping phase after 15360MB
Thread 13 stopping phase after 15360MB
Thread 1 stopping phase after 15360MB
Thread 4 stopping phase after 15360MB
Thread 21 stopping phase after 15360MB
Thread 33 stopping phase after 15360MB
Thread 25 stopping phase after 15360MB
Thread 15 stopping phase after 15360MB
Thread 29 stopping phase after 15360MB
Thread 17 stopping phase after 15360MB
Thread 19 stopping phase after 15360MB
Thread 6 stopping phase after 15360MB
Thread 23 stopping phase after 15360MB
Thread 3 stopping phase after 15360MB
Thread 2 stopping phase after 15360MB
Thread 28 stopping phase after 15360MB
Thread 32 stopping phase after 15360MB
Thread 18 stopping phase after 15360MB
Thread 16 stopping phase after 15360MB
Thread 5 stopping phase after 15360MB
Thread 14 stopping phase after 15360MB
Thread 20 stopping phase after 15360MB
Thread 35 stopping phase after 15360MB
Thread 27 stopping phase after 15360MB
Out of memory.
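The per-thread figures in this output are consistent with the totals on the command line being split evenly across the 36 threads. The sketch below redoes that arithmetic. It assumes an even split and that "GB" here means 2^30 bytes; both are inferred from the log lines above, not from the GCPerfSim source.

```python
GIB = 1024 ** 3  # assumption: GCPerfSim's "GB" is 2^30 bytes

thread_count = 36         # -tc
total_allocated_gb = 540  # -tagb
total_live_gb = 2         # -tlgb

# Assumption: totals are divided evenly between threads.
per_thread_alloc_gb = total_allocated_gb / thread_count
per_thread_alloc_bytes = int(per_thread_alloc_gb * GIB)
per_thread_alloc_mb = int(per_thread_alloc_gb * 1024)
per_thread_live_gb = total_live_gb / thread_count

print(f"allocated per thread: {per_thread_alloc_bytes:,} bytes")  # 16,106,127,360
print(f"allocated per thread: {per_thread_alloc_mb} MB")          # 15360
print(f"live data per thread: {per_thread_live_gb:.4f} GB")       # 0.0556
```

These match "allocating 16,106,127,360 per thread", "stopping phase after 15360MB", and (up to floating-point rounding) the "tlgb 0.0555..." and "tagb 15" values in the log.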

Interestingly, if I set COMPlus_GCHeapHardLimit=0x200000000 and COMPlus_GCTotalPhysicalMemory=0x200000000 and re-run, I do not get the OoM error. However, I don't know whether these settings still properly represent the "high memory" scenario that the benchmark is meant to measure.
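To make the two limits easier to compare, here is a small sketch that converts the hex values to GiB and sets them against the 2 GB live-data target (`tlgb`). The helper name `to_gib` is just for illustration, and the comparison treats the `tlgb` target as 2 GiB, which is an assumption. It is plain arithmetic and does not by itself explain the OoM, since the actual heap also holds whatever else the process and the GC need beyond the live-data target.

```python
GIB = 1024 ** 3


def to_gib(hex_text):
    """Illustrative helper: convert a hex byte count such as '0x100000000' to GiB."""
    return int(hex_text, 16) / GIB


original_limit_gib = to_gib("0x100000000")  # value that produces the OoM
raised_limit_gib = to_gib("0x200000000")    # value that avoids the OoM
live_target_gib = 2                         # -tlgb, assumed to mean 2 GiB

print(f"original limit: {original_limit_gib} GiB, "
      f"headroom over tlgb: {original_limit_gib - live_target_gib} GiB")
print(f"raised limit:   {raised_limit_gib} GiB, "
      f"headroom over tlgb: {raised_limit_gib - live_target_gib} GiB")
```

So the original configuration caps the heap at 4 GiB, and the change that avoids the OoM doubles that cap to 8 GiB.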

Please let me know if there is a solution/workaround or if I've made a mistake in my configuration. Thanks!
