GC infra improvements (#982)
* Remove the name `dotnet-gc-infra` from the README, as this is not its name any more.
* Remove `FirstToLastEventSeconds` metric, prefer `TotalSecondsTaken` instead.
* host_info.yaml now describes the processor ranges contained by each NUMA node.
* Support finalizable objects in GCPerfSim, similar to pinned objects
* Add `--show-reasons` flag to `analyze-single`
	- print a summary of reason counts over all GCs
	- print reasons for each individual GC in the single-gcs section
* Enhance `--print-processes` with support for arbitrary run metrics
	* Will show nothing if not a CLR process
	* To do this, must support getting ProcessedTrace objects for each individual process from a trace,
	  resulting in some refactoring in process_trace.py
* Added `--no-readjust` parameter to make_memory_load.c.
  With this flag there is only an initial memory load; the process never frees or allocates afterwards.
  This may be a fairer test, as a GC that consumes less memory shouldn't be punished by another process taking up more memory.
* Support `coreclr_specific` in a config. This allows configs to be different for different coreclrs.
  This was useful as older coreclrs parsed `complus_gcheapaffinitizeranges` differently.
* Convert test_status metrics to direct properties of ProcessedTrace so they're easier to use from jupyter notebook
* Add new metrics AllocedMBAccumulated and MemoryPressure
* Add new configurations, and `bgc_stepping`, `bgc_tuning_loh`, `bgc_tuning_soh` reasons.
	(For testing dotnet/coreclr#26695)
* Support all document output args (e.g. `--output-width`) on all document-creating commands
* Add `NO_DEFAULT` as an alternative to `MISSING`; allows a dataclass to inherit from another which has fields without defaults. (See the sketch after this list.)
* pylint fixed pylint-dev/pylint#3175, so it now flags code it previously missed; added more disables where this was violated.
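
A minimal sketch of the `NO_DEFAULT` idea above, assuming a sentinel default plus a `__post_init__` check so required fields can mix with defaulted ones across a dataclass hierarchy; the names here are illustrative, not the infra's actual implementation:

```python
from dataclasses import dataclass, fields
from typing import Any, Optional

NO_DEFAULT: Any = object()  # sentinel standing in for "no default given"

@dataclass(frozen=True)
class BaseArgs:
    txt: Optional[str] = None  # a base class whose fields have defaults

@dataclass(frozen=True)
class DerivedArgs(BaseArgs):
    # A plain required field here would be rejected ("non-default argument
    # follows default argument"); the sentinel keeps it syntactically defaulted.
    trace_path: str = NO_DEFAULT

    def __post_init__(self) -> None:
        for f in fields(self):
            if getattr(self, f.name) is NO_DEFAULT:
                raise TypeError(f"missing required field {f.name!r}")

DerivedArgs(trace_path="trace.etl")  # ok; DerivedArgs() raises TypeError
```
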
Andy Hanson authored and billwert committed Oct 30, 2019
1 parent 3403401 commit 425bb02
Showing 40 changed files with 1,365 additions and 679 deletions.
14 changes: 8 additions & 6 deletions src/benchmarks/gc/README.md
@@ -1,12 +1,14 @@
# About

`dotnet-gc-infra` lets you run GC performance tests and analyze and chart statistics.
This program lets you run GC performance tests and analyze and chart statistics.

Command examples in this document use Bash/PowerShell syntax. If using Windows' CMD, replace `/` with `\`.

The general workflow when using `dotnet-gc-infra` is:
The general workflow when using the GC infra is:

* For testing your changes to coreclr, get a master branch build of coreclr, and also your own build. (It can also be used to compare different configurations on just the master branch.)
* For testing your changes to coreclr, get a master branch build of coreclr, and also your own build.
(You can of course use any version of coreclr, not just master.
You can also only test with a single coreclr.)
* Write a benchfile. (Or generate default ones with `suite-create` as in the tutorial.) This will reference the coreclrs and list the tests to be run.
* Run the benchfile and collect traces.
* Run analysis on the output.
@@ -102,7 +104,7 @@ On non-Windows systems, you'll need [`dotnet-trace`](https://github.com/dotnet/d
On non-Windows systems, to run container tests, you'll need `cgroup-tools` installed.
You should have builds of coreclr available for use in the next step.

Finally, run `py . setup` from the root of dotnet-gc-infra.
Finally, run `py . setup` from the same directory as this README.
This will read information about your system that's relevant to performance analysis (such as cache sizes) and save to `bench/host_info.yaml`.
It will also install some necessary dependencies on Windows.
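
As a quick sanity check, here is a hedged sketch of reading the generated file from Python; the key names below are assumptions for illustration (this commit notes that host_info.yaml now describes each NUMA node's processor ranges), not a documented schema:

```python
# Sketch: inspect bench/host_info.yaml after running `py . setup`.
# Assumes PyYAML is installed; `numa_nodes` is an assumed key name.
from pathlib import Path
import yaml

host_info = yaml.safe_load(Path("bench/host_info.yaml").read_text())
for node in host_info.get("numa_nodes", []):  # assumed key
    print(node)  # expected to include that node's processor ranges
```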

@@ -123,13 +125,13 @@ The benchfiles can exist anywhere. This example will use the local directory `be
To avoid writing benchfiles yourself, `suite-create` can generate a few:

```sh
cd path/to/dotnet-gc-infra
py . suite-create bench/suite --coreclrs path_to_coreclr0 path_to_coreclr1
```

`path_to_coreclr0` is the path to a [Core_Root](#Core_Root).

`path_to_coreclr1` should be a different Core_Root. (it can be the same, but the point is to compare performance of two different builds.)
`path_to_coreclr1` should be a different Core_Root. (It can be the same, but the point is to compare performance of two different builds.)
You can omit this if you just intend to test a single coreclr.

If you made a mistake, you can run `suite-create` again and pass `--overwrite`, which clears the output directory (`bench/suite` in this example) first.

101 changes: 100 additions & 1 deletion src/benchmarks/gc/docs/bench_file.md
@@ -211,6 +211,94 @@ complus_threadpool_forcemaxworkerthreads: `int | None`
complus_tieredcompilation: `bool | None`
Set to true to enable tiered compilation

complus_bgcfltuningenabled: `bool | None`
Set to true to enable https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoal: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoalslack: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_gcconcurrentfinalization: `bool | None`
Enable concurrent finalization (not available in normal coreclr builds)

container: `[TestConfigContainer](#TestConfigContainer) | None`
Set to run the test in a container.
A container is a job object on Windows, or cgroups / docker container on non-Windows.

affinitize: `bool | None`
If true, this will be run in a job object affinitized to a single core.
Only works on Windows.
See `run_in_job.c`'s `--affinitize` option.

memory_load: `[MemoryLoadOptions](#MemoryLoadOptions) | None`
If set, the test runner will launch a second process that ensures this percentage of the system's memory is consumed.

coreclr_specific: `Mapping[str, [ConfigOptions](#ConfigOptions)] | None`
Maps coreclr name to config options for only that coreclr.
If present, should have an entry for every coreclr.
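
A hedged sketch of how per-coreclr options could be resolved, assuming a base config dict plus the `coreclr_specific` mapping described above; this is illustrative, not the infra's actual resolution code:

```python
from typing import Mapping, Optional

def resolve_options(
    base: Mapping[str, object],
    coreclr_specific: Optional[Mapping[str, Mapping[str, object]]],
    coreclr_name: str,
) -> Mapping[str, object]:
    # Per-coreclr entries override the shared settings for that coreclr only.
    specific = (coreclr_specific or {}).get(coreclr_name, {})
    return {**base, **specific}

# E.g. an older coreclr that parses complus_gcheapaffinitizeranges differently
# can get its own value while everything else stays shared:
opts = resolve_options(
    {"complus_gcserver": True},
    {"old_clr": {"complus_gcheapaffinitizeranges": "0:1,0:3"}},
    "old_clr",
)
assert opts == {"complus_gcserver": True, "complus_gcheapaffinitizeranges": "0:1,0:3"}
```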



## ConfigOptions

complus_gcserver: `bool | None`
Set to true to use server GC.

complus_gcconcurrent: `bool | None`
Set to true to allow background GCs.

complus_gcgen0size: `int | None`
gen0size in bytes. (decimal)

complus_gcgen0maxbudget: `int | None`
Max gen0 budget in bytes. (decimal)

complus_gcheapaffinitizeranges: `str | None`
On non-Windows, this should look like: 1,3,5,7-9,12
On Windows, this should include group numbers, like: 0:1,0:3,0:5,1:7-9,1:12 (a parsing sketch for the non-Windows form follows at the end of this section)

complus_gcheapcount: `int | None`
Number of heaps. (decimal)
Only has effect when complus_gcserver is set.

complus_gcheaphardlimit: `int | None`
Hard limit on heap size, in bytes. (decimal)

complus_gclargepages: `bool | None`
Set to true to enable large pages.

complus_gcnoaffinitize: `bool | None`
Set to true to prevent affinitizing GC threads to cpu cores.

complus_gccpugroup: `bool | None`
Set to true to enable CPU groups.

complus_gcnumaaware: `bool | None`
Set to false to disable NUMA-awareness in GC

complus_thread_useallcpugroups: `bool | None`
Set to true to automatically distribute threads across CPU Groups

complus_threadpool_forcemaxworkerthreads: `int | None`
Overrides the MaxThreads setting for the ThreadPool worker pool

complus_tieredcompilation: `bool | None`
Set to true to enable tiered compilation

complus_bgcfltuningenabled: `bool | None`
Set to true to enable https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoal: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoalslack: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_gcconcurrentfinalization: `bool | None`
Enable concurrent finalization (not available in normal coreclr builds)

container: `[TestConfigContainer](#TestConfigContainer) | None`
Set to run the test in a container.
A container is a job object on Windows, or cgroups / docker container on non-Windows.
@@ -220,7 +308,7 @@ affinitize: `bool | None`
Only works on Windows.
See `run_in_job.c`'s `--affinitize` option.

memory_load_percent: `float | None`
memory_load: `[MemoryLoadOptions](#MemoryLoadOptions) | None`
If set, the test runner will launch a second process that ensures this percentage of the system's memory is consumed.
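
A hedged sketch of parsing the non-Windows `complus_gcheapaffinitizeranges` form shown earlier in this section (e.g. `1,3,5,7-9,12`); this is for illustration only and is not coreclr's parser:

```python
from typing import FrozenSet

def parse_cpu_ranges(spec: str) -> FrozenSet[int]:
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return frozenset(cpus)

assert parse_cpu_ranges("1,3,5,7-9,12") == frozenset({1, 3, 5, 7, 8, 9, 12})
```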


@@ -290,6 +378,17 @@ allocType: `"simple" | "reference"`
testKind: `"time" | "highSurvival"`


## MemoryLoadOptions

percent: `float`
The memory load process will allocate memory until the system's memory load is this high.

no_readjust: `bool | None`
If true, the memory load process will never allocate or free any more memory after it's started.
If false, it will allocate or free in order to keep the system's memory at `percent`.
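
A minimal sketch of the behavior these two options describe, assuming a `get_load_percent` helper that reports system-wide memory load; the real implementation is `make_memory_load.c`, so this only illustrates readjust vs. `no_readjust`:

```python
import time
from typing import Callable, List

def run_memory_load(
    percent: float, no_readjust: bool, get_load_percent: Callable[[], float]
) -> None:
    chunks: List[bytearray] = []
    # Initial fill: allocate until system-wide memory load reaches `percent`.
    while get_load_percent() < percent:
        chunks.append(bytearray(64 * 1024 * 1024))  # 64 MB at a time
    while True:
        if not no_readjust:
            # Readjust: allocate or free to keep the load near `percent`.
            load = get_load_percent()
            if load < percent:
                chunks.append(bytearray(64 * 1024 * 1024))
            elif load > percent and chunks:
                chunks.pop()
        time.sleep(1.0)  # with no_readjust, just hold the initial allocation
```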



## ScoreElement

weight: `float`
10 changes: 8 additions & 2 deletions src/benchmarks/gc/docs/metrics.md
@@ -224,6 +224,9 @@ IsNonBackground
IsNonConcurrent
ReasonIs_alloc_loh
ReasonIs_alloc_soh
ReasonIs_bgc_stepping
ReasonIs_bgc_tuning_loh
ReasonIs_bgc_tuning_soh
ReasonIs_empty
ReasonIs_gcstress
ReasonIs_induced
@@ -246,6 +249,7 @@ UsesPromotion
## float metrics

AllocRateMBSec
AllocedMBAccumulated
AllocedSinceLastGCMB
BGCFinalPauseMSec
BGCLohConcurrentRevisitedPages
@@ -301,6 +305,7 @@ LastPerHeapHistToEndMSec
MaxBGCWaitMSec
MbAllocatedOnLOHSinceLastGen2Gc
MbAllocatedOnSOHSinceLastSameGenGc
MemoryPressure
Number
PauseDurationMSec
PauseDurationSeconds
@@ -342,16 +347,17 @@ Gen0Size
Gen1CollectionCount
Gen2CollectionCount
InternalSecondsTaken
NumCreatedWithFinalizers
NumFinalized
ThreadCount
TotalSecondsTaken

### float metrics that require a trace file

FinalYoungestDesiredMB
FirstEventToFirstGCSeconds
FirstToLastEventSeconds
FirstToLastGCSeconds
NumHeaps
HeapCount
PctTimePausedInGC
TotalAllocatedMB
TotalLOHAllocatedMB
4 changes: 4 additions & 0 deletions src/benchmarks/gc/jupyter_notebook.py
@@ -165,6 +165,7 @@ def show_summary(trace: ProcessedTrace) -> None:
            single_heap_metrics=parse_single_heap_metrics_arg(("InMB", "OutMB")),
            show_first_n_gcs=5,
            show_last_n_gcs=None,
            show_reasons=False,
        )
    )

@@ -636,3 +637,6 @@ def _more_custom(trace: ProcessedTrace) -> None:


_more_custom(_TRACE)


# %%
24 changes: 2 additions & 22 deletions src/benchmarks/gc/src/analysis/aggregate_stats.py
@@ -2,7 +2,6 @@
# The .NET Foundation licenses this file to you under the MIT license.
# See the LICENSE file in the project root for more information.

from math import ceil, floor
from statistics import mean, stdev
from typing import Callable, Iterable, List, Mapping, Sequence, Tuple, Type, TypeVar

@@ -17,7 +16,7 @@
)
from ..commonlib.result_utils import all_non_err, as_err, fn_to_ok, flat_map_ok, map_ok
from ..commonlib.type_utils import check_cast, T
from ..commonlib.util import get_percent
from ..commonlib.util import get_95th_percentile, get_percent

from .types import (
    Failable,
@@ -212,25 +211,6 @@ def _fail_if_empty(
    return lambda xs: Err(f"<no values>") if is_empty(xs) else Ok(cb(xs))


# numpy has problems on ARM, so using this instead.
def get_percentile(values: Sequence[float], percent: float) -> float:
    assert not is_empty(values)
    assert 0.0 <= percent <= 100.0
    sorted_values = sorted(values)
    fraction = percent / 100.0
    index_and_fraction = (len(values) - 1) * fraction
    prev_index = floor(index_and_fraction)
    next_index = ceil(index_and_fraction)
    # The closer we are to 'next_index', the more 'next' should matter
    next_factor = index_and_fraction - prev_index
    prev_factor = 1.0 - next_factor
    return sorted_values[prev_index] * prev_factor + sorted_values[next_index] * next_factor
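    # Worked example (comment added for illustration, not in the original
    # source): get_percentile([10, 20, 30, 40], 95) gives
    # index_and_fraction = 2.85, so 0.15 * 30 + 0.85 * 40 = 38.5.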


def _get_95th_percentile(values: Sequence[float]) -> FailableFloat:
    return Err("<no values>") if is_empty(values) else Ok(get_percentile(values, 95))


def _stdev(values: Sequence[float]) -> FailableFloat:
    if len(values) <= 1:
        return Err("Not enough values for stdev")
@@ -243,6 +223,6 @@ def _stdev(values: Sequence[float]) -> FailableFloat:
"Max": _fail_if_empty(max),
"Min": _fail_if_empty(min),
"Sum": fn_to_ok(sum),
"95P": _get_95th_percentile,
"95P": get_95th_percentile,
"Stdev": _stdev,
}
22 changes: 6 additions & 16 deletions src/benchmarks/gc/src/analysis/analyze_joins.py
@@ -13,16 +13,13 @@
from ..commonlib.command import Command, CommandKind, CommandsMapping
from ..commonlib.document import (
    Cell,
    DocOutputArgs,
    Document,
    handle_doc,
    OutputOptions,
    OutputWidth,
    OUTPUT_WIDTH_DOC,
    output_options_from_args,
    Row,
    Section,
    Table,
    TABLE_INDENT_DOC,
    TXT_DOC,
)
from ..commonlib.option import map_option, non_null, optional_to_iter
from ..commonlib.result_utils import unwrap
@@ -64,14 +61,13 @@ class StagesOrPhases(Enum):

@with_slots
@dataclass(frozen=True)
class AnalyzeJoinsAllGcsArgs:
class AnalyzeJoinsAllGcsArgs(DocOutputArgs):
    trace_path: Path = argument(name_optional=True, doc=TRACE_PATH_DOC)
    process: ProcessQuery = argument(default=None, doc=PROCESS_DOC)
    show_n_worst_stolen_time_instances: int = argument(
        default=10, doc=_DOC_N_WORST_STOLEN_TIME_INSTANCES
    )
    show_n_worst_joins: int = argument(default=10, doc=_DOC_N_WORST_JOINS)
    txt: Optional[Path] = argument(default=None, doc=TXT_DOC)


def analyze_joins_all_gcs(args: AnalyzeJoinsAllGcsArgs) -> None:
@@ -86,7 +82,7 @@ def analyze_joins_all_gcs(args: AnalyzeJoinsAllGcsArgs) -> None:
            show_n_worst_stolen_time_instances=args.show_n_worst_stolen_time_instances,
            show_n_worst_joins=args.show_n_worst_joins,
        ),
        OutputOptions(txt=args.txt),
        output_options_from_args(args),
    )


@@ -113,7 +109,7 @@ def analyze_joins_all_gcs_for_jupyter(

@with_slots
@dataclass(frozen=True)
class _AnalyzeJoinsSingleGcArgs:
class _AnalyzeJoinsSingleGcArgs(DocOutputArgs):
    trace_path: Path = argument(name_optional=True, doc=TRACE_PATH_DOC)
    gc_number: int = argument(doc=GC_NUMBER_DOC)
    process: ProcessQuery = argument(default=None, doc=PROCESS_DOC)
@@ -133,10 +129,6 @@ class _AnalyzeJoinsSingleGcArgs:
    )
    max_heaps: Optional[int] = argument(default=None, doc="Only show this many heaps")

    txt: Optional[Path] = argument(default=None, doc=TXT_DOC)
    output_width: Optional[OutputWidth] = argument(default=None, doc=OUTPUT_WIDTH_DOC)
    table_indent: Optional[int] = argument(default=None, doc=TABLE_INDENT_DOC)


def _analyze_joins_single_gc(args: _AnalyzeJoinsSingleGcArgs) -> None:
    _check_join_analysis_ready()
@@ -151,9 +143,7 @@ def _analyze_joins_single_gc(args: _AnalyzeJoinsSingleGcArgs) -> None:
        show_n_worst_stolen_time_instances=args.show_n_worst_stolen_time_instances,
        max_heaps=args.max_heaps,
    )
    handle_doc(
        doc, OutputOptions(width=args.output_width, table_indent=args.table_indent, txt=args.txt)
    )
    handle_doc(doc, output_options_from_args(args))


def _get_processed_trace_with_just_join_info(
