GC infra improvements (#982)
* Remove the name `dotnet-gc-infra` from the README, as this is not its name any more.
* Remove `FirstToLastEventSeconds` metric, prefer `TotalSecondsTaken` instead.
* host_info.yaml now describes the processor ranges contained by each NUMA node.
* Support finalizable objects in GCPerfSim, similar to pinned objects
* Add `--show-reasons` flag to `analyze-single`
	- print a summary of reason counts over all GCs
	- print reasons for each individual GC in the single-gcs section
* Enhance `--print-processes` with support for arbitrary run metrics
	* Will show nothing if not a CLR process
	* To do this, must support getting ProcessedTrace objects for each individual process from a trace,
	  resulting in some refactoring in process_trace.py
* Added `--no-readjust` parameter to make_memory_load.c.
  With this flag there is only an initial memory load; the process never frees or allocates afterwards.
  This may be a fairer test, as a GC that consumes less memory shouldn't be punished by another process taking up more memory.
* Support `coreclr_specific` in a config. This allows configs to be different for different coreclrs.
  This was useful as older coreclrs parsed `complus_gcheapaffinitizeranges` differently.
* Convert test_status metrics to direct properties of ProcessedTrace so they're easier to use from jupyter notebook
* Add new metrics AllocedMBAccumulated and MemoryPressure
* Add new configurations, and `bgc_stepping`, `bgc_tuning_loh`, `bgc_tuning_soh` reasons.
	(For testing dotnet/coreclr#26695)
* Support all document output args (e.g. `--output-width`) on all document-creating commands
* Add `NO_DEFAULT` as an alternative to `MISSING`; allows a dataclass to inherit from another which has fields without defaults. (See the sketch after this list.)
* pylint fixed pylint-dev/pylint#3175, so it now flags code it previously missed; added more disables where this was violated.
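
A minimal sketch of the `NO_DEFAULT` idea above, assuming a sentinel default plus a `__post_init__` check so required fields can mix with defaulted ones across a dataclass hierarchy; the names here are illustrative, not the infra's actual implementation:

```python
from dataclasses import dataclass, fields
from typing import Any, Optional

NO_DEFAULT: Any = object()  # sentinel standing in for "no default given"

@dataclass(frozen=True)
class BaseArgs:
    txt: Optional[str] = None  # a base class whose fields have defaults

@dataclass(frozen=True)
class DerivedArgs(BaseArgs):
    # A plain required field here would be rejected ("non-default argument
    # follows default argument"); the sentinel keeps it syntactically defaulted.
    trace_path: str = NO_DEFAULT

    def __post_init__(self) -> None:
        for f in fields(self):
            if getattr(self, f.name) is NO_DEFAULT:
                raise TypeError(f"missing required field {f.name!r}")

DerivedArgs(trace_path="trace.etl")  # ok; DerivedArgs() raises TypeError
```
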
Andy Hanson authored and billwert committed Oct 30, 2019
1 parent 3403401 commit 425bb02
Showing 40 changed files with 1,365 additions and 679 deletions.
14 changes: 8 additions & 6 deletions src/benchmarks/gc/README.md
@@ -1,12 +1,14 @@
# About

`dotnet-gc-infra` lets you run GC performance tests and analyze and chart statistics.
This program lets you run GC performance tests and analyze and chart statistics.

Command examples in this document use Bash/PowerShell syntax. If using Windows' CMD, replace `/` with `\`.

The general workflow when using `dotnet-gc-infra` is:
The general workflow when using the GC infra is:

* For testing your changes to coreclr, get a master branch build of coreclr, and also your own build. (It can also be used to compare different configurations on just the master branch.)
* For testing your changes to coreclr, get a master branch build of coreclr, and also your own build.
(You can of course use any version of coreclr, not just master.
You can also only test with a single coreclr.)
* Write a benchfile. (Or generate default ones with `suite-create` as in the tutorial.) This will reference the coreclrs and list the tests to be run.
* Run the benchfile and collect traces.
* Run analysis on the output.
@@ -102,7 +104,7 @@ On non-Windows systems, you'll need [`dotnet-trace`](https://github.com/dotnet/d
On non-Windows systems, to run container tests, you'll need `cgroup-tools` installed.
You should have builds of coreclr available for use in the next step.

Finally, run `py . setup` from the root of dotnet-gc-infra.
Finally, run `py . setup` from the same directory as this README.
This will read information about your system that's relevant to performance analysis (such as cache sizes) and save to `bench/host_info.yaml`.
It will also install some necessary dependencies on Windows.
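
As a quick sanity check, here is a hedged sketch of reading the generated file from Python; the key names below are assumptions for illustration (this commit notes that host_info.yaml now describes each NUMA node's processor ranges), not a documented schema:

```python
# Sketch: inspect bench/host_info.yaml after running `py . setup`.
# Assumes PyYAML is installed; `numa_nodes` is an assumed key name.
from pathlib import Path
import yaml

host_info = yaml.safe_load(Path("bench/host_info.yaml").read_text())
for node in host_info.get("numa_nodes", []):  # assumed key
    print(node)  # expected to include that node's processor ranges
```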

@@ -123,13 +125,13 @@ The benchfiles can exist anywhere. This example will use the local directory `be
To avoid writing benchfiles yourself, `suite-create` can generate a few:

```sh
cd path/to/dotnet-gc-infra
py . suite-create bench/suite --coreclrs path_to_coreclr0 path_to_coreclr1
```

`path_to_coreclr0` is the path to a [Core_Root](#Core_Root).

`path_to_coreclr1` should be a different Core_Root. (it can be the same, but the point is to compare performance of two different builds.)
`path_to_coreclr1` should be a different Core_Root. (It can be the same, but the point is to compare performance of two different builds.)
You can omit this if you just intend to test a single coreclr.

If you made a mistake, you can run `suite-create` again and pass `--overwrite`, which clears the output directory (`bench/suite` in this example) first.

101 changes: 100 additions & 1 deletion src/benchmarks/gc/docs/bench_file.md
@@ -211,6 +211,94 @@ complus_threadpool_forcemaxworkerthreads: `int | None`
complus_tieredcompilation: `bool | None`
Set to true to enable tiered compilation

complus_bgcfltuningenabled: `bool | None`
Set to true to enable https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoal: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoalslack: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_gcconcurrentfinalization: `bool | None`
Enable concurrent finalization (not available in normal coreclr builds)

container: `[TestConfigContainer](#TestConfigContainer) | None`
Set to run the test in a container.
A container is a job object on Windows, or cgroups / docker container on non-Windows.

affinitize: `bool | None`
If true, this will be run in a job object affinitized to a single core.
Only works on Windows.
See `run_in_job.c`'s `--affinitize` option.

memory_load: `[MemoryLoadOptions](#MemoryLoadOptions) | None`
If set, the test runner will launch a second process that ensures this percentage of the system's memory is consumed.

coreclr_specific: `Mapping[str, [ConfigOptions](#ConfigOptions)] | None`
Maps coreclr name to config options for only that coreclr.
If present, should have an entry for every coreclr.
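
A hedged sketch of how per-coreclr options could be resolved, assuming a base config dict plus the `coreclr_specific` mapping described above; this is illustrative, not the infra's actual resolution code:

```python
from typing import Mapping, Optional

def resolve_options(
    base: Mapping[str, object],
    coreclr_specific: Optional[Mapping[str, Mapping[str, object]]],
    coreclr_name: str,
) -> Mapping[str, object]:
    # Per-coreclr entries override the shared settings for that coreclr only.
    specific = (coreclr_specific or {}).get(coreclr_name, {})
    return {**base, **specific}

# E.g. an older coreclr that parses complus_gcheapaffinitizeranges differently
# can get its own value while everything else stays shared:
opts = resolve_options(
    {"complus_gcserver": True},
    {"old_clr": {"complus_gcheapaffinitizeranges": "0:1,0:3"}},
    "old_clr",
)
assert opts == {"complus_gcserver": True, "complus_gcheapaffinitizeranges": "0:1,0:3"}
```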



## ConfigOptions

complus_gcserver: `bool | None`
Set to true to use server GC.

complus_gcconcurrent: `bool | None`
Set to true to allow background GCs.

complus_gcgen0size: `int | None`
gen0size in bytes. (decimal)

complus_gcgen0maxbudget: `int | None`
Max gen0 budget in bytes. (decimal)

complus_gcheapaffinitizeranges: `str | None`
On non-Windows, this should look like: 1,3,5,7-9,12
On Windows, this should include group numbers, like: 0:1,0:3,0:5,1:7-9,1:12 (a parsing sketch for the non-Windows form follows at the end of this section)

complus_gcheapcount: `int | None`
Number of heaps. (decimal)
Only has effect when complus_gcserver is set.

complus_gcheaphardlimit: `int | None`
Hard limit on heap size, in bytes. (decimal)

complus_gclargepages: `bool | None`
Set to true to enable large pages.

complus_gcnoaffinitize: `bool | None`
Set to true to prevent affinitizing GC threads to cpu cores.

complus_gccpugroup: `bool | None`
Set to true to enable CPU groups.

complus_gcnumaaware: `bool | None`
Set to false to disable NUMA-awareness in GC

complus_thread_useallcpugroups: `bool | None`
Set to true to automatically distribute threads across CPU Groups

complus_threadpool_forcemaxworkerthreads: `int | None`
Overrides the MaxThreads setting for the ThreadPool worker pool

complus_tieredcompilation: `bool | None`
Set to true to enable tiered compilation

complus_bgcfltuningenabled: `bool | None`
Set to true to enable https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoal: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoalslack: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_gcconcurrentfinalization: `bool | None`
Enable concurrent finalization (not available in normal coreclr builds)

container: `[TestConfigContainer](#TestConfigContainer) | None`
Set to run the test in a container.
A container is a job object on Windows, or cgroups / docker container on non-Windows.
@@ -220,7 +308,7 @@ affinitize: `bool | None`
Only works on Windows.
See `run_in_job.c`'s `--affinitize` option.

memory_load_percent: `float | None`
memory_load: `[MemoryLoadOptions](#MemoryLoadOptions) | None`
If set, the test runner will launch a second process that ensures this percentage of the system's memory is consumed.
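
A hedged sketch of parsing the non-Windows `complus_gcheapaffinitizeranges` form shown earlier in this section (e.g. `1,3,5,7-9,12`); this is for illustration only and is not coreclr's parser:

```python
from typing import FrozenSet

def parse_cpu_ranges(spec: str) -> FrozenSet[int]:
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return frozenset(cpus)

assert parse_cpu_ranges("1,3,5,7-9,12") == frozenset({1, 3, 5, 7, 8, 9, 12})
```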


@@ -290,6 +378,17 @@ allocType: `"simple" | "reference"`
testKind: `"time" | "highSurvival"`


## MemoryLoadOptions

percent: `float`
The memory load process will allocate memory until the system's memory load is this high.

no_readjust: `bool | None`
If true, the memory load process will never allocate or free any more memory after it's started.
If false, it will allocate or free in order to keep the system's memory at `percent`.
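
A minimal sketch of the behavior these two options describe, assuming a `get_load_percent` helper that reports system-wide memory load; the real implementation is `make_memory_load.c`, so this only illustrates readjust vs. `no_readjust`:

```python
import time
from typing import Callable, List

def run_memory_load(
    percent: float, no_readjust: bool, get_load_percent: Callable[[], float]
) -> None:
    chunks: List[bytearray] = []
    # Initial fill: allocate until system-wide memory load reaches `percent`.
    while get_load_percent() < percent:
        chunks.append(bytearray(64 * 1024 * 1024))  # 64 MB at a time
    while True:
        if not no_readjust:
            # Readjust: allocate or free to keep the load near `percent`.
            load = get_load_percent()
            if load < percent:
                chunks.append(bytearray(64 * 1024 * 1024))
            elif load > percent and chunks:
                chunks.pop()
        time.sleep(1.0)  # with no_readjust, just hold the initial allocation
```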



## ScoreElement

weight: `float`
10 changes: 8 additions & 2 deletions src/benchmarks/gc/docs/metrics.md
@@ -224,6 +224,9 @@ IsNonBackground
IsNonConcurrent
ReasonIs_alloc_loh
ReasonIs_alloc_soh
ReasonIs_bgc_stepping
ReasonIs_bgc_tuning_loh
ReasonIs_bgc_tuning_soh
ReasonIs_empty
ReasonIs_gcstress
ReasonIs_induced
@@ -246,6 +249,7 @@ UsesPromotion
## float metrics

AllocRateMBSec
AllocedMBAccumulated
AllocedSinceLastGCMB
BGCFinalPauseMSec
BGCLohConcurrentRevisitedPages
@@ -301,6 +305,7 @@ LastPerHeapHistToEndMSec
MaxBGCWaitMSec
MbAllocatedOnLOHSinceLastGen2Gc
MbAllocatedOnSOHSinceLastSameGenGc
MemoryPressure
Number
PauseDurationMSec
PauseDurationSeconds
@@ -342,16 +347,17 @@ Gen0Size
Gen1CollectionCount
Gen2CollectionCount
InternalSecondsTaken
NumCreatedWithFinalizers
NumFinalized
ThreadCount
TotalSecondsTaken

### float metrics that require a trace file

FinalYoungestDesiredMB
FirstEventToFirstGCSeconds
FirstToLastEventSeconds
FirstToLastGCSeconds
NumHeaps
HeapCount
PctTimePausedInGC
TotalAllocatedMB
TotalLOHAllocatedMB
4 changes: 4 additions & 0 deletions src/benchmarks/gc/jupyter_notebook.py
@@ -165,6 +165,7 @@ def show_summary(trace: ProcessedTrace) -> None:
            single_heap_metrics=parse_single_heap_metrics_arg(("InMB", "OutMB")),
            show_first_n_gcs=5,
            show_last_n_gcs=None,
            show_reasons=False,
        )
    )

@@ -636,3 +637,6 @@ def _more_custom(trace: ProcessedTrace) -> None:


_more_custom(_TRACE)


# %%
24 changes: 2 additions & 22 deletions src/benchmarks/gc/src/analysis/aggregate_stats.py
@@ -2,7 +2,6 @@
# The .NET Foundation licenses this file to you under the MIT license.
# See the LICENSE file in the project root for more information.

from math import ceil, floor
from statistics import mean, stdev
from typing import Callable, Iterable, List, Mapping, Sequence, Tuple, Type, TypeVar

@@ -17,7 +16,7 @@
)
from ..commonlib.result_utils import all_non_err, as_err, fn_to_ok, flat_map_ok, map_ok
from ..commonlib.type_utils import check_cast, T
from ..commonlib.util import get_percent
from ..commonlib.util import get_95th_percentile, get_percent

from .types import (
    Failable,
@@ -212,25 +211,6 @@ def _fail_if_empty(
    return lambda xs: Err(f"<no values>") if is_empty(xs) else Ok(cb(xs))


# numpy has problems on ARM, so using this instead.
def get_percentile(values: Sequence[float], percent: float) -> float:
    assert not is_empty(values)
    assert 0.0 <= percent <= 100.0
    sorted_values = sorted(values)
    fraction = percent / 100.0
    index_and_fraction = (len(values) - 1) * fraction
    prev_index = floor(index_and_fraction)
    next_index = ceil(index_and_fraction)
    # The closer we are to 'next_index', the more 'next' should matter
    next_factor = index_and_fraction - prev_index
    prev_factor = 1.0 - next_factor
    return sorted_values[prev_index] * prev_factor + sorted_values[next_index] * next_factor
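    # Worked example (comment added for illustration, not in the original
    # source): get_percentile([10, 20, 30, 40], 95) gives
    # index_and_fraction = 2.85, so 0.15 * 30 + 0.85 * 40 = 38.5.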


def _get_95th_percentile(values: Sequence[float]) -> FailableFloat:
    return Err("<no values>") if is_empty(values) else Ok(get_percentile(values, 95))


def _stdev(values: Sequence[float]) -> FailableFloat:
    if len(values) <= 1:
        return Err("Not enough values for stdev")
@@ -243,6 +223,6 @@ def _stdev(values: Sequence[float]) -> FailableFloat:
"Max": _fail_if_empty(max),
"Min": _fail_if_empty(min),
"Sum": fn_to_ok(sum),
"95P": _get_95th_percentile,
"95P": get_95th_percentile,
"Stdev": _stdev,
}
22 changes: 6 additions & 16 deletions src/benchmarks/gc/src/analysis/analyze_joins.py
@@ -13,16 +13,13 @@
from ..commonlib.command import Command, CommandKind, CommandsMapping
from ..commonlib.document import (
    Cell,
    DocOutputArgs,
    Document,
    handle_doc,
    OutputOptions,
    OutputWidth,
    OUTPUT_WIDTH_DOC,
    output_options_from_args,
    Row,
    Section,
    Table,
    TABLE_INDENT_DOC,
    TXT_DOC,
)
from ..commonlib.option import map_option, non_null, optional_to_iter
from ..commonlib.result_utils import unwrap
@@ -64,14 +61,13 @@ class StagesOrPhases(Enum):

@with_slots
@dataclass(frozen=True)
class AnalyzeJoinsAllGcsArgs:
class AnalyzeJoinsAllGcsArgs(DocOutputArgs):
    trace_path: Path = argument(name_optional=True, doc=TRACE_PATH_DOC)
    process: ProcessQuery = argument(default=None, doc=PROCESS_DOC)
    show_n_worst_stolen_time_instances: int = argument(
        default=10, doc=_DOC_N_WORST_STOLEN_TIME_INSTANCES
    )
    show_n_worst_joins: int = argument(default=10, doc=_DOC_N_WORST_JOINS)
    txt: Optional[Path] = argument(default=None, doc=TXT_DOC)


def analyze_joins_all_gcs(args: AnalyzeJoinsAllGcsArgs) -> None:
@@ -86,7 +82,7 @@ def analyze_joins_all_gcs(args: AnalyzeJoinsAllGcsArgs) -> None:
            show_n_worst_stolen_time_instances=args.show_n_worst_stolen_time_instances,
            show_n_worst_joins=args.show_n_worst_joins,
        ),
        OutputOptions(txt=args.txt),
        output_options_from_args(args),
    )


@@ -113,7 +109,7 @@ def analyze_joins_all_gcs_for_jupyter(

@with_slots
@dataclass(frozen=True)
class _AnalyzeJoinsSingleGcArgs:
class _AnalyzeJoinsSingleGcArgs(DocOutputArgs):
    trace_path: Path = argument(name_optional=True, doc=TRACE_PATH_DOC)
    gc_number: int = argument(doc=GC_NUMBER_DOC)
    process: ProcessQuery = argument(default=None, doc=PROCESS_DOC)
@@ -133,10 +129,6 @@ class _AnalyzeJoinsSingleGcArgs:
    )
    max_heaps: Optional[int] = argument(default=None, doc="Only show this many heaps")

    txt: Optional[Path] = argument(default=None, doc=TXT_DOC)
    output_width: Optional[OutputWidth] = argument(default=None, doc=OUTPUT_WIDTH_DOC)
    table_indent: Optional[int] = argument(default=None, doc=TABLE_INDENT_DOC)


def _analyze_joins_single_gc(args: _AnalyzeJoinsSingleGcArgs) -> None:
    _check_join_analysis_ready()
@@ -151,9 +143,7 @@ def _analyze_joins_single_gc(args: _AnalyzeJoinsSingleGcArgs) -> None:
        show_n_worst_stolen_time_instances=args.show_n_worst_stolen_time_instances,
        max_heaps=args.max_heaps,
    )
    handle_doc(
        doc, OutputOptions(width=args.output_width, table_indent=args.table_indent, txt=args.txt)
    )
    handle_doc(doc, output_options_from_args(args))


def _get_processed_trace_with_just_join_info(
