GC infra improvements #982

Merged: 1 commit, Oct 30, 2019
14 changes: 8 additions & 6 deletions src/benchmarks/gc/README.md
@@ -1,12 +1,14 @@
# About

-`dotnet-gc-infra` lets you run GC performance tests and analyze and chart statistics.
+This program lets you run GC performance tests and analyze and chart statistics.

Command examples in this document use Bash/PowerShell syntax. If using Windows CMD, replace `/` with `\`.

-The general workflow when using `dotnet-gc-infra` is:
+The general workflow when using the GC infra is:

-* For testing your changes to coreclr, get a master branch build of coreclr, and also your own build. (It can also be used to compare different configurations on just the master branch.)
+* For testing your changes to coreclr, get a master branch build of coreclr, and also your own build.
+  (You can of course use any version of coreclr, not just master.
+  You can also only test with a single coreclr.)
* Write a benchfile. (Or generate default ones with `suite-create` as in the tutorial.) This will reference the coreclrs and list the tests to be run.
* Run the benchfile and collect traces.
* Run analysis on the output.
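To make the workflow concrete, here is a rough benchfile sketch. All field names below are hypothetical illustrations, not the authoritative schema — generate real benchfiles with `suite-create`, or see `docs/bench_file.md`:

```yaml
# Hypothetical benchfile sketch -- field names are illustrative only.
coreclrs:
  baseline:
    core_root: /path/to/master/Core_Root
  changed:
    core_root: /path/to/my_build/Core_Root
options:
  complus_gcserver: true   # documented config option: use server GC
benchmarks:
  example_test:
    memory_load:
      percent: 60.0        # documented: hold system memory load at 60%
```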
@@ -102,7 +104,7 @@ On non-Windows systems, you'll need [`dotnet-trace`](https://github.com/dotnet/d
On non-Windows systems, to run container tests, you'll need `cgroup-tools` installed.
You should have builds of coreclr available for use in the next step.

-Finally, run `py . setup` from the root of dotnet-gc-infra.
+Finally, run `py . setup` from the same directory as this README.
This will read information about your system that's relevant to performance analysis (such as cache sizes) and save to `bench/host_info.yaml`.
It will also install some necessary dependencies on Windows.

@@ -123,13 +125,13 @@ The benchfiles can exist anywhere. This example will use the local directory `be
To avoid writing benchfiles yourself, `suite-create` can generate a few:

```sh
-cd path/to/dotnet-gc-infra
py . suite-create bench/suite --coreclrs path_to_coreclr0 path_to_coreclr1
```

`path_to_coreclr0` is the path to a [Core_Root](#Core_Root).

-`path_to_coreclr1` should be a different Core_Root. (it can be the same, but the point is to compare performance of two different builds.)
+`path_to_coreclr1` should be a different Core_Root. (It can be the same, but the point is to compare performance of two different builds.)
You can omit this if you just intend to test a single coreclr.

If you made a mistake, you can run `suite-create` again and pass `--overwrite`, which clears the output directory (`bench/suite` in this example) first.

101 changes: 100 additions & 1 deletion src/benchmarks/gc/docs/bench_file.md
@@ -211,6 +211,94 @@ complus_threadpool_forcemaxworkerthreads: `int | None`
complus_tieredcompilation: `bool | None`
Set to true to enable tiered compilation

complus_bgcfltuningenabled: `bool | None`
Set to true to enable https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoal: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoalslack: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_gcconcurrentfinalization: `bool | None`
Enable concurrent finalization (not available in normal coreclr builds)

container: `[TestConfigContainer](#TestConfigContainer) | None`
Set to run the test in a container.
A container is a job object on Windows, or cgroups / docker container on non-Windows.

affinitize: `bool | None`
If true, this will be run in a job object affinitized to a single core.
Only works on Windows.
See `run_in_job.c`'s `--affinitize` option.

memory_load: `[MemoryLoadOptions](#MemoryLoadOptions) | None`
If set, the test runner will launch a second process that ensures this percentage of the system's memory is consumed.

coreclr_specific: `Mapping[str, [ConfigOptions](#ConfigOptions)] | None`
Maps coreclr name to config options for only that coreclr.
If present, should have an entry for every coreclr.



## ConfigOptions

complus_gcserver: `bool | None`
Set to true to use server GC.

complus_gcconcurrent: `bool | None`
Set to true to allow background GCs.

complus_gcgen0size: `int | None`
gen0size in bytes. (decimal)

complus_gcgen0maxbudget: `int | None`
Max gen0 budget in bytes. (decimal)

complus_gcheapaffinitizeranges: `str | None`
On non-Windows, this should look like: 1,3,5,7-9,12
On Windows, this should include group numbers, like: 0:1,0:3,0:5,1:7-9,1:12
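The non-Windows range syntax (e.g. `1,3,5,7-9,12`) denotes an explicit set of cores. A small illustrative parser (not part of this repo) shows how such a spec expands:

```python
from typing import List


def parse_core_ranges(spec: str) -> List[int]:
    """Expand a spec like '1,3,5,7-9,12' into an explicit core list."""
    cores: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            # A range like '7-9' is inclusive on both ends.
            lo, hi = part.split("-")
            cores.extend(range(int(lo), int(hi) + 1))
        else:
            cores.append(int(part))
    return cores


# parse_core_ranges("1,3,5,7-9,12") -> [1, 3, 5, 7, 8, 9, 12]
```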

complus_gcheapcount: `int | None`
Number of heaps. (decimal)
Only has effect when complus_gcserver is set.

complus_gcheaphardlimit: `int | None`
Hard limit on heap size, in bytes. (decimal)

complus_gclargepages: `bool | None`
Set to true to enable large pages.

complus_gcnoaffinitize: `bool | None`
Set to true to prevent affinitizing GC threads to cpu cores.

complus_gccpugroup: `bool | None`
Set to true to enable CPU groups.

complus_gcnumaaware: `bool | None`
Set to false to disable NUMA-awareness in GC

complus_thread_useallcpugroups: `bool | None`
Set to true to automatically distribute threads across CPU Groups

complus_threadpool_forcemaxworkerthreads: `int | None`
Overrides the MaxThreads setting for the ThreadPool worker pool

complus_tieredcompilation: `bool | None`
Set to true to enable tiered compilation

complus_bgcfltuningenabled: `bool | None`
Set to true to enable https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoal: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_bgcmemgoalslack: `int | None`
See comment on https://github.com/dotnet/coreclr/pull/26695

complus_gcconcurrentfinalization: `bool | None`
Enable concurrent finalization (not available in normal coreclr builds)

container: `[TestConfigContainer](#TestConfigContainer) | None`
Set to run the test in a container.
A container is a job object on Windows, or cgroups / docker container on non-Windows.
@@ -220,7 +308,7 @@ affinitize: `bool | None`
Only works on Windows.
See `run_in_job.c`'s `--affinitize` option.

-memory_load_percent: `float | None`
+memory_load: `[MemoryLoadOptions](#MemoryLoadOptions) | None`
If set, the test runner will launch a second process that ensures this percentage of the system's memory is consumed.


@@ -290,6 +378,17 @@ allocType: `"simple" | "reference"`
testKind: `"time" | "highSurvival"`


## MemoryLoadOptions

percent: `float`
The memory load process will allocate memory until the system's memory load is this high.

no_readjust: `bool | None`
If true, the memory load process will never allocate or free any more memory after it's started.
If false, it will allocate or free in order to keep the system's memory at `percent`.
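Combining the two fields, a sketch of a config fragment that holds the system at 60% memory load (the surrounding benchfile structure is omitted here):

```yaml
memory_load:
  percent: 60.0      # allocate until system memory load reaches 60%
  no_readjust: false # keep allocating/freeing to stay near 60%
```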



## ScoreElement

weight: `float`
10 changes: 8 additions & 2 deletions src/benchmarks/gc/docs/metrics.md
@@ -224,6 +224,9 @@ IsNonBackground
IsNonConcurrent
ReasonIs_alloc_loh
ReasonIs_alloc_soh
ReasonIs_bgc_stepping
ReasonIs_bgc_tuning_loh
ReasonIs_bgc_tuning_soh
ReasonIs_empty
ReasonIs_gcstress
ReasonIs_induced
@@ -246,6 +249,7 @@ UsesPromotion
## float metrics

AllocRateMBSec
AllocedMBAccumulated
AllocedSinceLastGCMB
BGCFinalPauseMSec
BGCLohConcurrentRevisitedPages
@@ -301,6 +305,7 @@ LastPerHeapHistToEndMSec
MaxBGCWaitMSec
MbAllocatedOnLOHSinceLastGen2Gc
MbAllocatedOnSOHSinceLastSameGenGc
MemoryPressure
Number
PauseDurationMSec
PauseDurationSeconds
@@ -342,16 +347,17 @@ Gen0Size
Gen1CollectionCount
Gen2CollectionCount
InternalSecondsTaken
NumCreatedWithFinalizers
NumFinalized
ThreadCount
TotalSecondsTaken

### float metrics that require a trace file

FinalYoungestDesiredMB
FirstEventToFirstGCSeconds
FirstToLastEventSeconds
FirstToLastGCSeconds
-NumHeaps
+HeapCount
PctTimePausedInGC
TotalAllocatedMB
TotalLOHAllocatedMB
4 changes: 4 additions & 0 deletions src/benchmarks/gc/jupyter_notebook.py
@@ -165,6 +165,7 @@ def show_summary(trace: ProcessedTrace) -> None:
single_heap_metrics=parse_single_heap_metrics_arg(("InMB", "OutMB")),
show_first_n_gcs=5,
show_last_n_gcs=None,
show_reasons=False,
)
)

@@ -636,3 +637,6 @@ def _more_custom(trace: ProcessedTrace) -> None:


_more_custom(_TRACE)


# %%
24 changes: 2 additions & 22 deletions src/benchmarks/gc/src/analysis/aggregate_stats.py
@@ -2,7 +2,6 @@
# The .NET Foundation licenses this file to you under the MIT license.
# See the LICENSE file in the project root for more information.

-from math import ceil, floor
from statistics import mean, stdev
from typing import Callable, Iterable, List, Mapping, Sequence, Tuple, Type, TypeVar

@@ -17,7 +16,7 @@
)
from ..commonlib.result_utils import all_non_err, as_err, fn_to_ok, flat_map_ok, map_ok
from ..commonlib.type_utils import check_cast, T
-from ..commonlib.util import get_percent
+from ..commonlib.util import get_95th_percentile, get_percent

from .types import (
Failable,
@@ -212,25 +211,6 @@ def _fail_if_empty(
return lambda xs: Err(f"<no values>") if is_empty(xs) else Ok(cb(xs))


-# numpy has problems on ARM, so using this instead.
-def get_percentile(values: Sequence[float], percent: float) -> float:
-    assert not is_empty(values)
-    assert 0.0 <= percent <= 100.0
-    sorted_values = sorted(values)
-    fraction = percent / 100.0
-    index_and_fraction = (len(values) - 1) * fraction
-    prev_index = floor(index_and_fraction)
-    next_index = ceil(index_and_fraction)
-    # The closer we are to 'next_index', the more 'next' should matter
-    next_factor = index_and_fraction - prev_index
-    prev_factor = 1.0 - next_factor
-    return sorted_values[prev_index] * prev_factor + sorted_values[next_index] * next_factor
-
-
-def _get_95th_percentile(values: Sequence[float]) -> FailableFloat:
-    return Err("<no values>") if is_empty(values) else Ok(get_percentile(values, 95))
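The removed helpers live on as `get_95th_percentile` in `commonlib.util` (imported above). The linear-interpolation scheme, kept dependency-free because numpy has problems on ARM, can be sketched and checked standalone:

```python
from math import ceil, floor
from typing import Sequence


def percentile(values: Sequence[float], percent: float) -> float:
    """Percentile by linear interpolation between the two closest ranks."""
    assert values and 0.0 <= percent <= 100.0
    sorted_values = sorted(values)
    index_and_fraction = (len(values) - 1) * (percent / 100.0)
    prev_index = floor(index_and_fraction)
    next_index = ceil(index_and_fraction)
    # The closer we are to next_index, the more the next value matters.
    next_factor = index_and_fraction - prev_index
    prev_factor = 1.0 - next_factor
    return sorted_values[prev_index] * prev_factor + sorted_values[next_index] * next_factor


# percentile([1, 2, 3, 4], 50) -> 2.5
```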


def _stdev(values: Sequence[float]) -> FailableFloat:
if len(values) <= 1:
return Err("Not enough values for stdev")
@@ -243,6 +223,6 @@ def _stdev(values: Sequence[float]) -> FailableFloat:
"Max": _fail_if_empty(max),
"Min": _fail_if_empty(min),
"Sum": fn_to_ok(sum),
-    "95P": _get_95th_percentile,
+    "95P": get_95th_percentile,
"Stdev": _stdev,
}
22 changes: 6 additions & 16 deletions src/benchmarks/gc/src/analysis/analyze_joins.py
@@ -13,16 +13,13 @@
from ..commonlib.command import Command, CommandKind, CommandsMapping
from ..commonlib.document import (
Cell,
DocOutputArgs,
Document,
handle_doc,
OutputOptions,
OutputWidth,
OUTPUT_WIDTH_DOC,
output_options_from_args,
Row,
Section,
Table,
TABLE_INDENT_DOC,
TXT_DOC,
)
from ..commonlib.option import map_option, non_null, optional_to_iter
from ..commonlib.result_utils import unwrap
@@ -64,14 +61,13 @@ class StagesOrPhases(Enum):

@with_slots
@dataclass(frozen=True)
-class AnalyzeJoinsAllGcsArgs:
+class AnalyzeJoinsAllGcsArgs(DocOutputArgs):
trace_path: Path = argument(name_optional=True, doc=TRACE_PATH_DOC)
process: ProcessQuery = argument(default=None, doc=PROCESS_DOC)
show_n_worst_stolen_time_instances: int = argument(
default=10, doc=_DOC_N_WORST_STOLEN_TIME_INSTANCES
)
show_n_worst_joins: int = argument(default=10, doc=_DOC_N_WORST_JOINS)
-    txt: Optional[Path] = argument(default=None, doc=TXT_DOC)


def analyze_joins_all_gcs(args: AnalyzeJoinsAllGcsArgs) -> None:
@@ -86,7 +82,7 @@ def analyze_joins_all_gcs(args: AnalyzeJoinsAllGcsArgs) -> None:
show_n_worst_stolen_time_instances=args.show_n_worst_stolen_time_instances,
show_n_worst_joins=args.show_n_worst_joins,
),
-        OutputOptions(txt=args.txt),
+        output_options_from_args(args),
)


@@ -113,7 +109,7 @@ def analyze_joins_all_gcs_for_jupyter(

@with_slots
@dataclass(frozen=True)
-class _AnalyzeJoinsSingleGcArgs:
+class _AnalyzeJoinsSingleGcArgs(DocOutputArgs):
trace_path: Path = argument(name_optional=True, doc=TRACE_PATH_DOC)
gc_number: int = argument(doc=GC_NUMBER_DOC)
process: ProcessQuery = argument(default=None, doc=PROCESS_DOC)
@@ -133,10 +129,6 @@ class _AnalyzeJoinsSingleGcArgs:
)
max_heaps: Optional[int] = argument(default=None, doc="Only show this many heaps")

-    txt: Optional[Path] = argument(default=None, doc=TXT_DOC)
-    output_width: Optional[OutputWidth] = argument(default=None, doc=OUTPUT_WIDTH_DOC)
-    table_indent: Optional[int] = argument(default=None, doc=TABLE_INDENT_DOC)


def _analyze_joins_single_gc(args: _AnalyzeJoinsSingleGcArgs) -> None:
_check_join_analysis_ready()
@@ -151,9 +143,7 @@ def _analyze_joins_single_gc(args: _AnalyzeJoinsSingleGcArgs) -> None:
show_n_worst_stolen_time_instances=args.show_n_worst_stolen_time_instances,
max_heaps=args.max_heaps,
)
-    handle_doc(
-        doc, OutputOptions(width=args.output_width, table_indent=args.table_indent, txt=args.txt)
-    )
+    handle_doc(doc, output_options_from_args(args))
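The refactor in this file follows one pattern: output-related CLI flags are hoisted into a shared `DocOutputArgs` base, and `output_options_from_args` builds `OutputOptions` from any args class inheriting it. A simplified, self-contained sketch (the real classes carry `argument(...)` metadata and more fields):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class DocOutputArgs:
    """Output flags shared by every document-producing command."""
    txt: Optional[Path] = None
    output_width: Optional[int] = None
    table_indent: Optional[int] = None


@dataclass(frozen=True)
class OutputOptions:
    txt: Optional[Path] = None
    width: Optional[int] = None
    table_indent: Optional[int] = None


def output_options_from_args(args: DocOutputArgs) -> OutputOptions:
    # One conversion point instead of each command rebuilding OutputOptions.
    return OutputOptions(txt=args.txt, width=args.output_width, table_indent=args.table_indent)


@dataclass(frozen=True)
class AnalyzeJoinsAllGcsArgs(DocOutputArgs):
    show_n_worst_joins: int = 10


opts = output_options_from_args(AnalyzeJoinsAllGcsArgs(table_indent=4))
# opts.table_indent == 4; opts.width is None
```

The base class keeps every command's flag set consistent, and commands only add their own fields.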


def _get_processed_trace_with_just_join_info(