New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make normal statics simpler #99183
base: main
Are you sure you want to change the base?
Make normal statics simpler #99183
Conversation
- Delete DomainLocalModule and ThreadLocalModule - Replumb the JIT to use a new set of helpers (in progress) - Allocate static data on a per type basis instead of a per module basis - Thread Local Statics are now stored in the same structures that the JIT can optimize (in progress) - More scenarios can support pre-init, notably support for pre-init for cases with valuetype statics, but no Cctor - Remove ModuleForStatics concept - Remove ModuleId concept - Remove ModuleIndex concept - Remove ClassDomainID concept Work still to be done 1. Finish support for R2R, and see if we can make it backcompat with the old R2R version 2. Support for the more optimized helpers (dynamic and pinned) 3. Re-enable jit helper expansions 4. Make sure SOS and the debugger continue to work
…e a smidge faster
- GenericDictionaryExpansion re-used the DomainLocalBlck Crst type, so it now has a new one with the same Crst rules as it used to have
Use <= instead of < for TLS index compare Unallocated TLSIndex is not 0xFFFFFFFF, which will make the existing checks fall back to doing the full work for generic TLS lookups.
- While I didn't do this for most of the Microsoft maintained architectures, there isn't much evidence at the moment that the hand coded assembly actually provides any value
…cInterface14 - This is to compensate for the existing GetDomainLocalModule* api no longer working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new DAC API look good to me. The implementation looks good too.
- Seems to reduce variability in benchmark results by a small factor
private const uint enum_flag_HasCheckedCanCompareBitsOrUseFastGetHashCode = 0x0002; // Whether we have checked the overridden Equals or GetHashCode | ||
private const uint enum_flag_CanCompareBitsOrUseFastGetHashCode = 0x0004; // Is any field type or sub field type overrode Equals or GetHashCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private const uint enum_flag_CanCompareBitsOrUseFastGetHashCode = 0x0004; // Is any field type or sub field type overrode Equals or GetHashCode | |
private const uint enum_flag_CanCompareBitsOrUseFastGetHashCode = 0x0004; // Has any field type or sub field type overridden Equals or GetHashCode |
src/coreclr/vm/methodtable.h
Outdated
@@ -441,12 +523,122 @@ struct MethodTableAuxiliaryData | |||
{ | |||
return !!(m_dwFlags & MethodTableAuxiliaryData::enum_flag_MayHaveOpenInterfaceInInterfaceMap); | |||
} | |||
}; // struct MethodTableAuxiliaryData | |||
|
|||
// Any Generic MethodTable which has static variables has this structure. Note that it ends |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought all MethodTable
s with static variables have this structure too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy/Paste issue... moved comment to correct structure
#endif //!DACCESS_COMPILE | ||
|
||
// Do not use except in DAC and profiler scenarios | ||
inline PTR_BYTE GetNonGCThreadStaticsBasePointer(PTR_Thread pThread); | ||
inline PTR_BYTE GetGCThreadStaticsBasePointer(PTR_Thread pThread); | ||
|
||
inline DWORD IsDynamicStatics() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do that?
thread->DetachThread(TRUE); | ||
} | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you flip the predicate here and make this the top part of the if check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? That seems less clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically place the smaller block of code first. It is always annoying to scroll through pages of source and then see an else that is a single line. Just something I tend to do, but if you prefer it this way so be it.
src/coreclr/vm/gcenv.ee.cpp
Outdated
@@ -310,6 +311,12 @@ void GCToEEInterface::GcScanRoots(promote_func* fn, int condemned, int max_gen, | |||
} | |||
} | |||
|
|||
if (sc->thread_number == 0 || !GCHeapUtilities::IsServerHeap()) | |||
{ | |||
// This function must be called once per run of calls to ScanThreadStaticRoots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on the "why" here.
@@ -704,8 +670,17 @@ void MethodTable::AllocateAuxiliaryData(LoaderAllocator *pAllocator, Module *pLo | |||
pMTAuxiliaryData = (MethodTableAuxiliaryData *)(pAuxiliaryDataRegion + prependedAllocationSpace); | |||
|
|||
pMTAuxiliaryData->SetLoaderModule(pLoaderModule); | |||
pMTAuxiliaryData->SetOffsetToNonVirtualSlots(hasGenericStatics ? -(int16_t)sizeof(GenericsStaticsInfo) : 0); | |||
pMTAuxiliaryData->SetOffsetToNonVirtualSlots(-sizeofStaticsStructure); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment. That -
is a bit subtle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL. I can see missing that one easily. I'll add
// NOTE: There is a - sign here making it so that the offset points to BEFORE the MethodTableAuxiliaryData
src/coreclr/vm/methodtable.cpp
Outdated
|
||
void MethodTable::EnsureStaticDataAllocated() | ||
{ | ||
WRAPPER_NO_CONTRACT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no contract?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I'm lazy? Yep, that's it. In any case, pulling the contract from the Allocate functions is reasonable, so I'll do that.
} | ||
|
||
void MethodTable::AttemptToPreinit() | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contract?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My laziness strikes again. I've pulled the contract from the various allocate functions just like EnsureStaticDataAllocated. That should work
src/coreclr/vm/methodtable.cpp
Outdated
pLocalModule->PopulateClass(this); | ||
void MethodTable::EnsureTlsIndexAllocated() | ||
{ | ||
WRAPPER_NO_CONTRACT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no contract?
|
||
void MethodTable::GetStaticsOffsets(StaticsOffsetType offsetType, bool fGenericStatics, uint32_t *dwGCOffset, uint32_t *dwNonGCOffset) | ||
{ | ||
if (offsetType == StaticsOffsetType::Normal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contract?
This also looks like we should update the BoTR with these details. |
- Adjust comment to use overriden instead of overrode - Add scheme to easily look at statics structures without needing to do math in the debugger - Adjust comment on DynamicStaticsInfo and GenericsStaticsInfo to be correct - Elaborate on comment on non-thread local implementation of GetNonGCThreadStaticsBasePointer and GetGCThreadStaticsBasePointer - Fixup contracts in the threadstatics work and elsewhere, as well as be more intentional about DAC vs non-DAC code.
…blies and just use a Weak Long Handle instead
GC statics are any statics which are represented by classes or by user defined structures. | ||
For user defined structure statics, the static variable is actually a pointer to a boxed instance of the structure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GC statics are any statics which are represented by classes or by user defined structures. | |
For user defined structure statics, the static variable is actually a pointer to a boxed instance of the structure. | |
GC statics are any statics which are represented by classes or value-types. | |
For value-type statics, the static variable is actually a pointer to a boxed instance of the value-type. |
I've not see anywhere we refer to value-types as "user defined structures". I'd prefer "value-types" or simple "structs".
|
||
In the above diagram, you can see that we have separate fields for non-gc and gc statics, as well as thread and normal statics. | ||
For normal statics, we use a single pointer sized field, which also happens to encode whether or not the class constructor has been run. | ||
This is done to allow a lock free atomic access to both get the static field address as well as determine if the class constructor needs to be triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done to allow a lock free atomic access to both get the static field address as well as determine if the class constructor needs to be triggered. | |
This is done to allow lock free atomic access to both get the static field address as well as determine if the class constructor needs to be triggered. |
2. We provide an ability to share a non-gc thread static between native CoreCLR code and managed code (Subset of `TLSIndexType::DirectOnThreadLocalData`) | ||
3. We provide an extremely efficient means to access a small number of non-gc thread statics. (The rest of the usage of `TLSIndexType::DirectOnThreadLocalData`) | ||
|
||
For the purposes of t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a left over thought?
At GC scan time, each managed object must individually be kept alive only if the type and thread is still alive. This requires properly handling several situations. | ||
1. If a collectible assembly becomes unreferenced, but a thread static variable associated with it has a finalizer, the object must move to the finalization queue. | ||
2. If a thread static variable associated with a collectible assembly refers to the collectible assembly `LoaderAllocator` via a series of object references, it must not provide a reason for the collectible assembly to be considered referenced. | ||
3. If a collectible assembly is collected, then the associated static variables no longer exist, and the TLSIndex values associated with that collectible assembly becomre re-useable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3. If a collectible assembly is collected, then the associated static variables no longer exist, and the TLSIndex values associated with that collectible assembly becomre re-useable. | |
3. If a collectible assembly is collected, then the associated static variables no longer exist, and the TLSIndex values associated with that collectible assembly becomes re-useable. |
@kunalspathak could you take a look at the JIT changes here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JIT changes LGTM
src/coreclr/jit/helperexpansion.cpp
Outdated
// or CORINFO_HELP_GETSHARED_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED helper. See | ||
// fgExpandThreadLocalAccess: Inline the CORINFO_HELP_GETDYNAMIC_NONGCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED, | ||
// CORINFO_HELP_GETDYNAMIC_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED, or | ||
// CORINFO_HELP_GETDYNAMIC_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZEDhelper. See |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// CORINFO_HELP_GETDYNAMIC_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZEDhelper. See | |
// CORINFO_HELP_GETDYNAMIC_GCTHREADSTATIC_BASE_NOCTOR_OPTIMIZED2 helper. See |
src/coreclr/jit/helperexpansion.cpp
Outdated
|
||
// fastPathBb | ||
if (isGCThreadStatic) | ||
JITDUMP("tlsBaseComputeBB: " FMT_BB "\n", tlsBaseComputeBB->bbNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove? We are not dumping BB nums for any other blocks.
As far as I can determine, this looks good. As soon as the CI is green, I will sign off. |
This change makes access to statics much simpler to document and also removes some performance penalties that we've had for a long time due to the old model. Most statics access should be equivalent or faster.
This change converts static variables from a model where statics are associated with the module that defined the metadata of the static to a model where each individual type allocates its statics independently. In addition, it moves the flags that indicate whether or not a type is initialized, and whether or not its statics have been allocated to the
MethodTable
structures instead of storing them in aDomainLocalModule
as was done before.Particularly notable changes
LOADERHANDLE
to keep the static alive, and a new handle type called aHNDTYPE_WEAK_INTERIOR_POINTER
which will keep the pointers to managed objects in theMethodTable
structures up to date with the latest addresses of the static variables.DomainLocalModule
no longer exists, theISOSDacInterface
has been augmented with a new api calledISOSDacInterface14
which adds the ability to query for the static base/initialization status of an individual type directly.With this change, the pointers to normal static data are located at a fixed offset from the start of the
MethodTableAuxiliaryData
, and indices for Thread Static variables are stored also stored in such a fixed offset. Concepts such as theDomainLocalModule
,ThreadLocalModule
,ModuleId
andModuleIndex
no longer exist.Lifetime management for collectible statics
DynamicStaticsInfo
structure, and when relocation occurs, if the collectible types managedLoaderAllocator
is still alive, the static field address will be relocated if the object moves. This is done by means of the new Weak Interior Pointer GC handle type.LoaderAllocator
is being cleaned up, before the WeakTrackResurrection GCHandle that points at the the managedLoaderAllocator
object is destroyed, the mapping from TLS indices to collectibleLoaderAllocator
structures shall be cleared of all relevant entries (and the current GC index shall be stored in the TLS to MethodTable mapping)MethodTable
which is in a collectible assembly, and the associatedLoaderAllocator
has been freed, then set the relevant entry to NULL.LOADERHANDLE
for each object allocated, and associate it with the TLS index on that thread.LOADERHANDLE
. If the collectible type still has a live managedLoaderAllocator
free theLOADERHANDLE
.Expected cost model for extra GC interactions associated with this change
This change adds 3 possible ways in which the GC may have to perform additional work beyond what it used to do.
Perf impact of this change
I've run the .NET Microbenchmark suite as well as a variety of ASP.NET Benchmarks. (Unfortunately the publicly visible infrastructure for running tests is incompatible with this change, so results are not public). The results are generally quite hard to interpret. ASP.NET Benchmarks are generally (very) slightly better, and the microbenchmarks are generally equivalent in performance, although there is variability in some tests that had not previously shown variability, and the differences in performance are contained within the margin of error in our perf testing for tests with any significant amount of code. When performance differences have been examined in detail, they tend to be in code which has not changed in any way due to this change, and when run in isolation the performance deltas have disappeared in all cases that I have examined. Thus, I assume they are caching side effect changes. Performance testing has led me to add a change such that all NonGC, NonCollectible statics are allocated in a separate LoaderHeap which appears to have reduced the variability in some of the tests by a small fraction, although results are not consistent enough for me to be extremely confident in that statement.