Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre cache #6107

Merged
merged 9 commits into from Mar 15, 2021
Merged

Pre cache #6107

merged 9 commits into from Mar 15, 2021

Conversation

Forgind
Copy link
Member

@Forgind Forgind commented Feb 2, 2021

Fixes #5247

Context

Permits creating a cache that RAR can use if it can't find the normal on-disk cache.

Changes Made

Testing

Unit tests

Notes

Not yet manually tested; relies on #6094

@Forgind Forgind requested a review from rokonec February 2, 2021 02:12
@Forgind Forgind marked this pull request as ready for review February 6, 2021 00:28
@Forgind
Copy link
Member Author

Forgind commented Feb 8, 2021

Few notes:
The latest commit claims that I just pulled in the latest version of @rokonec's branch. Actually, I did that then changed and saved the change to accept somewhat incorrect timestamps before I realized I hadn't committed the merge yet, so it's a little messy. Sorry 😞
After looking at it a little more, I don't think I can easily have a separate path to being "checked" for the precomputed cache and the normal cache except at the deserialization stage, since the precomputed cache, to put it more accurately, prefills the standard cache. Having a third cache just for the precomputed version would add more complexity to the change and make serializing caches more complicated.
Also, in PR Reviews, it was brought up that this might be bad for small projects if they get a lot more in their cache than they'd expected. That's true, but it occurred to me that you can always just make more than one cache based on what you think the relevant assemblies are, so if you're making projectType1, perhaps you bring in caches 1, 4, and 5, whereas for projectType2, you bring in caches 3, 5, and 6. That way, you can split the SDK up into user-relevant buckets. Note that that would increase complexity for whoever makes the caches, but I think that's unavoidable.

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a couple of comments inline.

Would it be possible to change the target branch to rokonec/6057-on-disk-cache-serialization for easier review? Thank you!

Comment on lines 677 to 663
Dictionary<string, FileState> newInstanceLocalFileStateCache = new Dictionary<string, FileState>(instanceLocalFileStateCache.Count);
foreach (KeyValuePair<string, FileState> kvp in instanceLocalFileStateCache)
{
string relativePath = FileUtilities.MakeRelative(Path.GetDirectoryName(stateFile), kvp.Key);
newInstanceLocalFileStateCache[relativePath] = kvp.Value;
}
instanceLocalFileStateCache = newInstanceLocalFileStateCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to modify instanceLocalFileStateCache? Is it possible that RAR will be yielding wrong results after the paths have been relativized?

If the answer is yes and this is safe, consider simply modifying instanceLocalFileStateCache instead of creating a new dictionary.

Otherwise, consider the ToDictionary() extension method to do the mapping.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! You're right. I did this weird logic specifically so I could temporarily modify instanceLocalFileStateCache (to make it more accessible to SerializeCacheByTranslator) but then neglected to change it back. Fixed, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. nit: Still wondering if calling ToDictionary wouldn't look slightly nicer. It's basically a concise way of expressing the mapping from one collection (in this case a dictionary) to a new dictionary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially thought of this as more performant, but that doesn't really matter for a case like this, and you're right—it does look a lot cleaner. Fixed, thanks!

{
_cache = (SystemState)StateFileBase.DeserializeCache(_stateFile, Log, typeof(SystemState));
_cache = SystemState.DeserializeCacheByTranslator(_stateFile, Log);

// Construct the cache if necessary.
if (_cache == null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if (_cache == null)
if (_cache == null && AssemblyInformationCachePaths != null)

?

Calling DeserializePrecomputedCachesByTranslator if there are no precomputed caches looks a bit unexpected.

{
if (!string.IsNullOrEmpty(_stateFile) && _cache.IsDirty)
if (!String.IsNullOrEmpty(AssemblyInformationCacheOutputPath))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super-nit: string (lower-case) for consistency.

@Forgind Forgind changed the base branch from master to vs16.9 February 25, 2021 18:04
@Forgind Forgind changed the base branch from vs16.9 to master February 25, 2021 18:04
if (!assembliesFound.Contains(relativePath))
{
FileState fileState = kvp.Value;
string fullPath = Path.GetFullPath(Path.Combine(Path.GetDirectoryName(stateFile.ToString()), relativePath));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to not rely on the current directory (as used in Path.GetFullPath) here? We'll soon need to find and fix all places like this to make RAR multithreaded so ideally we wouldn't be introducing more. cc @AR-May

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't actually be relying on the current directory because stateFile should be a full path, so it would effectively be:
Find the path like (current directory) -> C:\Users\Forgind...

I only included that to weed out things like folder1..\folder1.

/// <param name="stateFiles">List of locations of caches on disk.</param>
/// <param name="log">How to log</param>
/// <param name="fileExists">Whether a file exists</param>
/// <returns></returns>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Missing return value comment.

}
SerializeCacheByTranslator(stateFile, log);

instanceLocalFileStateCache = oldFileStateCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Put this statement in a finally block to make sure it's executed even if SerializeCacheByTranslator throws?

Comment on lines 677 to 663
Dictionary<string, FileState> newInstanceLocalFileStateCache = new Dictionary<string, FileState>(instanceLocalFileStateCache.Count);
foreach (KeyValuePair<string, FileState> kvp in instanceLocalFileStateCache)
{
string relativePath = FileUtilities.MakeRelative(Path.GetDirectoryName(stateFile), kvp.Key);
newInstanceLocalFileStateCache[relativePath] = kvp.Value;
}
instanceLocalFileStateCache = newInstanceLocalFileStateCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. nit: Still wondering if calling ToDictionary wouldn't look slightly nicer. It's basically a concise way of expressing the mapping from one collection (in this case a dictionary) to a new dictionary.

internal static SystemState DeserializePrecomputedCachesByTranslator(ITaskItem[] stateFiles, TaskLoggingHelper log, FileExists fileExists)
{
SystemState retVal = new SystemState();
retVal.isDirty = stateFiles.Length > 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: What if stateFiles is not empty but DeserializeCacheByTranslator returns null so we end up with an empty retVal? Is it still considered dirty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDirty shouldn't matter at this point in almost any case, since we'd presumably modify the cache before considering serializing it anyway unless there really is nothing of interest. I actually slightly prefer this version, since if there are invalid caches, we won't try repeatedly to read something invalid, instead reading something empty, which is less computationally expensive.

@@ -71,7 +72,7 @@ internal sealed class SystemState : StateFileBase, ITranslatable
/// <summary>
/// True if the contents have changed.
/// </summary>
private bool isDirty;
internal bool isDirty;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Still need internal access? Isn't it a left-over from when the cache was Json-serialized? Same for the other private -> internal changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this (and others) were for testing so I could make a cache and serialize it without modifying it or actually executing RAR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. This one can be private, though, as it has an internal read/write property, is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! You're right. I added the write part of that property, so it seems I gave myself one way to access it, forgot, and gave myself a second way to access it. Fixed!

foreach (KeyValuePair<string, FileState> kvp in sysState.instanceLocalFileStateCache)
{
string relativePath = kvp.Key;
if (!assembliesFound.Contains(relativePath))
Copy link
Contributor

@cdmihai cdmihai Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth logging a warning if this assembly has been seen before? Or is the expectation that there will be conflicts and the user needs to put the more important files first? If that's the case hopefully it's documented somewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be conflicts if, say, two precomputed caches are created (and used) that both include My.Assembly.dll.

It shouldn't be a problem, however, because there should only be one assembly at that relative path, so it should have identical data from each. The original version of this verified that via an mvid check; I since simplified it, so theoretically someone could put incorrect information into their cache and prevent correct information from being read in, but that's just an error with the first cache rather than anything about ordering, and it doesn't break the build or anything, just making your build slightly longer. Given the above, I'd prefer to leave it as-is.

Copy link
Member

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -71,7 +72,7 @@ internal sealed class SystemState : StateFileBase, ITranslatable
/// <summary>
/// True if the contents have changed.
/// </summary>
private bool isDirty;
internal bool isDirty;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. This one can be private, though, as it has an internal read/write property, is that correct?

@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
None yet
4 participants