Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove BinaryFormatter from GetSDKReferenceFiles #6324

Merged
merged 12 commits into from
May 24, 2021

Conversation

Forgind
Copy link
Member

@Forgind Forgind commented Apr 3, 2021

I think this is almost right, but TranslatorHelpers needs System.Collections.Concurrent on .NET 3.5, or I need to be able to translate a concurrent dictionary without translating it. (?) Any idea on how I can get around that?

@KirillOsenkov
Copy link
Member

I would hardcode the translation logic for each type T that we need, and just write the count of key-value pairs, followed by each key and value separately.

@Forgind
Copy link
Member Author

Forgind commented Apr 5, 2021

Can you explain why we need the counts? I haven't looked at how complete our test coverage of this task was, but it seems to pass whatever tests we previously had without it.

@KirillOsenkov
Copy link
Member

I might be not understanding it well, but how are you going to read the keys and values if you don’t know how many there are?

@Forgind
Copy link
Member Author

Forgind commented Apr 5, 2021

Looks like the answer is: there weren't tests. Makes it easy to pass them 😋

I was imagining something like looking for closing elements. In json, for instance, they have { "key1": "val1", "key2": "val2" }. You don't need a count because the opening and closing braces match. I don't know if there's an equivalent here, but I now think it just takes references and gets or sets them based on what sort of data you ask for, so since the dictionary would have started out full in serializing, it would serialize properly. Then, in deserialization, the dictionary would be null, and it would throw an exception. If I'd initialized the dictionary, it would still be empty and remain so. I added a count. I should probably add tests.

@Forgind Forgind marked this pull request as ready for review April 9, 2021 16:43
@benvillalobos benvillalobos changed the title Remove BF from GetSDKReferenceFiles Remove BinaryFormatter from GetSDKReferenceFiles Apr 9, 2021
@@ -1205,7 +1194,7 @@ private static IEnumerable<string> GetAllReferenceDirectories(string sdkRoot)
/// </summary>
/// <remarks>This is a serialization format. Do not change member naming.</remarks>
[Serializable]
private class SdkReferenceInfo
internal class SdkReferenceInfo
{
/// <summary>
/// Constructor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're getting rid of useless summaries, might as well delete this one.


/// <summary>
/// Dictionary which maps a directory to a list of file names within that directory. This is used to shortcut hitting the disk for the list of files inside of it.
/// </summary>
public ConcurrentDictionary<string, List<string>> DirectoryToFileList { get; }
public ConcurrentDictionary<string, List<string>> DirectoryToFileList { get { return directoryToFileList; } }

/// <summary>
/// Hashset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove useless summary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the bonus of deleting a summary that isn't actually accurate. Turns out if a mapping isn't in the dictionary, it doesn't look for it; it just assumes it isn't there.

public SDKInfo(ITranslator translator) : this()
{
Translate(translator);
}

/// <summary>
/// Constructor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove useless summary

}

/// <summary>
/// A dictionary which maps a file path to a structure that contain some metadata information about that file.
/// </summary>
public ConcurrentDictionary<string, SdkReferenceInfo> PathToReferenceMetadata { get; }
internal ConcurrentDictionary<string, SdkReferenceInfo> PathToReferenceMetadata { get { return pathToReferenceMetadata; } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changed from public to internal, but other properties untouched?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this actually mattered, so I reverted it. See next comment for fiddling.

}
}

/// <summary>
/// This class represents the context information used by the background cache serialization thread.
/// </summary>
private class SaveContext
internal class SaveContext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this need to change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to fiddle with a lot of these to make them accessible to the test class. Then I downgraded a couple because properties were more visible than their containing class, which isn't really good style, but the latter wasn't necessary. As long as something prevents it from being public/protected, I don't think it particularly matters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I hadn't reviewed the test so I didn't see any need for it.

string[] keys = dictionary.Keys.ToArray();
for (int i = 0; i < count; i++)
{
if (i < keys.Length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i < keys.Length once, it won't ever change right? For better at-a-glance-understanding™, I'd suggest creating some bool translatingOut = i < keys.Length and using that bool. Or place a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I put the if statement first, which should technically be more performant, too, though not in a particularly meaningful way.

for (int i = 0; i < count; i++)
{
translator.Translate(ref keys[i]);
T value = dictionary[keys[i]];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the dictionary change while the loop runs? If a value is removed, this may throw. Do you need to lock? Also perhaps its better to enumerate key value pairs to avoid looking every key up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No—before this PR, there was only the property, and it had a getter but no setter, so it really shouldn't be a ConcurrentDictionary. It's been a ConcurrentDictionary since the initial code commit. Do you want me to change it to an ordinary dictionary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up to you, if it feels safe

int count = dictionary.Count;
translator.Translate(ref count);
string[] keys = dictionary.Keys.ToArray();
if (keys.Length == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this effectively determining the translation direction (read/write)? If so, is there a more explicit way, perhaps asking the translator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like ITranslator has a TranslationDirection Mode we can take advantage of here.

@marcpopMSFT marcpopMSFT linked an issue Apr 15, 2021 that may be closed by this pull request
27 tasks
Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok as is but I'm pretty curious if we can avoid the ConcurrentDictionary/Dictionary stuff.

@@ -1246,63 +1218,79 @@ public SdkReferenceInfo(string fusionName, string imageRuntime, bool isWinMD, bo
/// Structure that contains the on disk representation of the SDK in memory.
/// </summary>
/// <remarks>This is a serialization format. Do not change member naming.</remarks>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <remarks>This is a serialization format. Do not change member naming.</remarks>

Stale

@@ -1023,7 +1012,7 @@ internal SDKInfo GetCacheFileInfoFromSDK(string sdkRootDirectory, string[] sdkMa

PopulateRedistDictionaryFromPaths(directoryToFileList, redistDirectories);

var cacheInfo = new SDKInfo(references, directoryToFileList, FileUtilities.GetPathsHash(directoriesToHash));
var cacheInfo = new SDKInfo(references.ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase), directoryToFileList.ToDictionary(kvp => kvp.Key, kvp => kvp.Value, StringComparer.OrdinalIgnoreCase), FileUtilities.GetPathsHash(directoriesToHash));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a perf impact of this synchronization? Could you instead store an IDictionary in the SDKInfo fields and on creation it's ConcurrentDictionary/on deserialization it's a plain dictionary?

Comment on lines 1223 to 1225
private Dictionary<string, SdkReferenceInfo> pathToReferenceMetadata;
private Dictionary<string, List<string>> directoryToFileList;
private int hash;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our/runtime's standard naming convention would make these:

Suggested change
private Dictionary<string, SdkReferenceInfo> pathToReferenceMetadata;
private Dictionary<string, List<string>> directoryToFileList;
private int hash;
private Dictionary<string, SdkReferenceInfo> _pathToReferenceMetadata;
private Dictionary<string, List<string>> _directoryToFileList;
private int _hash;

Copy link
Member

@benvillalobos benvillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

info.IsWinMD = isWinmd;
}, count => new Dictionary<string, SdkReferenceInfo>(count, StringComparer.OrdinalIgnoreCase));

translator.TranslateDictionary(ref _directoryToFileList, (ITranslator t, ref string s) => t.Translate(ref s), (ITranslator t, ref List<string> fileList) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny that the duplicate translation worked because the same method handles input/output. Good thing it's not a huge concern and fairly easy to catch.

I wonder if a certain type of test would catch this? (not blocking on this thought)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but it should work as long as it matches. It serialized the dictionary twice, then deserialized it twice. Setting it the second time overwrote the correct value with the correct value, so everything still worked.

That also means I think it would be very hard to write a test for it...the only two options I can think of are having some kind of counter on how many times we accessed/set each variable or having a check at the end for the total size of the serialized form. Neither is trivial, and it isn't a huge deal for something that gets hit as often as this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting it the second time overwrote the correct value with the correct value, so everything still worked.

I see, I was thinking maybe there'd be duplicate data somehow. Agreed that it's fine as is.

@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label May 14, 2021
@Forgind Forgind merged commit 9d41925 into dotnet:main May 24, 2021
@Forgind Forgind deleted the remove-binaryformatter-from-state-files branch May 24, 2021 21:52
@baronfel baronfel mentioned this pull request Jun 1, 2023
27 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BinaryFormatter Deprecation
4 participants