Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for aliased cultures #6003

Closed
wants to merge 14 commits into from
Closed

Conversation

0xced
Copy link

@0xced 0xced commented Jan 4, 2021

Fixes #3897

Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This change looks good to me (except for the minor nitpick comments I made) but I don't know too much about this aspect of localization. @wli3, @tarekgh, can I get y'all to take a look?

Comment on lines 79 to 81
// See https://docs.microsoft.com/en-us/dotnet/api/System.Globalization.CultureInfo.LCID#remarks
const int LOCALE_CUSTOM_UNSPECIFIED = 0x1000;
if (culture.LCID == LOCALE_CUSTOM_UNSPECIFIED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please expand on this a bit? It's not immediately obvious to me from the link why non-system locales should be fully rejected here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this could be further discussed. What do we want? Accept any valid culture or accept known (by the operating system) cultures? If we accept any valid culture then we can get rid of this check altogether.

src/Tasks/CultureInfoCache.cs Outdated Show resolved Hide resolved
src/Tasks/CultureInfoCache.cs Outdated Show resolved Hide resolved
src/Tasks/CultureInfoCache.cs Outdated Show resolved Hide resolved
src/Tasks/CultureInfoCache.cs Outdated Show resolved Hide resolved

Assert.Single(t.AssignedFiles);
Assert.Single(t.CultureNeutralAssignedFiles);
Assert.Equal(String.Empty, t.AssignedFiles[0].GetMetadata("Culture"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the critical difference between this test and the above, right? Can you call it out with a comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that the Culture metadata is empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I just mean "Please call out the parts of this test that are different". Here I guess it's not just that Culture is empty but also that the culture-neutral part doesn't get assigned.

@tarekgh
Copy link
Member

tarekgh commented Jan 4, 2021

I believe we should get rid of the whole cache and just use CultureInfo.GetCultureInfo API. are we not using this API for any perf reason?

Note that on Windows 10 (on on Linux using ICU), all cultures with the well formed name is considered a valid culture even if the system doesn't have a data for. why we are trying to create a cache at all? CultureInfo.GetCultureInfo already maintaining internal cache for all previously created cultures.

@rainersigwald
Copy link
Member

I believe we should get rid of the whole cache and just use CultureInfo.GetCultureInfo API. are we not using this API for any perf reason?

The .NET Core version of this was introduced in #213, but that was a patch over CultureInfo.GetCultures not existing in .NET Core 1.0. The Framework code that relies on the enumeration and caching dates back to 2003-11-09 18:07:58 in a commit that has an entirely useless message but also has this change:

-                try
-                {
-                    // Construct a CultureInfo just to see if its possible.
-                    new CultureInfo(cultureName);
-                    validCulture = true;
-                }
-                catch (ArgumentException)
-                {
-                    // Just eat the exception, but now we know the culture isn't valid. 
-                }
+                validCulture = CultureStringUtilities.IsValidCultureString(cultureName);

So maybe there was a perf issue 18 years ago? I certainly like the simplicity of your proposal.

@tarekgh
Copy link
Member

tarekgh commented Jan 4, 2021

@rainersigwald we should give it a try. I think the perf hit would be in invalid cases only which I think is not common. Also, on Windows 10, most of the time will not encounter any invalid cases too.

@wli3
Copy link

wli3 commented Jan 4, 2021

I am not familiar with this issue. I will pass it to @tarekgh :)

@rainersigwald
Copy link
Member

@rainersigwald Rainer Sigwald FTE we should give it a try. I think the perf hit would be in invalid cases only which I think is not common. Also, on Windows 10, most of the time will not encounter any invalid cases too.

Ok. @0xced would you be willing to make that change?

@0xced
Copy link
Author

0xced commented Jan 4, 2021

Sure. I force-pushed and simplified to the maximum! (+77 / -901) The IsValidCultureString method is now just a few lines, anything "smart" has been removed.

internal static bool IsValidCultureString(string name)
{
    try
    {
        _ = CultureInfo.GetCultureInfo(name);
        return true;
    }
    catch (CultureNotFoundException)
    {
        return false;
    }
}

0xced and others added 5 commits January 4, 2021 21:15
Co-authored-by: Rainer Sigwald <raines@microsoft.com>
…rity

Co-authored-by: Rainer Sigwald <raines@microsoft.com>
Remove anything "smart" and only rely on `CultureInfo.GetCultureInfo` not throwing `CultureNotFoundException` to deem a culture name valid.

Since CultureInfo.GetCultureInfo() returns a cached CultureInfo there should be no performance issue.
@tarekgh
Copy link
Member

tarekgh commented Jan 4, 2021

@0xced thanks a lot for your help here. the changes looks good. could you please try to clean the usage of FEATURE_CULTUREINFO_GETCULTURES from the cs project and also if there is dead code (maybe something like AssemblyUtilities.CultureInfoHasGetCultures()).

@rainersigwald do we have any perf test can be used to ensure we are not regressing the perf with that?

This was needed in the .NET Core 1.0 era but CultureInfo.GetCultures and CultureInfo.GetCultureInfo are now available in .NET Core (since version 2.0).
@rainersigwald
Copy link
Member

rainersigwald commented Jan 4, 2021

@rainersigwald do we have any perf test can be used to ensure we are not regressing the perf with that?

cc @ladipro and @Forgind but not really. We have VS RPS tests but they exercise very small projects and I don't know if they touch this codepath at all.

@0xced
Copy link
Author

0xced commented Jan 4, 2021

I did some cleanup on FEATURE_CULTUREINFO_GETCULTURES and FEATURE_CULTUREINFO_GETCULTUREINFO in 55f8fee.

When doing this cleanup I noticed this comment:

// Assembly.Location is only available in .netstandard1.5, but MSBuild needs to target 1.3.
// use reflection to access the property

Is it still true that MSBuild needs to target .NET Standard 1.3?

@rainersigwald
Copy link
Member

Is it still true that MSBuild needs to target .NET Standard 1.3?

Nope! Please feel free to remove that :)

@Forgind
Copy link
Member

Forgind commented Jan 4, 2021

It looks like at some point they decided to throw an error when it hadn't been initialized properly? Not sure why:

[Fact]
public void InvalidToolsVersionErrors()
{
Assert.Throws<InitializationException>(() =>
{
string filename = null;
try
{
filename = FileUtilities.GetTemporaryFile();
ProjectRootElement project = ProjectRootElement.Create();
project.Save(filename);
MSBuildApp.BuildProject(
filename,
null,
"ScoobyDoo",
new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase),
new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase),
new ILogger[] { },
LoggerVerbosity.Normal,
new DistributedLoggerRecord[] { },
#if FEATURE_XML_SCHEMA_VALIDATION
false,
null,
#endif
1,
true,
new StringWriter(),
new StringWriter(),
false,
warningsAsErrors: null,
warningsAsMessages: null,
enableRestore: false,
profilerLogger: null,
enableProfiler: false,
interactive: false,
isolateProjects: false,
graphBuild: false,
lowPriority: false,
inputResultsCaches: null,
outputResultsCache: null
);
}
finally
{
if (File.Exists(filename)) File.Delete(filename);
}
}
);
}

It goes back to the original commit:
82177a5#diff-0ef03e858b5dafafd31e725cf2f7be2c22aaa57214afc6bee56d639652b3a9d5R1066-R1082

Copy link
Member

@benvillalobos benvillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea behind the change is to stop dynamically generating valid cultures and instead use the CultureInfo API (which results in supporting aliased cultures)? This change generally LGTM.

return ValidCultureNames.Contains(name);
try
{
_ = CultureInfo.GetCultureInfo(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going from a hashset lookup to try-catching a function call. This will run for every resource in a project, what's the perf impact here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it does a TryGetValue on a dictionary behind the scenes. https://source.dot.net/#System.Private.CoreLib/CultureInfo.cs,fec605b3b773ab26

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice the discussion before, it looks like we're okay with this change so long as RPS doesn't catch this.

I also didn't realize that the previous valid cultures are calculated on each run. This seems fine to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try itself is fast as long as it doesn't hit the catch (StackOverflow), and I imagine the cache is small enough that the dictionary lookup will be faster than looking through a list of valid cultures. My biggest worry would be if calling into the runtime causes us to load an assembly we otherwise didn't repeatedly. Not sure about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest worry would be if calling into the runtime causes us to load an assembly we otherwise didn't repeatedly. Not sure about that.

The Globalization code residing inside the core library. so this change wouldn't make any difference. Or are you referring to something else?

@0xced
Copy link
Author

0xced commented Jan 5, 2021

I'm a bit lost as to why the InvalidToolsVersionErrors test is now failing and how it relates to the change introduced in this pull request. I noticed that the tests fail only when run with eng/cibuild_bootstrapped_msbuild.sh (which is using stage1/bin/bootstrap/netcoreapp2.1/MSBuild/MSBuild.dll) but I'm not sure what this means. All tests are passing when I run ./build.sh --test --ci (which is using .dotnet/sdk/3.1.100/MSBuild.dll).

Can someone with more MSBuild experience help me to diagnose this issue?

@benvillalobos
Copy link
Member

benvillalobos commented Jan 6, 2021

I wonder if the netcoreapp2.1 MSBuild.dll referenced is using an older version of whatever assembly CultureInfo.GetCultureInfo comes from, and that older version doesn't include aliased cultures? I'll dig into this a bit and report any findings.

edit: I'm constantly running into an issue with eng\cibuild_bootstrapped_msbuild.sh not running the second build because of a permission issue so I can't get a local repro.

benvillalobos
benvillalobos previously approved these changes Jan 6, 2021
Copy link
Member

@benvillalobos benvillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldly errors read better, we should switch to those while we're modifying these functions.

string result = expander.ExpandIntoStringLeaveEscaped(@"$([System.Globalization.CultureInfo]::GetCultureInfo(`en-US`).ToString())", ExpanderOptions.ExpandProperties, MockElementLocation.Instance);
#else
string result = expander.ExpandIntoStringLeaveEscaped(@"$([System.Globalization.CultureInfo]::new(`en-US`).ToString())", ExpanderOptions.ExpandProperties, MockElementLocation.Instance);
#endif

Assert.Equal(new CultureInfo("en-US").ToString(), result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're here

Suggested change
Assert.Equal(new CultureInfo("en-US").ToString(), result);
result.ShouldBe(new CultureInfo("en-US").ToString());

@@ -216,6 +256,30 @@ public void PseudoLocalization(string culture)
Assert.Equal($"MyResource.{culture}.resx", t.AssignedFiles[0].ItemSpec);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to the other asserts as well, we should shouldly-ify functions we touch on.

Suggested change
Assert.Equal($"MyResource.{culture}.resx", t.AssignedFiles[0].ItemSpec);
t.AssignedFiles[0].ItemSpec.ShouldBe($"MyResource.{culture}.resx");

t.Files = new ITaskItem[] { i };
t.Execute();

Assert.Single(t.AssignedFiles);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert -> Shouldly function calls.

@benvillalobos benvillalobos dismissed their stale review January 6, 2021 01:39

Accidentally approved

@0xced
Copy link
Author

0xced commented Jan 6, 2021

Here's why the tests were failing: MSBuild itself uses Strings.shared.resx as resource name. When accepting shared as a valid culture, Microsoft.Build.Strings.shared.resources would not end up in the embedded resources of artifacts/bin/Microsoft.Build.Engine.UnitTests/Debug/netcoreapp2.1/Microsoft.Build.dll leading to the MissingManifestResourceException that was the cause of the failing tests.

MSBuild itself is a good example as to why we should be conservative and accept only aliased cultures in order not to break projects using Name.something.resx where something is not a locale.

I have improved this in 4bdca75 to only deem aliased cultures as valid, not all cultures.

@0xced
Copy link
Author

0xced commented Jan 6, 2021

Thinking more about it, maybe we hit another bug because if shared was assigned as a culture then it should have ended as Microsoft.Build.Strings.shared.resources in the embedded resources of artifacts/bin/Microsoft.Build.Engine.UnitTests/Debug/netcoreapp2.1/Microsoft.Build.dll. 🤔

On .NET Framework, ThreeLetterISOLanguageName has two letters for unknown cultures such as "xx".
@0xced
Copy link
Author

0xced commented Jan 7, 2021

So, I think I finally found a way to identify predefined cultures, under both ICU and NLS.

To deem a culture valid it must either:

  • Have a LCID != 4096 (LOCALE_CUSTOM_UNSPECIFIED)
    or
  • Have a ThreeLetterISOLanguageName and have a NativeName which does not start with "Unknown Language"

A better solution would be to use CultureInfo.GetCultureInfo(name, predefinedOnly: true) but unfortunately it is only available since .NET 5.

I also tweaked the test so that they pass on all supported platforms. Supported cultures vary a lot across different platforms!

@tarekgh
Copy link
Member

tarekgh commented Jan 7, 2021

Have a ThreeLetterISOLanguageName and have a NativeName which does not start with "Unknown Language"

Please don't use this. Windows is not promising this will work. That is why we have exposed the other GetCultureInfo overload.

Have a LCID != 4096 (LOCALE_CUSTOM_UNSPECIFIED)

If you use this, please check LOCALE_CUSTOM_DEFAULT too.

LOCALE_CUSTOM_DEFAULT       = 0x0c00;

Also to clarify, checking LOCALE_CUSTOM_UNSPECIFIED would exclude some legitimate Windows cultures. I didn't follow closely here but would be nice if someone clarify why we care filtering out these cultures?

@0xced
Copy link
Author

0xced commented Jan 20, 2021

Please don't use this. Windows is not promising this will work. That is why we have exposed the other GetCultureInfo overload.

Hmmm, but CultureInfo.GetCultureInfo(name, predefinedOnly: true) is only available since .NET 5 and I don't think MSBuild is ready to switch to .NET 5, right? Is there another way without using a .NET 5-only API?

Also to clarify, checking LOCALE_CUSTOM_UNSPECIFIED would exclude some legitimate Windows cultures.

Do you have an example of such a culture so that I can add it to the tests and make sure it's considered a valid culture?

I didn't follow closely here but would be nice if someone clarify why we care filtering out these cultures?

Just to be conservative and to not accidentally break too much existing projects. MSBuild itself breaks when using shared as a culture name, see my previous comments: #6003 (comment)

@tarekgh
Copy link
Member

tarekgh commented Jan 20, 2021

Hmmm, but CultureInfo.GetCultureInfo(name, predefinedOnly: true) is only available since .NET 5 and I don't think MSBuild is ready to switch to .NET 5, right? Is there another way without using a .NET 5-only API?

I assume we are talking when running on Windows. If so, you may call the OS directly for that. something like the call https://source.dot.net/#System.Private.CoreLib/CultureInfo.Nls.cs,14
If we need that on Linux, that will be more complicated to do as we depend on ICU to get the info.

Do you have an example of such a culture so that I can add it to the tests and make sure it's considered a valid culture?

please run the following code and it will give you such list.

            foreach (CultureInfo ci in CultureInfo.GetCultures(CultureTypes.AllCultures))
            {
                if (ci.LCID == 0x1000)
                {
                    Console.WriteLine(ci.Name);
                }
            }

Just to be conservative and to not accidentally break too much existing projects. MSBuild itself breaks when using shared as a culture name, see my previous comments: #6003 (comment)

Is there any other way to include the shared resources without considering it as a culture? I mean can the resource file be renamed to something else (like String_shared.resx for instance)? or we don't have control over that?

Base automatically changed from master to main March 15, 2021 20:09
@Forgind
Copy link
Member

Forgind commented Mar 31, 2021

Any recent updates on this?

@Forgind
Copy link
Member

Forgind commented Apr 26, 2021

Team triage: closing for inactivity. Happy to reopen if we can work through its issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cultures aliased by ICU cannot be used for resource localization on non-Windows environments
6 participants