Cultures aliased by ICU cannot be used for resource localization on non-Windows environments #3897

Closed
tarekgh opened this issue Oct 31, 2018 · 62 comments · Fixed by #7853

Comments

@tarekgh
Member

tarekgh commented Oct 31, 2018

From @CodingDinosaur on October 12, 2018 3:30

When building or running under .NET Core on a Unix-based environment, certain cultures cannot be utilized for resource localization, such as getting localized strings. This impacts both the build process (e.g., identifying and processing resource files) and the lookup of resources at runtime. These cultures can be used as expected when building or running under Windows.

The affected cultures are those which are aliased by ICU -- that is, to save on DB space for certain cases, ICU defines some locales as an "alias" of another. There are 42 locale aliases in ICU 57; two of the most common are zh-TW and zh-CN. For a full list of affected locales, see: ICU Aliased Locale List - CultureIssueDemonstration Readme.

This platform-inconsistent behavior when localizing certain resources necessitates special code and workarounds at both build and deploy time when developing cross-platform applications.

See a demo of this issue in CodingDinosaur/CultureIssueDemonstration

Symptoms

  • Resource files for the affected locales will not be generated into resource assemblies if building in a Unix-based environment.
    • For example: MyStrings.zh-TW.resx will not have a resource assembly created if the build occurs on a Unix-based environment, but will if built under Windows
  • Resources for the affected locales will not be utilized even if the resource files are present, falling back to the default resources.
    • For example: Consider MyStrings.resx and MyStrings.zh-TW.resx. If a custom build step is used to generate or copy the resource assembly for MyStrings.zh-TW.resx to work around the first issue, requesting a resource with culture zh-TW will still return the resource from the default MyStrings.resx resources.
  • Affected cultures are missing when running under Unix-based environments and calling CultureInfo.GetCultures
  • Some culture data for affected cultures is platform inconsistent, notably the parent locale
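
The first symptom can be checked on a given machine by looking for the satellite assembly in the build output. The sketch below assumes the conventional layout `<output>/<culture>/<Assembly>.resources.dll`. The `has_satellite` helper and the temporary demo directory are hypothetical stand-ins for a real `bin/` folder.

```sh
#!/bin/sh
# Sketch: report whether a culture-specific satellite assembly exists
# under a build output directory.
# Assumption: satellite assemblies live at <out_dir>/<culture>/*.resources.dll.

has_satellite() {
    out_dir=$1
    culture=$2
    for f in "$out_dir/$culture"/*.resources.dll; do
        # If the glob matched nothing, $f is the literal pattern and -f fails.
        [ -f "$f" ] && return 0
    done
    return 1
}

# Demo against a throwaway directory standing in for e.g. bin/Release/net6.0.
demo_out=$(mktemp -d)
mkdir -p "$demo_out/fr"
: > "$demo_out/fr/MyApp.resources.dll"   # hypothetical assembly name

for c in fr zh-TW; do
    if has_satellite "$demo_out" "$c"; then
        echo "$c: satellite assembly present"
    else
        echo "$c: satellite assembly MISSING"
    fi
done
```

Pointing `has_satellite` at the real output folder after a Linux build and again after a Windows build would show whether the zh-TW assembly is being dropped at build time.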

Expected behavior

  • Primarily, that whatever the "correct" behaviors for these locales are (as it relates to resources and CultureInfo) they be consistent between Windows and Unix-like environments.
  • Secondarily, that aliased locales would "just work" in .NET Core for resource localization - e.g., that we could have resources defined as zh-TW, just as we can on Windows today, and properly retrieve the expected resources.

Brief Analysis

Most of the above symptoms boil down to uloc_getAvailable in ICU's C API.

For example, the zh-TW resource files do not get copied during build because, in the SplitResourcesByCulture target, the culture is validated against a cache based ultimately on CultureInfo.GetCultures, which in turn, on Unix, ultimately relies on ICU. A diagnostic MSBuild log shows why the file is missing:

Removed Item(s): 
  _MixedResourceWithNoCulture=
    Resources/MyNetCoreProject.MyResources.zh-TW.resx
        OriginalItemSpec=Resources/MyNetCoreProject.MyResources.zh-TW.resx
        TargetPath=Resources/MyNetCoreProject.MyResources.zh-TW.resx
        WithCulture=false

From which we can follow back to the offending path:

Microsoft/msbuild/src/Tasks/Microsoft.Common.CurrentVersion.targets - SplitResourcesByCulture ->
Microsoft.Build.Tasks.AssignCulture.Execute ->
Microsoft.Build.Tasks.Culture.GetItemCultureInfo ->
Microsoft.Build.Tasks.CultureInfoCache.IsValidCultureString ->
Microsoft.Build.Shared.AssemblyUtilities.GetAllCultures ->
CultureInfo.GetCultures ->
CultureData.GetCultures ->
CultureData(Unix).EnumCultures ->
System.Globalization.Native/locale.cpp:GlobalizationNative_GetLocales
https://github.com/dotnet/coreclr/blob/8ba838fb54d6c07271d026b2d77bedcb9e2a786a/src/corefx/System.Globalization.Native/locale.cpp#L162-L171

ICU does not return aliases when getting a list of locales -- whether with uloc_getAvailable or Locale::getAvailableLocales (and uloc_countAvailable does not include them in its count).

That ICU does not return the aliases in this manner appears to be intentional, both based on the numerous references to a lack of alias mapping in the uloc documentation, and the following bug:

https://unicode-org.atlassian.net/browse/ICU-4309

uloc_getAvailable returns sr_YU, even though it is an %%ALIAS locale. None of the other %%ALIAS locales are returned.

TracBot made changes - 01/Jul/18 1:59 PM
Resolution Fixed [ 10004 ]
Status Done [ 10002 ] Done [ 10002 ]

ICU-4309 was fixed via: unicode-org/icu@ab68bb3
Which seems to further indicate that ICU not returning aliases when calling uloc_getAvailable is intentional.

In-Depth Analysis

A full analysis can be seen in the test repo README: CodingDinosaur/CultureIssueDemonstration

Test Repos

I have two test repos that help demonstrate this issue:

CultureIssueDemonstration

  • https://github.com/CodingDinosaur/CultureIssueDemonstration
  • Demonstrates the symptoms described at both build-time and run-time
  • The provided test scripts run the test code under your current platform, or on Linux via Docker. Thus it is recommended to run under Windows with Docker installed to compare the results under both Windows and Linux.
  • Contains a README file that goes into more detail on the issue and the apparent cause

CultureIcuTest

Copied from original issue: dotnet/coreclr#20388

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

Thanks @CodingDinosaur for reporting the issue and listing the details. This is very helpful. We'll take a look.

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

@cdmihai it is wrong to have msbuild depend only on the list returned from CultureInfo.GetCultures. This was OK before Windows 10 and Linux support, but it is no longer a valid approach.

https://github.com/Microsoft/msbuild/blob/e70a3159d64f9ed6ec3b60253ef863fa883a99b1/src/Shared/AssemblyUtilities.cs#L105

For example, on Windows 10, if you try to create a culture that Windows doesn't have data for, the operation still succeeds and the culture is created as long as the culture name conforms to the BCP-47 spec.
I understand msbuild may be doing that for perf reasons, which can be kept, but it will need an extra case: when a culture is not found in the list, try calling CultureInfo.GetCultureInfo to find out whether the culture can be created.

We'll try to look at how we can enhance the support for aliased cultures as this issue suggests, but whatever we do here, msbuild will need to do something more. Do you want me to open a new issue in msbuild to track that?

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

From @cdmihai on October 15, 2018 18:23

@tarekgh would this be a suitable issue? #1454

Regarding removing valid locale checks, the biggest issue with this is that it would be a breaking change for MSBuild. The msbuild repo build logic itself has the assumption that non-existing locales are rejected, so a lot of strings are put in Microsoft.shared.resx. If we remove the locale check in the SplitResourceByCulture, then the repo fails building with some invalid locale error (fuzzy memory from ~2 years ago). I have no data on this, but this could also break a lot of other existing repos.

FYI @rainersigwald

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

Regarding removing valid locale checks, the biggest issue with this is that it would be a breaking change for MSBuild. The msbuild repo build logic itself has the assumption that non-existing locales are rejected, so a lot of strings are put in Microsoft.shared.resx. If we remove the locale check in the SplitResourceByCulture, then the repo fails building with some invalid locale error (fuzzy memory from ~2 years ago). I have no data on this, but this could also break a lot of other existing repos.

I am not sure I understand the breaking scenario here. The scenario you are describing can occur today. For example, build a project with a culture introduced in Windows 10 and then run it on a down-level platform that doesn't have this culture.

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

From @cdmihai on October 15, 2018 20:40

True, but that's the class of valid locales introduced in different windows versions. The new breaking scenario is for the class of always invalid locales, that were never meant to be locales, and users expect msbuild to not treat them as locales.

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

The new breaking scenario is for the class of always invalid locales, that were never meant to be locales, and users expect msbuild to not treat them as locales.

I am not sure I agree with that. If the OS/.NET can create a culture for these, then they should be valid locales to use. Why do you think msbuild should reject such cultures?

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

From @cdmihai on October 15, 2018 23:16

Personally I agree that msbuild should not care and just do what the OS does, but I fear there are actual customers who depend on this behaviour, and changing this might break them. But this is just gut feeling based on the fact that the msbuild repo itself is doing it, and I don't have actual data on it. Alternatively we can only enable it in .net core msbuild, and then customers will have to opt-in to the break by transitioning to .net core. But it's not nice to diverge behaviour.

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

The only breaking scenario I can think of is when we allow creating resources with a culture not returned by CultureInfo.GetCultures and then move these resources to another machine that cannot understand the culture used. This scenario can happen today anyway. Do you have any other breaking scenario in mind?

#1454 is specific to custom cultures. I would suggest updating it to include the other supported system cultures that are not enumerated by CultureInfo.GetCultures.

@tarekgh
Member Author

tarekgh commented Oct 31, 2018

After looking at this issue, it looks like ICU does not enumerate the aliased cultures for good reasons. I believe the framework should follow that too and not enumerate such aliased cultures. The framework can still create such an aliased culture if anyone wants to use it, e.g.:

new CultureInfo("zh-TW"); 

will work fine.

Considering that, I believe the resource issue should be fixed on the msbuild side.
msbuild should not depend only on the enumerated list, not just because of aliased cultures but also to support the behavior of Windows 10, which can create any culture as long as the name conforms to the BCP-47 spec.

I am going to move this issue to msbuild repo.

@CodingDinosaur thanks again for reporting this issue.

@mbp

mbp commented Jul 23, 2019

Is there any update to this issue, or at least a workaround?

My understanding is, that it's currently not possible to have an ASP.NET Core application localized with zh- cultures on Linux, which seems like a pretty common use case.

@EdiWang

EdiWang commented Mar 31, 2021

Any update?

@benvillalobos
Member

Team Triage: The design of resource localization is that we infer based on names whether the resource is culture specific. That means we need to be able to take a string and ask if it's a culture. Before the Windows 10 changes to how cultures worked, we just used GetCultures on Windows. The first draft of the Linux support was a hardcoded list of strings. Then we switched over to GetCultures when it's available.

@tarekgh is there a multi-platform check for cultures that doesn't accept dramatically more strings as cultures on windows? Based on @Forgind 's testing it looks like if we just call new CultureInfo() we would incorrectly classify existing user resources as locale specific.

/cc: @wli3 for loc

@tarekgh
Member Author

tarekgh commented May 5, 2021

@benvillalobos the PR #6003 had the detailed discussion. Please let me know if there is any question not answered there, I can help answering here. My comment #6003 (comment) is answering your posted question, I guess.

@garyzhuosim

Any updates?
Because of this issue, I have to build my project on a GitHub Actions Windows runner, but on a Windows runner I can't build a Linux image due to this issue: Can not run Linux docker on Windows runner

So I can't build my ASP.NET Core project into a Linux image. This is unacceptable.

@tarekgh
Member Author

tarekgh commented Nov 12, 2021

CC @rainersigwald

@rainersigwald
Member

@benvillalobos, you've been looking at this, right?

@tarekgh
Member Author

tarekgh commented Jul 27, 2022

I've read that page before, but I believe that setting is just for Windows?

That is right. Do you want to build the resources on Linux? You can just build them on Windows and use them in Linux runs as needed. What I am trying to say is: add a job to your pipeline that builds these resources, and on Linux, instead of building the resources, copy them to the output folder after the build finishes. I am just trying to provide a workaround to unblock you. I am not saying this is a solution, as we still need to fix msbuild.
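
The copy step of this workaround could be sketched roughly as follows. It assumes the Windows job publishes its output as an artifact folder containing one subdirectory per culture. The `copy_satellites` helper, the folder variables, and the assembly name are hypothetical.

```sh
#!/bin/sh
# Sketch: copy satellite assemblies produced by a Windows build into the
# output folder of a Linux build.
# Assumption: both folders use the <folder>/<culture>/*.resources.dll layout.

copy_satellites() {
    src=$1   # folder holding the Windows-built output (e.g. a pipeline artifact)
    dst=$2   # Linux build or publish output folder
    shift 2
    for culture in "$@"; do
        if [ -d "$src/$culture" ]; then
            mkdir -p "$dst/$culture"
            cp "$src/$culture"/*.resources.dll "$dst/$culture"/
        else
            echo "warning: no prebuilt resources for $culture in $src" >&2
        fi
    done
}

# Demo with throwaway folders standing in for the two build outputs.
win_out=$(mktemp -d)
linux_out=$(mktemp -d)
mkdir -p "$win_out/zh-TW" "$win_out/zh-CN"
: > "$win_out/zh-TW/MyApp.resources.dll"
: > "$win_out/zh-CN/MyApp.resources.dll"

copy_satellites "$win_out" "$linux_out" zh-TW zh-CN
ls -R "$linux_out"
```

Listing the cultures explicitly keeps the copy limited to the aliased cultures that the Linux build skips, so resources the Linux build did produce are left untouched.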

@Bartleby2718

Makes sense. I'll try that as I wait for the response to other questions in this thread.

@benvillalobos
Member

benvillalobos commented Jul 27, 2022

Q&A time 🤔

@madelson

Does this need to be set only during runtime, only during the build, or both?

Definitely during build time. Can you describe your runtime scenario? Does this just mean "calling our API for valid cultures?" If so, it applies to both.

With this set, will things work exactly as they do on Windows regarding these cultures or will there still be some discrepancies to be aware of?

If we add this workaround as "if it's not seen in the culture API, use our hardcoded list as a backup," I expect windows/non-windows to behave the same. The only situation to be worried about would be some culture alias not existing in the hardcoded list, so it would still fail on unix. We'd need to handle those on a case by case basis.

Are there any timing implications for setting this variable? Is this the kind of variable that needs to be set before process start? If this gets set at app runtime, can it be set in Main()? Later? I know sometimes with such things you have to set the value before a bunch of stuff gets cached and locked in.

I believe it'll work as long as the env var is set by the time our API gets called. Though consider that ValidCultureNames is loaded into a static hashset within CultureInfoCache.

Would this be supported long-term or only for a set number of releases?

cc @marcpopMSFT . The situation that concerns me is getting a flood of "we need this alias to be supported" in the long term, which isn't very maintainable.

As far as timing, I assume this would go into .NET 7; any chance it would also be back-ported to .NET 6?

@marcpopMSFT

@Bartleby2718

Could you give me a rough ETA on this issue? Will there be a new release including this fix in, say, the next few weeks?

Starting up a PR for it as soon as this comment gets posted.

Will it be compatible with .NET 6 as well, or does it require an upgrade to .NET 7?

It's a new codepath within our binaries, so it'd need an upgrade to net7 unless we backport.

@madelson

Thanks for the detail @benvillalobos !

Can you describe your runtime scenario? Does this just mean "calling our API for valid cultures?" If so, it applies to both.

Mostly our runtime scenario is just setting CultureInfo.CurrentUICulture to (depending on the user) zh-CN or zh-TW, and then accessing resource strings in .resx files which currently have the zh-CN/zh-TW suffixes.

However, we do also have some code that tries to iterate through all cultures for which we have translations at runtime using CultureInfo.GetCultures. This is much easier for us to work around however by substituting our own culture list.

unless we backport

It would be great to know whether this is possible/likely. If this doesn't happen, is there any way to work around this at build time (aside from renaming all the resource files of course)? I suppose we could have a build task which just copies and renames all the files.

@Bartleby2718

Bartleby2718 commented Aug 3, 2022

Note: I have updated some of these after @tarekgh let me know (and I confirmed) that the NLS mode is not enabled.

@tarekgh I've tried setting DOTNET_SYSTEM_GLOBALIZATION_USENLS=1, but the results don't exactly match what I expected. @benvillalobos Could you please help me understand the results? I have a list of questions at the bottom.

Command Run on Git Bash - Linux Container version

docker commands to do some cleanup for reproducibility/idempotency; \
cd to the solution root && \
git clean -dfx; \
dotnet publish My.Project -c Release --self-contained -r linux-x64 -o bin -property:SolutionDir=$(pwd) && \
docker build -f My.Dockerfile -t myImage:1 . && \
docker run --mount type=bind,src=$(pwd),dst=/App -d -p 127.0.0.1:6767:80/tcp myImage:1 && \
start chrome "http://localhost:myPort/page-containing-chinese-strings"

Command Run on Git Bash - Windows version

git clean -dfx; \
our custom command that basically runs msbuild against the solution && \
start chrome "http://localhost:myPort/page-containing-chinese-strings"

Specs for the Windows Machine Used

Key          | Value
Edition      | Windows 10 Enterprise
Version      | 21H2
Installed on | 4/15/2022
OS build     | 19044.1766
Experience   | Windows Feature Experience Pack 120.2212.4180.0

Setup

I tried 4 different cases. In each case, I:

  • set System.Globalization.UseNls appropriately in the relevant csproj file. (Originally, I set/unset the environment variable by editing environment variables on Windows, restarting the Git Bash terminal, and setting/unsetting it in Git Bash to be extra careful.)
  • printed every single culture in CultureInfo.GetCultures(CultureTypes.AllCultures) to a file.
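
Once each run has written its culture names to a file, one name per line, the two lists can be compared to see which cultures are enumerated on one platform but not the other. This is a minimal sketch: the `missing_cultures` helper is hypothetical, and the two short sample lists are illustrative only, not the output recorded in this comment.

```sh
#!/bin/sh
# Sketch: print culture names that appear in the first list but not the second.
# Both inputs are plain text files with one culture name per line.

missing_cultures() {
    a=$(mktemp)
    b=$(mktemp)
    # comm needs sorted input; LC_ALL=C keeps the ordering consistent.
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Illustrative sample lists (hypothetical, heavily shortened).
win_list=$(mktemp)
linux_list=$(mktemp)
printf '%s\n' en-US zh-CN zh-Hans zh-TW > "$win_list"
printf '%s\n' en-US zh-Hans zh-Hans-CN > "$linux_list"

echo "Enumerated in the first list only:"
missing_cultures "$win_list" "$linux_list"
```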

Results

Setting | Windows version | Linux version
System.Globalization.UseNls true (originally "Env var set") | Chinese strings showed up; 857 cultures printed (corrected from 813) | Chinese strings showed up; 783 cultures printed
System.Globalization.UseNls false (originally "Env var unset") | Chinese strings showed up; 813 cultures printed | Defaulted to English strings; 783 cultures printed

Miscellaneous Info

  • I have <TargetFramework>net6.0</TargetFramework> in My.Project.csproj.
  • I ensured that routing was done correctly every time.

Questions:

  1. Based on the Windows specs, it appears to me that Windows 10 May 2019 Update must have been installed. (Otherwise, OS build should be below 18362.116, per this page.) This means that my app would use ICU globalization APIs by default, so setting DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 should have resulted in a difference in CultureInfo.GetCultures(CultureTypes.AllCultures). However, I got the same set of cultures whether I ran the app on the Windows host or inside the Linux container. What's the reason behind this?
    • Edit: @tarekgh correctly pointed out that the environment variable was not being set. I confirmed that .NET picked up the environment variable only after I restarted the machine. (I tried setting both a user environment variable and a system environment variable.) For faster iteration, I set System.Globalization.UseNls in the relevant .csproj file and observed that CultureInfo.GetCultures(CultureTypes.AllCultures) had more cultures, including zh-CN. This behavior now makes sense to me.
  2. Since we can build on Windows for the time being, I think DOTNET_SYSTEM_GLOBALIZATION_USENLS (or equivalent solutions like using runtimeconfig.json) can be a viable solution for now. Is this setting going to be around for a long time?
  3. Around 800 cultures were printed in all cases, but none of them included zh-CN. However, I expected zh-CN to show up when using NLS. Why is this?
    • Edit: As mentioned above, this is no longer the case, and it behaves as expected.
  4. My.Project does have zh-CN resources, but how is msbuild able to create the zh-CN directory in bin/Debug (or bin/Release) even when zh-CN is not in CultureInfo.GetCultures(CultureTypes.AllCultures)?

Thank you very much in advance, and I look forward to hearing back from you.

@tarekgh
Member Author

tarekgh commented Aug 3, 2022

so setting DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 should have resulted in a difference in CultureInfo.GetCultures(CultureTypes.AllCultures). However, I got the same set of cultures whether I ran the app on the Windows host or inside the Linux container.
Around 800 cultures were printed in all cases, but none of them included zh-CN. However, I expected zh-CN to show up when using NLS. Why is this?

Reading these two comments suggests the NLS mode is not enabled correctly. Could you add the following line in your code and send the output?

                Console.WriteLine($".... UseNls:                 {typeof(object).Assembly.GetType("System.Globalization.GlobalizationMode")!.GetProperty("UseNls", BindingFlags.Static | BindingFlags.NonPublic)!.GetValue(null)} ....");

@Bartleby2718

@tarekgh Thanks for pointing that out! I have updated my comment above accordingly.

It appears that the changes I made to the environment variable were not applied for some reason. Properly enabling NLS mode answered two of my questions, but I'm still curious about the other two. I'm especially curious about how long .NET will support NLS mode through runtime configuration. Will it survive in .NET 7?

@tarekgh
Member Author

tarekgh commented Aug 4, 2022

how long .NET will support NLS mode through runtime configuration. Will it survive in .NET 7?

Yes, it will still be supported in .NET 7.0. We don't have any plan to abandon NLS support in the near future.

@Bartleby2718 if there are any questions not answered yet, please point them out and I'll be happy to answer.

@tarekgh
Member Author

tarekgh commented Aug 4, 2022

One last recommendation to @Bartleby2718: try not to set System.Globalization.UseNls in the csproj, and instead use the environment variable on the Windows build machine. The reason is that setting System.Globalization.UseNls in the csproj forces the app to run using NLS, which I don't think you need. We are only trying to work around the resource build issue here, not change the app's behavior.
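
One way to keep the variable confined to the build step is to set it as a prefix on the build command instead of exporting it. The sketch below assumes the build is launched from a POSIX-style shell such as Git Bash on the Windows build machine. A small `sh -c` command stands in for the real build so the scoping is visible, and the commented-out `dotnet build` line is only a placeholder.

```sh
#!/bin/sh
# Sketch: a VAR=value prefix applies the variable to that one command only,
# so later steps (such as running the app) do not see it.

unset DOTNET_SYSTEM_GLOBALIZATION_USENLS

# Stand-in for the build step; in a real pipeline this might look like:
#   DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 dotnet build -c Release
inside=$(DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 \
    sh -c 'printf %s "${DOTNET_SYSTEM_GLOBALIZATION_USENLS:-unset}"')

# Back in the calling shell, the variable was never exported.
after=${DOTNET_SYSTEM_GLOBALIZATION_USENLS:-unset}

echo "inside build step: $inside"
echo "after build step:  $after"
```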

@Bartleby2718

@tarekgh That makes sense. I think that answers all my questions. Thanks for the prompt responses!

@Bartleby2718

@tarekgh Actually I do have one more question. I believe there was one more option besides csproj and environment variable: runtimeconfig.json. Is it recommended or not?

@tarekgh
Member Author

tarekgh commented Aug 4, 2022

I believe there was one more option besides csproj and environment variable: runtimeconfig.json. Is it recommended or not?

In your case, it is not recommended. This one will have the same effect as if you set the property in the csproj.

@madelson

Just another point of context on why supporting zh-CN/zh-TW would be valuable; this is the format that our translations management tool uses/expects: https://help.smartling.com/hc/en-us/articles/360049532693-Supported-Languages-

@Bartleby2718

@tarekgh I see that a PR has been merged to fix this issue.

  1. In which .NET 7 preview will this be available?
  2. How do you suggest that I test that the PR does indeed fix the issue? Should I install the upcoming preview and test using that?

@tarekgh
Member Author

tarekgh commented Aug 15, 2022

@benvillalobos could you please help answer @Bartleby2718's questions?

@benvillalobos
Member

@Bartleby2718 Testing should involve running builds that previously didn't output resources for aliases like zh-TW and ensuring you see the expected output.

Based on how the SDK flows, it should be available whenever the next release is available. In a few days, I believe.

@marcpopMSFT
Member

There should be a new daily build available in a few days but the next official release (RC1) won't be available until September.

@Jetski5822

@tarekgh I used DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 to sort out our Windows build servers, but we also build on Linux, which is failing. Is there a workaround for that?

@tarekgh
Member Author

tarekgh commented Aug 31, 2022

@Jetski5822 #3897 (comment). In short, there is no workaround for that other than building the resources on Windows and then copying them to the Linux build. Also, the issue should be fixed in #7853. You may want to consider using the new SDK.

@Jetski5822

@tarekgh Okay cool - so I flipped zh-CN to zh-Hans-CN, but when compiling there's a difference between VS (I'm using latest stable) and the command line.

VS = the resource set won't get built, and when debugging the test, it also won't load.
dotnet build = the resource is built and placed into the bin folder.

Any ideas why this would be happening? Also, the latest SDK: is that .NET 7?

@tarekgh
Member Author

tarekgh commented Sep 1, 2022

@Jetski5822 I guess, VS is using msbuild based on .NET Framework. @benvillalobos will know for sure.

My question now is, is it possible you can just generate the resources with the name zh-Hans? This should work everywhere and will work nicely with zh-CN and zh-Hans-CN at runtime.

@benvillalobos
Member

@Jetski5822 yeah this behavior difference is unfortunately expected. The fix only works for net core scenarios, since that's where we get our fancy new API from.

baochenw added a commit to baochenw/ModernWpf that referenced this issue Sep 20, 2022
Problem:
Refer to dotnet/msbuild#3897
When building with dotnet cli, certain cultures cannot be utilized for resource localization, such as getting localized strings. This impacts both the build process (e.g., identifying and processing resource files) and the lookup of resources at runtime. For example: MyStrings.zh-TW.resx will not have a resource assembly created.

Solution:
This change renames all *.zh-CN.resx to *.zh-Hans.resx and *.zh-TW.resx to *.zh-Hant.resx.

Test Done:
Local build with dotnet cli and verified chinese assemblies are created.
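
A bulk rename like the one this commit describes could be scripted along the following lines. This is a sketch only: the `rename_culture_suffix` helper and the demo files are hypothetical, and inside a Git working tree `git mv` may be preferable to plain `mv` so the renames are tracked.

```sh
#!/bin/sh
# Sketch: rename every *.<from>.resx under a directory tree to *.<to>.resx,
# e.g. zh-CN -> zh-Hans and zh-TW -> zh-Hant.

rename_culture_suffix() {
    root=$1
    from=$2
    to=$3
    find "$root" -type f -name "*.$from.resx" | while IFS= read -r f; do
        # Strip the old ".<from>.resx" suffix and append the new one.
        mv "$f" "${f%.$from.resx}.$to.resx"
    done
}

# Demo on a throwaway tree with hypothetical resource files.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/sub"
: > "$demo_root/MyStrings.resx"
: > "$demo_root/MyStrings.zh-CN.resx"
: > "$demo_root/sub/Other.zh-TW.resx"

rename_culture_suffix "$demo_root" zh-CN zh-Hans
rename_culture_suffix "$demo_root" zh-TW zh-Hant

find "$demo_root" -type f -name '*.resx' | LC_ALL=C sort
```

The neutral `MyStrings.resx` is left alone because only files carrying the exact `.<from>.resx` suffix match.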
@shunyh

shunyh commented Jan 4, 2023

new CultureInfo("zh-TW");

How do I use this code in a project? We have a lot of services that use zh-CN/zh-TW, and I don't want to rename them all one by one. Is there a better way to fix this? Any insight is helpful.
