StringComparer not working as expected on Linux #20599

Closed
ChristophB125 opened this issue Mar 13, 2017 · 9 comments

@ChristophB125

The following test works on Windows but not on Linux (RHEL 7.2 or Ubuntu 16.04):

Assert.AreEqual(0, StringComparer.CurrentCultureIgnoreCase.Compare("ss", "ß"));

Our Microsoft consultant Ben Gimblett has confirmed it and asked us to raise it as an issue here for follow-up.

We understand that .NET Core uses Windows's own comparison service, which is not available on Linux operating systems. How would one perform these culture-aware operations on .NET Core, irrespective of operating system?

@tarekgh
Member

tarekgh commented Mar 13, 2017

This may be by design, as we rely on the OS for such features. I'll first need to check whether anything else is missing in our wrappers.

@migajek

migajek commented Mar 20, 2018

I'm afraid the problem is wider.
I was struggling to find out why my app doesn't work on .NET Core on Ubuntu.
I've been digging through libraries, and was able to narrow it down to just this

ladies, gentlemen ... behold:
apparently under the en-US-POSIX culture on my Ubuntu, "Id" != "id"

me@system:/tmp/netcoredemo$ more Program.cs
using System;

namespace netcoredemo
{
    class Program
    {
        static void Main(string[] args)
        {
            var s = "Id";
            Console.WriteLine($"current / ignore {s.Equals("id", StringComparison.CurrentCultureIgnoreCase)}");
            Console.WriteLine($"invariant / ignore {s.Equals("id", StringComparison.InvariantCultureIgnoreCase)}");
            Console.WriteLine($"culture: {System.Globalization.CultureInfo.CurrentCulture.Name}");
        }
    }
}
me@system:/tmp/netcoredemo$ dotnet run
current / ignore False <--- !!!!!!!!!!
invariant / ignore True
culture: en-US-POSIX

env details

me@system:/tmp/netcoredemo$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

me@system:/tmp/netcoredemo$ dotnet --version
2.1.4

@migajek

migajek commented Mar 20, 2018

Setting the culture at the beginning of the program fixes the issue:

System.Globalization.CultureInfo.CurrentCulture = new System.Globalization.CultureInfo("en-US");
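
As a slightly fuller sketch of that workaround: the helper below (a hypothetical `CultureSetup.UseCulture`, not something from this thread) sets the culture for the current thread and also assigns `CultureInfo.DefaultThreadCurrentCulture`, on the assumption that the app creates other threads that should use the same culture. The "en-US" name is simply the one used above.

```csharp
using System;
using System.Globalization;

static class CultureSetup
{
    // Hypothetical helper: pins the culture for the calling thread and
    // makes it the default for threads created afterwards.
    public static void UseCulture(string name)
    {
        var culture = new CultureInfo(name);
        CultureInfo.CurrentCulture = culture;
        CultureInfo.DefaultThreadCurrentCulture = culture;
    }
}

class Program
{
    static void Main()
    {
        // Show what the runtime detected before overriding it.
        Console.WriteLine($"detected culture: {CultureInfo.CurrentCulture.Name}");

        CultureSetup.UseCulture("en-US");

        // Same comparison as in the repro, now under the pinned culture.
        bool equal = "Id".Equals("id", StringComparison.CurrentCultureIgnoreCase);
        Console.WriteLine($"current / ignore {equal}");
        Console.WriteLine($"culture: {CultureInfo.CurrentCulture.Name}");
    }
}
```

Calling the helper once at the top of `Main`, before any culture-sensitive comparison runs, keeps the override in one place.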

@tarekgh
Member

tarekgh commented Mar 20, 2018

@ChristophB125 you can use CompareOptions.IgnoreNonSpace to have "ß" compare as equal to "ss". CompareOptions can be passed to the string APIs and the CompareInfo APIs, and in version 2.1 you can also create a StringComparer with it using the following code:

        string sb = "ß";
        string ss = "ss";

        StringComparer sc1 = StringComparer.Create(CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);
        Console.WriteLine(sc1.Compare(sb, ss));

@migajek en_US_POSIX is a very special locale, and its string sorting behavior is special too. This is why you got this surprising result when you used it. Switching to en-US as you did will give you the logical results.
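
For strings that are identifiers rather than user-visible text (such as "Id" in the repro above), one option that avoids the detected locale altogether is an ordinal comparison, which does not consult the current culture. This is a minimal sketch under that assumption; `IdentifierComparison.SameIdentifier` is a hypothetical name, and ordinal comparison is not a substitute for linguistic cases such as "ß" versus "ss".

```csharp
using System;

static class IdentifierComparison
{
    // Hypothetical helper: compares two identifiers case-insensitively
    // using ordinal rules, independent of CultureInfo.CurrentCulture.
    public static bool SameIdentifier(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}

class Program
{
    static void Main()
    {
        bool same = IdentifierComparison.SameIdentifier("Id", "id");
        Console.WriteLine($"ordinal / ignore {same}");
    }
}
```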

@tarekgh
Member

tarekgh commented Mar 20, 2018

I am closing this issue. Feel free to send any more questions or issues. Thanks, all.

@tarekgh tarekgh closed this as completed Mar 20, 2018
@migajek

migajek commented Mar 21, 2018

@tarekgh so the right question is: why does .NET Core detect the current culture as en-US-POSIX?
I cannot seem to find any information on where the current culture comes from on Linux.

LANG env variable is en_US.UTF-8 ...

root@system:~/netcoredemo# dotnet run
current / ignore False
invariant / ignore True
culture: en-US-POSIX
root@system:~/netcoredemo# echo $LANG
en_US.UTF-8

@tarekgh
Member

tarekgh commented Mar 21, 2018

@migajek

You may look at https://github.com/dotnet/coreclr/blob/master/src/corefx/System.Globalization.Native/locale.cpp#L123 to see how we get the default user locale. In short, we call the ICU function uloc_getDefault() to get the default locale.

@tarekgh
Member

tarekgh commented Mar 21, 2018

You may also check the values of the environment variables LC_MESSAGES and LC_ALL on your system.
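
One way to inspect those values from inside the process is sketched below. It prints the locale-related environment variables mentioned in this thread (LC_ALL, LC_MESSAGES, LANG) next to the culture name the runtime reports. `LocaleReport.Describe` is a hypothetical helper name, and the sketch makes no claim about the order in which ICU consults these variables.

```csharp
using System;
using System.Globalization;
using System.Text;

static class LocaleReport
{
    // Hypothetical helper: lists the locale-related environment variables
    // as seen by this process, plus the culture the runtime detected.
    public static string Describe()
    {
        var report = new StringBuilder();
        foreach (var name in new[] { "LC_ALL", "LC_MESSAGES", "LANG" })
        {
            string value = Environment.GetEnvironmentVariable(name);
            string shown = value ?? "(not set)";
            report.AppendLine($"{name} = {shown}");
        }
        report.AppendLine($"culture: {CultureInfo.CurrentCulture.Name}");
        return report.ToString();
    }
}

class Program
{
    static void Main()
    {
        Console.Write(LocaleReport.Describe());
    }
}
```

Running this under the same user and shell as the failing app matters, since services and sudo sessions can see a different environment than an interactive login.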

@migajek

migajek commented Mar 21, 2018

@tarekgh thank you, this seems to work as expected then, as uloc_getDefault returns en_US_POSIX for me.
I'll look into the system configuration.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 25, 2020