[BUG] transliterating all-caps strings ends up with mixed case #675

Open
padde opened this issue Aug 30, 2023 · 3 comments

padde commented Aug 30, 2023

What I tried to do

I want to transliterate an all-caps string

I18n.transliterate("KANÜLE")

What I expected to happen

I expect every resulting character to be uppercase:

#=> "KANUELE"

What actually happened

The resulting characters are mixed case

#=> "KANUeLE"

Simply changing the entries in the translations file to "Ü": "UE" works for this case, but then of course mixed-case words will be transliterated incorrectly:

I18n.transliterate("Überfall")
#=> "UEberfall"

I would expect a solution that can handle both cases gracefully.

Versions of i18n, rails, and anything else you think is necessary

All versions of i18n

Collaborator

radar commented Sep 11, 2023

Apologies for the delay -- I've been away on leave.

When I run your code, I don't get quite the string you expected, but all characters are at least uppercase:

[1] pry(main)> I18n.transliterate("KANÜLE")
=> "KANULE"
[2] pry(main)> RUBY_VERSION
=> "2.7.6"
[3] pry(main)> I18n::VERSION
=> "1.14.1"

You mention:

Simply changing the entries in the translations file to "Ü": "UE" works for this case,

Which translations file? You did not supply this in your original message.

Could you please supply the file that you're talking about here?

Author

padde commented Sep 11, 2023

@radar my apologies, I am using rails-i18n, which includes transliteration rules for all kinds of languages. The main problem here is that some characters end up being transliterated as two characters.

Here is a full working example for the first option that we currently have, storing capitalized versions of the transliterated characters, which is what rails-i18n does:

# frozen_string_literal: true

require 'i18n'

I18n.config.enforce_available_locales = false
I18n.locale = :de

# capitalized transliterations, work only for capitalized words
I18n.backend.store_translations(
  :de,
  i18n: {
    transliterate: {
      rule: {
        'ä' => 'ae',
        'é' => 'e',
        'ü' => 'ue',
        'ö' => 'oe',
        'Ä' => 'Ae',
        'Ü' => 'Ue',
        'Ö' => 'Oe',
        'ß' => 'ss',
        'ẞ' => 'SS'
      }
    }
  }
)

puts I18n.transliterate('KANÜLE') # => 'KANUeLE' (bad)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (good, ẞ is by definition only used for all caps)
puts I18n.transliterate('Überfall') # => 'Ueberfall' (good)

As mentioned before, switching to all-caps versions will not help because then we would break the cases where we actually want capitalized versions such as the last example:

# frozen_string_literal: true

require 'i18n'

I18n.config.enforce_available_locales = false
I18n.locale = :de

# all caps transliterations, work only for all caps words
I18n.backend.store_translations(
  :de,
  i18n: {
    transliterate: {
      rule: {
        'ä' => 'ae',
        'é' => 'e',
        'ü' => 'ue',
        'ö' => 'oe',
        'Ä' => 'AE', # all caps now
        'Ü' => 'UE', # all caps now
        'Ö' => 'OE', # all caps now
        'ß' => 'ss',
        'ẞ' => 'SS'
      }
    }
  }
)

puts I18n.transliterate('KANÜLE') # => 'KANUELE' (good)
puts I18n.transliterate('FUẞBALL') # => 'FUSSBALL' (still good)
puts I18n.transliterate('Überfall') # => 'UEberfall' (bad)
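
Both variants fail for the same reason: a hash rule replaces each character in isolation, so a single entry for 'Ü' cannot be right for both an all-caps word and a capitalized word. A minimal sketch of that behaviour in plain Ruby follows. `per_character_transliterate` is a hypothetical stand-in written for illustration, not the library's actual code, and the '?' fallback is an assumption.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for a hash-based rule: every non-ASCII character is
# looked up on its own, with no knowledge of the characters around it.
def per_character_transliterate(string, rule)
  string.gsub(/[^\x00-\x7f]/u) { |char| rule.fetch(char, '?') }
end

CAPITALIZED = { 'ü' => 'ue', 'Ü' => 'Ue' }.freeze
ALL_CAPS    = { 'ü' => 'ue', 'Ü' => 'UE' }.freeze

%w[KANÜLE Überfall].each do |word|
  capitalized = per_character_transliterate(word, CAPITALIZED)
  all_caps = per_character_transliterate(word, ALL_CAPS)
  puts "#{word}: #{capitalized} / #{all_caps}"
end
# KANÜLE: KANUeLE / KANUELE
# Überfall: Ueberfall / UEberfall
```

Whichever table is chosen, one of the two words comes out wrong, because the lookup never sees whether the neighbouring letters are uppercase.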

Contributor

tom-lord commented Mar 24, 2024

'Ü' => 'Ue'

'Ü' => 'UE'

I would expect a solution that can handle both cases gracefully.

My 2 cents on the topic as a passing observer...

Either of your configurations above will be sufficient for the majority of use cases, but they are only approximations. A comprehensive solution cannot be a straightforward "find and replace"; it would need to look at the surrounding context of words.

According to the documentation, I18n transliteration rules can also be given as a Proc. I don't know what a "perfect" solution for transliterating Ü in German looks like, but for example I found this (JavaScript) code that claims to work for a wider range of scenarios. (You might succeed in finding an even better solution and/or something already written in Ruby.)
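
To illustrate the Proc approach, here is a minimal sketch of a context-aware rule in Ruby. It assumes a simple heuristic: an uppercase umlaut becomes "AE"/"OE"/"UE" when the next letter is uppercase (or, at the end of a word, when the previous letter is), and "Ae"/"Oe"/"Ue" otherwise. `GermanTransliteration` and its methods are hypothetical names, not part of i18n or rails-i18n, and the sketch only covers the characters from the examples above. Characters it does not know are passed through unchanged.

```ruby
# frozen_string_literal: true

# Hypothetical helper for illustration; not part of i18n or rails-i18n.
module GermanTransliteration
  LOWER = { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss', 'é' => 'e' }.freeze
  UPPER = { 'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue' }.freeze

  module_function

  # Transliterates one string, choosing "UE" or "Ue" from the neighbours.
  def call(string)
    chars = string.chars
    chars.each_with_index.map do |char, index|
      if LOWER.key?(char)
        LOWER[char]
      elsif char == 'ẞ'
        'SS' # capital sharp s only occurs in all-caps text
      elsif UPPER.key?(char)
        all_caps_context?(chars, index) ? UPPER[char].upcase : UPPER[char]
      else
        char
      end
    end.join
  end

  # Assumed heuristic: look at the next letter; if there is none,
  # fall back to the previous character.
  def all_caps_context?(chars, index)
    following = chars[index + 1]
    return following.match?(/\p{Lu}/) if following&.match?(/\p{L}/)

    preceding = index.zero? ? nil : chars[index - 1]
    !preceding.nil? && preceding.match?(/\p{Lu}/)
  end
end

puts GermanTransliteration.call('KANÜLE')   # KANUELE
puts GermanTransliteration.call('Überfall') # Ueberfall
puts GermanTransliteration.call('FUẞBALL')  # FUSSBALL

# If the i18n gem has been loaded, register the helper as the :de rule.
if defined?(I18n)
  I18n.backend.store_translations(
    :de,
    i18n: { transliterate: { rule: ->(string) { GermanTransliteration.call(string) } } }
  )
end
```

This is still an approximation. A single uppercase umlaut on its own, or one followed by punctuation, is treated as capitalized rather than all caps, and other heuristics may fit your data better.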


This library does not, currently, define or maintain transliteration rules across different locales. It simply supports flexible configuration options.
Therefore I disagree with the feedback you received in the rails-i18n project: Whilst they may want to keep their "simple" configuration unchanged as it solves the majority of use cases, I still would not consider the raised issue to be a bug in the I18n library, but rather, a configuration issue in your project.
