Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locale __init__/parse inconsistencies #283

Open
mpkuth opened this issue Nov 19, 2015 · 5 comments · May be fixed by #309
Open

Locale __init__/parse inconsistencies #283

mpkuth opened this issue Nov 19, 2015 · 5 comments · May be fixed by #309
Assignees
Milestone

Comments

@mpkuth
Copy link

mpkuth commented Nov 19, 2015

Babel 2.1.1, Python 2.7.5


Locale.parse('en_US') returns Locale('en', territory='US')
but
Locale('en_US') returns Locale('en_US')
when I would expect
Locale('en_US') returns Locale('en', territory='US') or UnknownLocaleError


Locale.parse(12) intentionally raises TypeError: Unxpected value for identifier: 12
but
Locale(12) causes an uncaught TypeError

babel/core.pyc, line 150, __init__
    identifier = str(self)
babel/core.pyc, line 339, __str__
    return get_locale_identifier((self.language, self.territory, self.script, self.variant))
babel/core.pyc, line 943, get_locale_identifier
    return sep.join(filter(None, (lang, script, territory, variant)))
TypeError: sequence item 0: expected string, int found
@akx
Copy link
Member

akx commented Dec 30, 2015

Well, Locale() is meant to construct a locale exactly as you tell it to, but I do see your point.

I see two approaches:

  • Make the Locale constructor a "do what I mean" API, and have it use the same logic as Locale.parse() if the given locale is not .isalpha().
  • Error out in the constructor if the locale is not alphabetical.

@mpkuth
Copy link
Author

mpkuth commented Dec 30, 2015

IMO, keeping Locale.parse(identifier) and Locale(language, territory=None, script=None, variant=None) separate is a good thing. I believe that the core of the issue when using the constructor is the validation logic that currently reads as follows.

identifier = str(self)
if not localedata.exists(identifier):
    raise UnknownLocaleError(identifier)

The problem is that str(self) returns the same thing for both Locale('en_US') and Locale('en', territory='US') objects. The first is not valid because en_US is not a valid language but this check doesn't catch that and you get a whole locale identifier stored in the language attribute.

Each given part of the locale should probably be validated separately. Something like...

try:
    language = get_global('language_aliases')[language]
except KeyError:
    raise UnknownLocaleError('Unknown language ' + language)

if territory is not None:
    try:
        territory = get_global('territory_aliases')[territory][0]
    except KeyError:
        raise UnknownLocaleError('Unknown territory ' + territory)

if script is not None:    
    try:
        script = get_global('script_aliases')[script]
    except KeyError:
        raise UnknownLocaleError('Unknown script ' + script)

if variant is not None:
    try:
        variant = get_global('variant_aliases')[variant]
    except KeyError:
        raise UnknownLocaleError('Unknown variant ' + variant)

This should also have the side effect of fixing the "I passed you a number" uncaught exception in the validator that isn't working as expected.

@akx
Copy link
Member

akx commented Dec 30, 2015

The alias tables don't contain every language/territory/script/variant there is, though.

Maybe if we start by validating the pieces to be alphabetical if they exist, and then build up to more precise validation later?

@mpkuth
Copy link
Author

mpkuth commented Dec 30, 2015

Forgive me as I haven't dug too deeply into the code, but if those tables aren't complete and you can get a Locale object with a language/territory/script/variant that isn't there then how is it useful? i.e. How does it know what to return for formatting and other functions?

In any case, if you can't look it up directly then ensuring that each piece is only alpha or only alpha/num and of the correct length (if applicable) as you suggest would be a good starting point for better validation of constructor arguments.

akx added a commit to akx/babel that referenced this issue Dec 30, 2015
@akx
Copy link
Member

akx commented Dec 30, 2015

Those are alias tables, such as

    "variant_aliases": {
        "HEPLOC": "ALALC97", 
        "AALAND": "AX", 
        "POLYTONI": "POLYTON"
    }

that map previously/occasionally used names into the canonical names used by the CLDR.

The actual knowledge of which locales are available stems from the existence of a .dat file for the particular locale identifier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants