bcp47_language_tag doesn't fail on some non-BCP47 tags #1221

Open
2 tasks done
bfabio opened this issue Feb 8, 2024 · 1 comment

Comments

@bfabio

bfabio commented Feb 8, 2024

  • I have looked at the documentation here first?
  • I have looked at the examples provided that may showcase my question here?

Package version eg. v9, v10:

v10

Issue, Question or Enhancement:

When using bcp47_language_tag for validation, some non-BCP47 tags such as "eng" or "en_US" are passing as valid.

isBCP47LanguageTag() uses golang.org/x/text/language's Parse, and its documentation says:

[snip]
It accepts tags in the BCP 47 format and extensions to this standard defined in https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.

Code sample, to showcase or reproduce:

I expect both of these to fail, but they don't:

package main

import (
	"fmt"

	"github.com/go-playground/validator/v10"
)

func main() {
	validate := validator.New()

	// Both tags are expected to fail validation: "en_US" uses "_" as a
	// separator, and "eng" is a three-letter code rather than "en".
	for _, tag := range []string{"en_US", "eng"} {
		if err := validate.Var(tag, "bcp47_language_tag"); err != nil {
			fmt.Printf("%q: %v\n", tag, err)
		} else {
			fmt.Printf("%q: accepted as a valid BCP 47 tag\n", tag)
		}
	}
}
@shihanng
Contributor

shihanng commented Mar 6, 2024

I think golang.org/x/text/language's Parse is based on the Unicode Language and Locale Identifiers defined in the Unicode Locale Data Markup Language (LDML, UTS #35), which are based on BCP 47 but are not strictly the same. For example, Unicode Language and Locale Identifiers allow the underscore _ to be used as a separator:

sep = [-_] ;

BCP 47 (RFC 5646), on the other hand, allows only the hyphen:

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]
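For comparison, the hyphen-only langtag rule above can be checked with the Go standard library alone. This is a minimal sketch under stated assumptions, not how validator or golang.org/x/text implement anything: `wellFormed` is a hypothetical helper, the sub-rules it encodes (language, extlang, script, region, variant, extension, privateuse) are taken from RFC 5646 rather than from this thread, and it skips grandfathered tags and tags that consist only of a private-use part.

```go
package main

import (
	"fmt"
	"strings"
)

// all reports whether s is non-empty and every byte satisfies ok.
func all(s string, ok func(byte) bool) bool {
	if s == "" {
		return false
	}
	for i := 0; i < len(s); i++ {
		if !ok(s[i]) {
			return false
		}
	}
	return true
}

func alpha(c byte) bool { return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' }
func digit(c byte) bool { return '0' <= c && c <= '9' }
func alnum(c byte) bool { return alpha(c) || digit(c) }

// isVariant matches variant = 5*8alphanum / (DIGIT 3alphanum).
func isVariant(s string) bool {
	return all(s, alnum) && (len(s) >= 5 && len(s) <= 8 || len(s) == 4 && digit(s[0]))
}

// wellFormed is a hypothetical helper reporting whether tag matches the
// RFC 5646 "langtag" production. Subtags are split on "-" only, so
// "en_US" is rejected. Grandfathered tags and private-use-only tags
// ("x-...") are not handled, and no registry lookup is done.
func wellFormed(tag string) bool {
	parts := strings.Split(tag, "-")
	n, i := len(parts), 1

	// language = 2*3ALPHA ["-" extlang] / 4ALPHA / 5*8ALPHA
	lang := parts[0]
	if !all(lang, alpha) || len(lang) < 2 || len(lang) > 8 {
		return false
	}
	// extlang = 3ALPHA *2("-" 3ALPHA), only after a 2-3 letter language
	for k := 0; len(lang) <= 3 && k < 3 && i < n && len(parts[i]) == 3 && all(parts[i], alpha); k++ {
		i++
	}
	// script = 4ALPHA
	if i < n && len(parts[i]) == 4 && all(parts[i], alpha) {
		i++
	}
	// region = 2ALPHA / 3DIGIT
	if i < n && (len(parts[i]) == 2 && all(parts[i], alpha) || len(parts[i]) == 3 && all(parts[i], digit)) {
		i++
	}
	// *("-" variant)
	for i < n && isVariant(parts[i]) {
		i++
	}
	// *("-" extension): a singleton other than "x", then 1+ subtags of 2-8 alphanumerics
	for i < n && len(parts[i]) == 1 && all(parts[i], alnum) && !strings.EqualFold(parts[i], "x") {
		i++
		start := i
		for i < n && len(parts[i]) >= 2 && len(parts[i]) <= 8 && all(parts[i], alnum) {
			i++
		}
		if i == start {
			return false // a singleton must be followed by at least one subtag
		}
	}
	// ["-" privateuse]: "x", then 1+ subtags of 1-8 alphanumerics
	if i < n && strings.EqualFold(parts[i], "x") {
		i++
		start := i
		for i < n && len(parts[i]) <= 8 && all(parts[i], alnum) {
			i++
		}
		if i == start {
			return false
		}
	}
	// Well-formed only if every subtag was consumed.
	return i == n
}

func main() {
	for _, tag := range []string{"en-US", "en_US", "eng", "zh-Hant-TW", "de-CH-1901", "en-a"} {
		fmt.Printf("%-10s %v\n", tag, wellFormed(tag))
	}
}
```

Running it prints true for en-US, zh-Hant-TW and de-CH-1901, and false for en_US and en-a. It also prints true for eng: the ABNF allows a 2-3 letter language subtag, so eng is syntactically well-formed, and rejecting it needs a lookup in the IANA Language Subtag Registry, which (per RFC 5646) registers only the two-letter ISO 639-1 code en when one exists.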

UTS #35 also has a section called "BCP 47 Conformance", which reads:

It allows certain syntax for backwards compatibility (not BCP 47-compatible):

  • The "_" character for field separator characters, as well as the "-" used in
