New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Unexpected behaviour: parse_numbers() doesn't handle space as a grouping symbol, however, parse_decimal() does #1061 #1062
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've committed and pushed your entire .venv
– please clean that up first.
Should I just push the file I changed or everything except the .venv? |
Your commit(s) should only have relevant changes. |
Ok, I will push the file with the fix only then. |
You still have 755 files in this pull request. You'll probably need to |
should i do all of this within the terminal in vscode where i was working? |
Maybe. I don't use Visual Studio Code much. Visual Studio Code might have other tools for backing out commits. |
It worked, I made it so i only pushed the numbers.py file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to err on the side of caution here, if there is the remotest chance this could cause mis-parsing of numbers (e.g. something that couldn't be parsed before being now accidentally parsed as something it's not intended to be parsed as).
To that end, maybe this function should also have a strict=True
parameter like parse_decimal
does, and then maybe the grouping character logic could be shared between the two in non-strict mode?
babel/numbers.py
Outdated
group_symbol = get_group_symbol(locale, numbering_system=numbering_system) | ||
|
||
# Replace non-breakable spaces with the group symbol | ||
cleaned_string = string.replace('\xa0', group_symbol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test case (either a doctest or a real test in tests/
) that uses an \xa0
character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be a valid test case for that: >>> parse_number('1 099', locale='ru')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My human eyes can't tell if that space is a \xA0
. You could spell it out there, if it really is one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made some changes applying some of the logic from parse_decimal and it says its passing the test cases on my end
@@ -1001,6 +1001,7 @@ def __init__(self, message: str, suggestions: list[str] | None = None) -> None: | |||
def parse_number( | |||
string: str, | |||
locale: Locale | str | None = LC_NUMERIC, | |||
strict: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the new argument into the docstring. strict
mode should probably be the default here, and the new behavior (which allows more loose strings to be parsed) should be opt-in.
babel/numbers.py
Outdated
@@ -1010,6 +1011,8 @@ def parse_number( | |||
1099 | |||
>>> parse_number('1.099', locale='de_DE') | |||
1099 | |||
>>> parse_number('1 099', locale='ru') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the intent is to test a string that has a non-breaking space character, make it obvious:
>>> parse_number('1 099', locale='ru') | |
>>> parse_number('1\xa0099', locale='ru') |
babel/numbers.py
Outdated
decimal_symbol = get_decimal_symbol(locale, numbering_system=numbering_system) | ||
|
||
try: | ||
parsed = decimal.Decimal(string.replace(group_symbol, '') | ||
.replace(decimal_symbol, '.')) | ||
except decimal.InvalidOperation as exc: | ||
raise NumberFormatError(f"{string!r} is not a valid number") from exc | ||
|
||
proper = format_decimal(parsed, locale=locale, decimal_quantization=False, numbering_system=numbering_system) | ||
|
||
if string != proper and proper != _remove_trailing_zeros_after_decimal(string, decimal_symbol): | ||
try: | ||
parsed_alt = decimal.Decimal(string.replace(decimal_symbol, '') | ||
.replace(group_symbol, '.')) | ||
except decimal.InvalidOperation as exc: | ||
raise NumberFormatError( | ||
f"{string!r} is not a properly formatted decimal number. " | ||
f"Did you mean {proper!r}?", | ||
suggestions=[proper], | ||
) from exc | ||
else: | ||
proper_alt = format_decimal( | ||
parsed_alt, | ||
locale=locale, | ||
decimal_quantization=False, | ||
numbering_system=numbering_system, | ||
) | ||
if proper_alt == proper: | ||
raise NumberFormatError( | ||
f"{string!r} is not a properly formatted decimal number. " | ||
f"Did you mean {proper!r}?", | ||
suggestions=[proper], | ||
) | ||
else: | ||
raise NumberFormatError( | ||
f"{string!r} is not a properly formatted decimal number. " | ||
f"Did you mean {proper!r}? Or maybe {proper_alt!r}?", | ||
suggestions=[proper, proper_alt], | ||
) | ||
|
||
# Remove other spaces and replace the group symbol with an empty string | ||
cleaned_string = string.replace(' ', '').replace(group_symbol, '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format_number
is not meant to parse decimal numbers, only ever integers, anything decimal-related shouldn't be here at all.
If anything, we should check for the existence of a decimal separator in the input in strict mode, and fail if it is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made the changes in the latest commit
babel/numbers.py
Outdated
|
||
# If strict mode is enabled, handle irregular formatting | ||
if strict and group_symbol in string: | ||
# Your additional logic for strict mode here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a strange comment... 😁
This will require rebasing, and there are still unresolved comments. |
Cleaned the string to ensure that it still works if there is a non-breakable space, \xa0, in the number and separed the logic so that it is easeir to understand.