Locale strings not expanded in get format #701
@mredaelli it seems that you forgot to include the time and timezone tokens in your format string. Parsing was improved and revamped in Arrow v0.15.0. This should work:
```
venv ❯ python3
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import arrow
>>> arrow.__version__
'0.15.2'
>>> arrow.get("Montag, 9. September 2019, 16:15-20:00", "dddd, D. MMMM YYYY, HH:mmZZ", locale="de")
<Arrow [2019-09-09T16:15:00-20:00]>
```
Thank you @jadchaar, but I'm not sure I understand. What exactly changed? The documentation still says
and that's exactly what I'm doing: I don't want to parse the time, that's all. I really hope the "search" behavior was not removed intentionally (and, as far as I can see, without a clear note in the changelog), because I rely on it for a lot of strings coming from all kinds of websites :) Besides, the ticket that prompted this change, #612, talks about "partial matches", but in my case the match is complete, so there should be no ambiguity, right? (A small thing: I really liked the earlier error message, which made debugging easier by printing the expanded regex.)
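To make "the expanded regex" concrete, here is a minimal sketch of the general idea using Python's `re` module: the format string becomes a regular expression in which locale-dependent tokens turn into alternations of localized names. The German name lists, the `TOKEN_PATTERNS` table, and `expand_format` are hypothetical simplifications, not Arrow's actual parser code.

```python
import re

# Hypothetical German locale data (a tiny stand-in for real locale tables).
DAY_NAMES_DE = ["Montag", "Dienstag", "Mittwoch", "Donnerstag",
                "Freitag", "Samstag", "Sonntag"]
MONTH_NAMES_DE = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
                  "August", "September", "Oktober", "November", "Dezember"]

# Format token -> regex fragment; locale tokens become name alternations.
TOKEN_PATTERNS = {
    "dddd": "(" + "|".join(DAY_NAMES_DE) + ")",
    "MMMM": "(" + "|".join(MONTH_NAMES_DE) + ")",
    "YYYY": r"(\d{4})",
    "D": r"(\d{1,2})",
}
# Longer tokens come first in the alternation so they win over shorter ones.
TOKEN_RE = re.compile("(" + "|".join(TOKEN_PATTERNS) + ")")


def expand_format(fmt: str) -> str:
    """Expand tokens into regex groups and escape all literal text."""
    # With a capturing group, re.split alternates literal text and tokens.
    parts = TOKEN_RE.split(fmt)
    return "".join(
        TOKEN_PATTERNS[part] if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )


pattern = expand_format("dddd, D. MMMM YYYY")
print(pattern)
match = re.search(pattern, "Montag, 9. September 2019, 16:15-20:00")
print(match.groups())  # ('Montag', '9', 'September', '2019')
```

In this sketch a plain `re.search` finds the date whether or not a comma follows the year; any difference in outcome has to come from extra constraints wrapped around the expanded pattern.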
Hmm, I see that
seems to work, so it would seem that indeed you do not
@mredaelli I think the comma after the year in your example is causing the problem; without it, everything seems to work:
```
>>> arrow.get("Montag, 9. September 2019 16:15-20:00", "dddd, D. MMMM YYYY", locale="de")
<Arrow [2019-09-09T00:00:00+00:00]>
```
Further investigation is needed.
Ah, apologies, I did not realize that you were trying to search the string for a date. I thought you were trying to parse all of it.
Do you mind if I try to fix this?
@andrewchouman for sure! The issue is most likely in
I believe I found a solution to the issue, but the current behavior seems to be intended, judging by the test cases in versions >= 0.15.0. In the current test cases, any substring attached to the date string (not separated from it by whitespace) that isn't part of the pattern is expected to trigger an exception. See below:
Before 0.15.0, those test cases didn't exist, and the defined behavior was that any attached substring that wasn't part of the pattern was fine, as long as the pattern itself was present. I can make a pull request that reverts just that bit of functionality and updates the test cases so that these "invalid strings" are treated as valid; otherwise, the docs should be changed to reflect the difference.
The issue seems to stem from the line of code where we add the word boundary. Currently, we wrap the format regex in a custom whitespace word boundary. Here is a demonstration of the resulting regex: https://regex101.com/r/EJ2oJJ/2. If we remove the custom word boundary, the search succeeds: https://regex101.com/r/5bUeAq/1. We need to relax the word boundary slightly to allow partial-match cases like the one in the original post.
@jadchaar Should we perhaps allow punctuation (e.g. comma, period, dash)? I can work on this and add test cases.
Yeah, I think allowing punctuation that can plausibly appear around a datetime string at the regex word boundary would be a good solution. Feel free to experiment with regex101 to help fix this issue; it is an amazing tool for visualizing regular expressions.
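One way to act on this suggestion is sketched below with Python's `re` module. The punctuation set (`PUNCT_CHARS`), the stand-in format regex for "YYYY-MM-DD", and the helper `search_date` are hypothetical; the idea is to let a single punctuation character sit next to the match while still rejecting dates glued to other text. The dash is deliberately left out of this illustrative set.

```python
import re

# Illustrative punctuation allowed right before or after a date
# (an assumption for this sketch, not Arrow's actual list).
PUNCT_CHARS = r""",.;:?!"'`\[\]{}()<>"""

# Hypothetical stand-in for the regex expanded from "YYYY-MM-DD".
FMT_RE = r"\d{4}-\d{2}-\d{2}"

# Start: the preceding character (if any) must be whitespace or punctuation.
START = r"(?<![^\s" + PUNCT_CHARS + r"])"
# End: optionally one punctuation character, then whitespace or end of input.
END = r"(?=[" + PUNCT_CHARS + r"]?(?!\S))"

BOUNDED_RE = re.compile(START + FMT_RE + END)


def search_date(text: str):
    """Return the first bounded date in text, or None."""
    match = BOUNDED_RE.search(text)
    return match.group(0) if match else None


for text in [
    "blah 1998-09-12 blah",  # whitespace on both sides -> 1998-09-12
    "2019-09-09, 16:15",     # trailing comma -> 2019-09-09
    "(1998-09-12)",          # wrapped in parentheses -> 1998-09-12
    "blah1998-09-12 blah",   # glued to a word on the left -> None
    "blah 1998-09-12blah",   # glued to a word on the right -> None
]:
    print(repr(text), "->", search_date(text))
```

Putting the optional punctuation inside the lookahead keeps it out of the matched text, so the returned substring is still just the date.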
I should have it done within the next week or two (including documentation and tests).
@jadchaar I think this regex works for our purposes; what do you think? Also, when I ran nosetests, this test failed:
This is because, with my regex change, this line now raises a ParserMatchError instead of a ParserError. ParserMatchErrors are caught within the function, so the exception never propagates to the top level of the call stack, even if I were to change the expected exception in that test case. Should I remove the test case?
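The effect described here, where a more specific exception is swallowed inside a multi-format parsing loop, can be reproduced with a small self-contained sketch. The classes only mimic the names from the discussion, the assumption that `ParserMatchError` subclasses `ParserError` is made for illustration, and `parse_one`, `parse_multiformat`, and `raised_type` are hypothetical helpers, not Arrow's API.

```python
import re


# Hypothetical stand-ins mirroring the names in the discussion; the
# subclass relationship is an assumption made for this sketch.
class ParserError(ValueError):
    pass


class ParserMatchError(ParserError):
    pass


def parse_one(text: str, pattern: str) -> str:
    """Hypothetical single-format parser: raise if pattern is not found."""
    match = re.search(pattern, text)
    if match is None:
        raise ParserMatchError(f"Failed to match {pattern!r} in {text!r}")
    return match.group(0)


def parse_multiformat(text: str, patterns: list) -> str:
    """Try each pattern in turn, discarding per-pattern match failures."""
    for pattern in patterns:
        try:
            return parse_one(text, pattern)
        except ParserMatchError:
            continue  # the specific error is swallowed here...
    # ...so callers only ever see this generic error.
    raise ParserError(f"Could not match {text!r} to any of {patterns!r}")


def raised_type(text: str, patterns: list) -> str:
    """Report which exception class escapes parse_multiformat, if any."""
    try:
        parse_multiformat(text, patterns)
    except ParserError as exc:
        return type(exc).__name__
    return "no exception"


print(raised_type("2019-09", [r"\d{4}-\d{2}-\d{2}", r"\d{4}-\d{2}"]))  # no exception
print(raised_type("blah", [r"\d{4}-\d{2}-\d{2}"]))                     # ParserError
```

With this structure, a test that expects `ParserMatchError` from the outer call can never pass; only the generic `ParserError` escapes.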
@andrewchouman it may be better to supplement the current regex:
This code:
works perfectly in arrow 0.13.1, but in 0.15.2 I get an exception where the pattern is not expanded.
Am I doing something wrong?