Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YY maps 2018 to 20 #560

Closed
adam-rudd-myob opened this issue Jan 18, 2019 · 6 comments · Fixed by #655
Closed

YY maps 2018 to 20 #560

adam-rudd-myob opened this issue Jan 18, 2019 · 6 comments · Fixed by #655

Comments

@adam-rudd-myob
Copy link

Hi there,
I'm trying to standardise a bunch of records which can contain the patterns dd/mm/yyyy, d/mm/yy etc.

When using the example input of "15/01/2019"

print arrow.get("15/01/2019", ["D/M/YY","D/M/YYYY"]).format('YYYY-MM-DD')
outputs
2020-01-15, which seems to use "D/M/YY".
I would expect it to be ignored, since it doesn't fit with the pattern of the data provided

Currently I'm getting around this by swapping around the array, but i'm not feeling great about the fragility here.
print arrow.get("15/01/19", ["D/M/YYYY","D/M/YY"]).format('YYYY-MM-DD')

Suggestions:
1.
If YY is specified, it looks for YY:YYYY (\/[0-9]{2,4}). If the pattern extends without a delimiter up to 4 characters, it's treated as valid

If I could supply ,"D/M/YY$" I'd have a lot more confidence that the string will be correctly processed, regardless of where the pattern exists in the list.

If YY is specified, it strictly looks for YY. If the pattern extends without a delimiter, it's treated as invalid and the pattern is skipped.

Please let me know if you need any more information here!
Cheers,
Adam

@systemcatch
Copy link
Collaborator

Hi @adam-rudd-myob thanks for pointing this out. I can reproduce the problem (side note you could use DD/MM instead of D/M).

>>> import arrow
>>> arrow.__version__
'0.13.0'
>>> arrow.get("15/01/2019", ["D/M/YY","D/M/YYYY"])
<Arrow [2020-01-15T00:00:00+00:00]>
>>> arrow.get("15/01/2019", ["DD/MM/YY","DD/MM/YYYY"])
<Arrow [2020-01-15T00:00:00+00:00]>
>>> arrow.get("15/01/2019", "DD/MM/YYYY")
<Arrow [2019-01-15T00:00:00+00:00]>
>>> arrow.get("15/01/2019", "DD/MM/YY")
<Arrow [2020-01-15T00:00:00+00:00]>

I think this is caused by https://github.com/crsmithdev/arrow/blob/master/arrow/parser.py#L36, which matches if it finds 2 digits but ignores anything further. That regex is used for quite a lot of the other tokens which was initially worrying but looking further I don't think we will face problems apart from 'DDD' and 'DDDD' which are not implemented yet.

The way to fix this is probably to be strict and make sure that YY fails if any more than 2 digits are found.

@systemcatch
Copy link
Collaborator

Quick follow up I was messing about with your suggestion number 2.

>>> arrow.get("15/01/2019", ["D/M/YY","D/M/YYYY"])
<Arrow [2020-01-15T00:00:00+00:00]>
>>> arrow.get("15/01/2019", ["D/M/YY$","D/M/YYYY"])
<Arrow [2019-01-15T00:00:00+00:00]>
>>> arrow.get("15/01/2019", ["D/M/YY$","D/M/YYYY"]).format('YYYY-MM-DD')
'2019-01-15'

It seems to already work!?

@adam-rudd-myob
Copy link
Author

adam-rudd-myob commented Feb 15, 2019 via email

@systemcatch
Copy link
Collaborator

Hey Chris, thanks for pointing that out! I thought I’d checked that in the initial bug report but after trying the use cases we’ve been working with here, the $ seems to solve them all. So is that just a small amendment to the docs then? Cheers, Adam From: Chris notifications@github.com Reply-To: crsmithdev/arrow reply@reply.github.com Date: Friday, 15 February 2019 at 10:24 am To: crsmithdev/arrow arrow@noreply.github.com Cc: Adam Rudd Adam.Rudd@myob.com, Mention mention@noreply.github.com Subject: Re: [crsmithdev/arrow] YY maps 2018 to 20 (#560) Quick follow up I was messing about with your suggestion number 2.

arrow.get("15/01/2019", ["D/M/YY","D/M/YYYY"])
<Arrow [2020-01-15T00:00:00+00:00]>
arrow.get("15/01/2019", ["D/M/YY$","D/M/YYYY"])
<Arrow [2019-01-15T00:00:00+00:00]>
arrow.get("15/01/2019", ["D/M/YY$","D/M/YYYY"]).format('YYYY-MM-DD')
'2019-01-15' It seems to already work!? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#560 (comment)>, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AXgIFDoadVhLAa-_PaxsfN3lU_1CR2X3ks5vNfAhgaJpZM4aHHzY.

________________________________ This email and any attachment are confidential. If you are not the intended recipient, please notify MYOB by reply email and delete this email. Please note that you must not access or use this email or any information in it. MYOB accepts no liability for viruses in this email or in any attachment to it.

Yeah I guess an update to the docs is order. To be honest I'm kinda surprised it works, probably to do with special regex characters now being handled better.

@algorythmik
Copy link

algorythmik commented Apr 2, 2019

Hey, I don't think that it works. I would still consider it a bug, the expected output for arrow.get('12 October 2018', 'DD MMMM YY') is to get a ParseError exception, whereas the result is:
<Arrow [2020-10-12T00:00:00+00:00]>
Also if you add the $ at the end it is not fixed. You get a ParseError exception, but you get the same here:
arrow.get('12 October 20', 'DD MMMM YY$')
Also, I expect arrow.get('01 January 123456789101112', 'DD MMMM YYYY') not to parse, but I get:
<Arrow [1234-01-01T00:00:00+00:00]>.

@systemcatch
Copy link
Collaborator

@algorythmik yup the parsing still needs serious improvement and cleanup to stop errors like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants