Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No error raised on invalid dates #527

Closed

Conversation

akatcha
Copy link

@akatcha akatcha commented Apr 9, 2018

#519

Implemented a fix that throws a parser error when a year is incorrectly placed in a date. This fix applies in situations where a string is entered for a date.

Before change behavior:

>>> arrow.get('02-03-2018')                                            
<Arrow [2018-01-01T00:00:00+00:00]>

After change behavior:

>>> arrow.get('02-03-2018')                                            
Traceback (most recent call last):                                       
File "<stdin>", line 1, in <module>                                    
File "/mnt/c/Users/Andrew/Documents/College_Work/eecs481/hw6/arrow/arrow/api.py", line 22, in get                                               
return _factory.get(*args, **kwargs)                                 
File "/mnt/c/Users/Andrew/Documents/College_Work/eecs481/hw6/arrow/arrow/factory.py", line 174, in get                                          
dt = parser.DateTimeParser(locale).parse_iso(arg)                    
File "/mnt/c/Users/Andrew/Documents/College_Work/eecs481/hw6/arrow/arrow/parser.py", line 119, in parse_iso                                     
return self._parse_multiformat(string, formats)                      
File "/mnt/c/Users/Andrew/Documents/College_Work/eecs481/hw6/arrow/arrow/parser.py", line 303, in _parse_multiformat                            
raise ParserError('Could not match input to any of {0} on \'{1}\''.format(formats, string))                                               
arrow.parser.ParserError: Could not match input to any of [u'YYYY-MM-DD', u'YYYY/MM/DD', u'YYYY.MM.DD', u'YYYY-MM', u'YYYY/MM', u'YYYY.MM', u'YYYY', u'YYYY', u'YYYY'] on '02-03-2018'

@codecov-io
Copy link

codecov-io commented Apr 9, 2018

Codecov Report

Merging #527 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #527   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          14     14           
  Lines        3385   3405   +20     
  Branches      235    238    +3     
=====================================
+ Hits         3385   3405   +20
Impacted Files Coverage Δ
tests/parser_tests.py 100% <100%> (ø) ⬆️
arrow/parser.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c67fb5c...2decd5c. Read the comment docs.

@pganssle
Copy link
Contributor

pganssle commented Apr 9, 2018

I would recommend being very careful with this, possibly adding a deprecation period before this starts raising an error. It's certainly not good to silently fail, but there's a good chance that people have come to rely on this behavior.

@ryantuck
Copy link

ryantuck commented Apr 9, 2018

I had opened the original issue #519 due to silent failures when passing in dates that don't exist when not ingested in ISO format (e.g. 02/30/2018). Somehow that was being parsed correctly as the year 2018, but the day/month would silently fall back to 1/1. Funny enough, this silently fails when attempting to parse 'normal' american-formatted dates!

>>> arrow.get('01/03/2018')
<Arrow [2018-01-01T00:00:00+00:00]>
>>> arrow.get('02/30/2018')
<Arrow [2018-01-01T00:00:00+00:00]>

I would be grateful for a library if, on my next deployment, something started actually properly throwing errors instead of continuing to silently fail in this weird way.

I'm still confused as to why YYYY/MM/DD is considered a valid date format, but this is 👍 by me.

@pganssle
Copy link
Contributor

pganssle commented Apr 9, 2018

Why wouldn't YYYY/MM/DD be a valid date format?

It's quite unambiguous if someone writes:

2014/01/30

Also, I strongly recommend starting out raising a warning rather than raising an error. I've already had people looking to opt in to the "fail silently" behavior on dateutil/dateutil#540. Start with a warning of a distinct class (so it can be easily filtered out or converted to an error as desired) and give it a bit before making a release with the error behavior on by default.

@ryantuck
Copy link

ryantuck commented Apr 9, 2018

My comment was out of a lack of ever seeing that format out in the wild, and given that the acceptable list appears to be ISO 8601 formats plus the YYYY/MM/DD format, why bother having anything other than just ISO as acceptable inputs?

You definitely appear to have more experience with date libraries and use cases though, so I would certainly defer to your expertise / experience on this :)

@akatcha
Copy link
Author

akatcha commented Apr 10, 2018

@pganssle
Before changing the code, I want to make sure that we are on the same page.

Instead of throwing an error on an incorrect date, you would prefer to provide a warning. In addition to giving a warning, you want the warning to be wrapped in a class so that when we decide to switch to throwing an error, it is a quick fix.

A couple questions:
Is there a message that you suggest I add to the warning? An example would be really helpful.

Is there another open source example that is a good model for how to write a class to provide a warning and later throw an error?

Do you have other suggestions or critiques on my implementation of this bug (other than giving a warning rather than an error)?

Thank you both for the quick feedback.

@pganssle
Copy link
Contributor

@Andrew-Katcha You can see how we do a similar thing in dateutil here.

The reason you want to use a custom Warning class is because the warnings module allows people to filter out warnings by class really easily. Anyone who knows about the problem and doesn't care can simply set the filter for that Warning class to "ignore" and they won't see the warning anymore. Anyone who knows about the problem and wants to opt-in to getting an error can sett he Warning class to "error" and an error will be thrown any time the warning is raised.

The custom warning class has nothing to do with making it easier for arrow to change it over to an error, that's just so that it's easier for downstream consumers to deal with that warning and only that warning.

The reason to have a warning instead of an error at all is because right now you only see the people complaining because they want datetime strings validated in some way because they are the ones with the bugs. You don't see the people counting on the current behavior right now because for them the module still works, but if you start throwing an error they'll start complaining. The best thing to do is to throw a warning so that they are on notice that this behavior might change and they can give feedback on the implementation of any validation scheme.

@pganssle
Copy link
Contributor

pganssle commented Apr 10, 2018

Regarding the implementation, I don't know arrow's codebase very well, but I don't love it. It seems like you aren't actually fixing the bug in parse, just working around it for some specific cases. Consider that parse seems to be erroneously matching a regular expression here. If that weren't happening, this would work as intended and correctly throw an error.

Edit: I think you actually "fixed" this problem by failing before it gets a chance to try all the formats in the list. I think it's actually never necessary to set _datetime to None, because _datetime should always be None if all the formats in parse_multiformat failed to parse the datetime. What has made this start failing on invalid dates is actually the break, not the _datetime = None (which is redundant).

@akatcha
Copy link
Author

akatcha commented Sep 9, 2018

@pganssle
You were correct, I did not need to set _datetime = none.
Do we want to merge this?

@systemcatch
Copy link
Collaborator

Hi @akatcha, sorry this wasn't ever looked at properly. We're currently working on a 0.15.0 release that will solve most of these invalid date bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants