Accurate handling of parsing errors #91

Closed
vrutsky opened this issue Jan 28, 2014 · 11 comments · Fixed by #655

Comments

@vrutsky

vrutsky commented Jan 28, 2014

Arrow can silently fail to parse a complete date string and return an invalid result when the string only partially matches one of its formats.

Consider parsing a SQLite date-time string:

>>> arrow.get('2014-01-25 01:22:58')
<Arrow [2014-01-25T00:00:00+00:00]>

Note that the time part of the string was not parsed and was silently lost.

Or even parsing a string that is not a date-time at all:

>>> arrow.get("Happy 2014!")
<Arrow [2014-01-01T00:00:00+00:00]>
@sochoa

sochoa commented Feb 8, 2014

The challenge with this issue is that, based on how the parser works, it's supposed to behave this way. It found a year and returned a date/time that has the right year and everything else zeroed out. The issue here is that the input needs to be sanitized, and that (IMO) would be the job of the application prior to calling arrow.get().

I would suggest closing this issue as a non-issue.
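
For readers who want to follow the "sanitize in the application" route, here is a minimal sketch of such a gate. It uses only the standard library and does not call arrow; the helper name parse_sqlite_datetime and the choice of accepting only the 'YYYY-MM-DD HH:MM:SS' shape are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumption: the application expects exactly 'YYYY-MM-DD HH:MM:SS'.
# re.ASCII keeps \d from matching non-ASCII digits.
_SQLITE_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", re.ASCII)


def parse_sqlite_datetime(text):
    """Hypothetical helper: return a naive datetime or raise ValueError.

    fullmatch() requires the whole string to fit the pattern, so input
    that merely contains a date-like fragment is rejected up front.
    """
    if not _SQLITE_DATETIME.fullmatch(text):
        raise ValueError(f"not a 'YYYY-MM-DD HH:MM:SS' string: {text!r}")
    # strptime also range-checks the fields (month 13, hour 25, ...).
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
```

With this gate, '2014-01-25 01:22:58' keeps its time part, while "Happy 2014!" raises ValueError instead of producing a date.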

@rutsky
Contributor

rutsky commented Feb 10, 2014

I don't think it's supposed to behave this way.

The documentation for arrow says that in this context it tries to parse an ISO-8601-formatted str. The parsing method in the source code is called parse_iso.

momentjs (on which arrow's interface is based) accepts, in the corresponding case, either a browser-dependent date string or an ISO-8601 date string. If momentjs fails to parse the date, it returns a special "invalid date" value.

As I see it, ISO-8601 describes a wide range of date string formats that can be interpreted unambiguously. Here is the list of ISO-8601 formats supported by momentjs:

"2013-02-08"
"2013-02-08T09"
"2013-02-08 09"
"2013-02-08T09:30"
"2013-02-08 09:30"
"2013-02-08T09:30:26"
"2013-02-08 09:30:26"
"2013-02-08T09:30:26.123"
"2013-02-08 09:30:26.123"
"2013-02-08T09:30:26 Z"
"2013-02-08 09:30:26 Z"
"2013-W06-5"
"2013-W06-5T09"
"2013-W06-5 09"
"2013-W06-5T09:30"
"2013-W06-5 09:30"
"2013-W06-5T09:30:26"
"2013-W06-5 09:30:26"
"2013-W06-5T09:30:26.123"
"2013-W06-5 09:30:26.123"
"2013-W06-5T09:30:26 Z"
"2013-W06-5 09:30:26 Z"
"2013-039"
"2013-039T09"
"2013-039 09"
"2013-039T09:30"
"2013-039 09:30"
"2013-039T09:30:26"
"2013-039 09:30:26"
"2013-039T09:30:26.123"
"2013-039 09:30:26.123"
"2013-039T09:30:26 Z"
"2013-039 09:30:26 Z"

In my opinion, arrow.get should work like momentjs's moment() function and strictly parse ISO-8601-formatted dates (in addition, of course, to parsing timestamps, tzinfo and the other fairly strict formats that it supports now).
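
To make "strictly parse" concrete, here is a rough sketch of whole-string matching for the calendar-date rows of the list above. It is not arrow's parser and not momentjs's algorithm; the function name parse_iso_strict is hypothetical, the UTC default is an assumption, and the week-date, ordinal-date and trailing " Z" forms are deliberately left out to keep it short.

```python
import re
from datetime import datetime, timezone

# Calendar date with an optional, progressively more precise time part.
# Every field is zero-padded, and 'T' or a single space separates date and time.
_ISO_CALENDAR = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"(?:[T ](?P<hour>\d{2})"
    r"(?::(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.(?P<fraction>\d{1,6}))?)?)?)?",
    re.ASCII,
)


def parse_iso_strict(text):
    """Hypothetical strict parser: the whole string must match, or ValueError."""
    match = _ISO_CALENDAR.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported ISO-8601 string: {text!r}")
    parts = match.groupdict()
    fraction = parts.pop("fraction")
    # '.123' means 123 milliseconds, so right-pad to six digits of microseconds.
    microsecond = int(fraction.ljust(6, "0")) if fraction else 0
    fields = {name: int(value) for name, value in parts.items() if value is not None}
    # Assumption: default to UTC. datetime() itself rejects out-of-range fields.
    return datetime(microsecond=microsecond, tzinfo=timezone.utc, **fields)
```

Because the pattern is applied with fullmatch() rather than a search, strings that only contain a date-like fragment, or that use non-padded fields, fail instead of being partially parsed.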

@honzajavorek

There could be a way to switch the parser to a strict mode, if one needs it. Sometimes it's useful to fail fast and be strict about inputs. Also, I think ParserError should be a subclass of ValueError.
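
Both suggestions can be illustrated with a toy example. The sketch below is not arrow's code: the ParserError class here is a stand-in defined locally, and find_year with its strict parameter is a hypothetical function that only looks for a four-digit year. It shows how a search-based match differs from a whole-string match, and how subclassing ValueError lets existing except ValueError handlers catch parse failures.

```python
import re


class ParserError(ValueError):
    """Stand-in for a parser error that generic ValueError handlers can catch."""


_YEAR = re.compile(r"\d{4}", re.ASCII)


def find_year(text, strict=False):
    """Toy parser (hypothetical): extract a four-digit year from text.

    Lenient mode searches anywhere in the string; strict mode requires the
    whole string to be the year and raises ParserError otherwise.
    """
    match = _YEAR.fullmatch(text) if strict else _YEAR.search(text)
    if match is None:
        raise ParserError(f"could not parse a year from {text!r}")
    return int(match.group())
```

In lenient mode find_year("Happy 2014!") returns 2014 and find_year("blabla102015") returns 1020 (the first four digits it finds); with strict=True both raise ParserError, which an except ValueError block will catch.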

@keynmol

keynmol commented May 19, 2015

Hi, is there any movement on this? Or a workaround? We are parsing OG tags on pages and some of them have some really malformed shit there. And Arrow is not helping:

In [52]: arrow_get('blabla102015').isoformat()
Out[52]: '1020-01-01T00:00:00+00:00'

@jacobsvante

Really annoying that it parses so loosely. Almost worse than PHP date parsing.

@laruellef

Ditto,
I would support bombing on incomplete parsing such as:
arrow.get("02/01/2004")
<Arrow [2004-01-01T00:00:00+00:00]>

I deal with various data sources and date formats on a daily basis, and arrow has been very valuable in handling parsing automagically.
However, even after months of use, I was unaware until now that the use case above wasn't working.
This has resulted in service-affecting problems...

It would also be useful to document the complete list of supported formats.
Is that available anywhere?

@andrewelkins
Contributor

@laruellef I don't think it's documented anywhere.

@rutsky I agree, but in order not to break backwards compatibility it would need to be behind a flag.

@andrewelkins
Contributor

Related to #292, #267 and #399

I'd like to handle two situations which might be fixed with the same code:

  • 10/10/2016 # Non-ISO format, which currently returns 2016-01-01 - at minimum this should return an error
  • 2016-1-10 # Non-padded month, which currently returns 2016-01-01 - at minimum this should return an error

@Marco-Sulla

Marco-Sulla commented Jan 1, 2017

I don't think JS and JS libraries are good examples of robust design in general. moment.js is a good library, but good Python practice calls for exceptions (or at least returning None, if you want a C-style speedup in your code).
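
The two styles mentioned here can be compared side by side. This is a small standard-library sketch, not arrow's API; parse_or_none and its default format string are hypothetical names chosen for illustration.

```python
from datetime import datetime


def parse_or_none(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical wrapper: return a datetime, or None if text does not fit fmt.

    datetime.strptime() is the exception-raising style; this wrapper turns
    its ValueError into a None return for callers that prefer to test the
    result instead of using try/except.
    """
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None
```

Either way the caller gets an explicit failure signal, unlike a partially parsed date that looks valid.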

@andrewelkins
Contributor

@MarcoSulla Agreed, it's just a basis; that doesn't mean Arrow has to implement it verbatim.

@systemcatch
Collaborator

The current situation for all these examples:

>>> arrow.get('2014-01-25 01:22:58')
<Arrow [2014-01-25T01:22:58+00:00]>
>>> arrow.get('blabla102015').isoformat()
'1020-01-01T00:00:00+00:00'
>>> arrow.get("Happy 2014!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/chris/arrow/arrow/api.py", line 22, in get
    return _factory.get(*args, **kwargs)
  File "/home/chris/arrow/arrow/factory.py", line 174, in get
    dt = parser.DateTimeParser(locale).parse_iso(arg)
  File "/home/chris/arrow/arrow/parser.py", line 119, in parse_iso
    return self._parse_multiformat(string, formats)
  File "/home/chris/arrow/arrow/parser.py", line 286, in _parse_multiformat
    raise ParserError('Could not match input to any of {} on \'{}\''.format(formats, string))
arrow.parser.ParserError: Could not match input to any of ['YYYY-MM-DD HH:mm'] on 'Happy 2014!'
>>> arrow.get("02/01/2004")
<Arrow [2004-01-01T00:00:00+00:00]>
>>> arrow.get("10/10/2016")
<Arrow [2016-01-01T00:00:00+00:00]>
>>> arrow.get("2016-1-10")
<Arrow [2016-01-01T00:00:00+00:00]>
