v0.15.0 Changes❗ #612

Closed
jadchaar opened this issue Jul 14, 2019 · 52 comments

@jadchaar
Member

jadchaar commented Jul 14, 2019

In the upcoming version 0.15.0 of arrow, we will be making a lot of changes to the behavior of arrow.get() to address a number of reported parsing bugs. We have outlined the changes below:

Fixes

  • Most instances of arrow.get() returning an incorrect arrow object from a partial parsing match have been eliminated.

For example,

>>> arrow.get("garbage2017everywhere")
<Arrow [2017-01-01T00:00:00+00:00]>
>>> arrow.get('Jun-2019', ['MMM-YY', 'MMM-YYYY'])
<Arrow [2020-06-01T00:00:00+00:00]>

These will raise a ParserError in 0.15.0.

  • When a meridian token (a|A) is passed and no meridians are available for the specified locale (e.g. unsupported or untranslated), a ParserError is raised.
  • Timestamp strings are no longer supported in the arrow.get() method without a format string: arrow.get("1565358758"). This change was made to support the ISO 8601 basic format and to address bugs such as #447 (Odd parsing from list of formats).

The following will still work as expected:

arrow.get("1565358758", "X")
arrow.get("1565358758.123413", "X")
arrow.get(1565358758)
arrow.get(1565358758.123413)
  • The timestamp token (X) will now match float timestamps: arrow.get("1565358758.123415", "X").
  • The timestamp token (X) will now only match on strings that strictly contain integers and floats, preventing incorrect matches.
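
To make "strictly contain integers and floats" concrete, here is a minimal sketch using only the standard library. It is not arrow's parser: the pattern `TIMESTAMP_RE` and the helper `parse_timestamp_string` are hypothetical names, and the exact regular expression arrow uses is an assumption.

```python
import re
from datetime import datetime, timezone

# Assumed shape of a "strict" timestamp string: digits, optionally followed
# by a fractional part. Illustration only, not arrow's actual pattern.
TIMESTAMP_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_timestamp_string(text):
    """Convert an epoch-seconds string to an aware UTC datetime."""
    # fullmatch rejects any input with characters before or after the number.
    if TIMESTAMP_RE.fullmatch(text) is None:
        raise ValueError(f"{text!r} is not a plain integer or float timestamp")
    return datetime.fromtimestamp(float(text), tz=timezone.utc)


print(parse_timestamp_string("1565358758").isoformat())
```

With this kind of check, an input such as "garbage1565358758" is rejected instead of being partially matched.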

New Features

  • ISO-8601 basic format style is now supported (e.g. YYYYMMDDThhmmssZ)
  • Added support for DDD and DDDD ordinal date tokens (e.g. "1998-045")
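
For readers unfamiliar with these two notations, the sketch below shows what they denote, using the standard library's `datetime.strptime` rather than arrow itself. The basic-format sample string is made up for illustration.

```python
from datetime import datetime, timezone

# Ordinal date: year plus day-of-year (%j), i.e. the 45th day of 1998.
ordinal = datetime.strptime("1998-045", "%Y-%j")

# ISO 8601 basic format: no separators between the date or time fields.
basic = datetime.strptime("20190908T153000Z", "%Y%m%dT%H%M%SZ").replace(
    tzinfo=timezone.utc
)

print(ordinal.date().isoformat())  # 1998-02-14
print(basic.isoformat())           # 2019-09-08T15:30:00+00:00
```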

Issues addressed

Development progress

You can view the progress of these changes here: https://github.com/crsmithdev/arrow/tree/Version-0.15.0.

Disable Warnings

To get rid of the ArrowParseWarning messages in 0.14.3 onwards, do the following:

import warnings
from arrow.factory import ArrowParseWarning

warnings.simplefilter("ignore", ArrowParseWarning)
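
If a process-wide filter feels too broad, the suppression can be limited to a single call with `warnings.catch_warnings()`. This is only a sketch: `quiet_parse` is a hypothetical helper, and the fallback class is a stand-in so the snippet also runs where arrow is not installed or where `ArrowParseWarning` is not available.

```python
import warnings

try:
    from arrow.factory import ArrowParseWarning
except ImportError:
    class ArrowParseWarning(UserWarning):
        """Stand-in used only when arrow does not provide the class."""


def quiet_parse(parse, *args, **kwargs):
    """Call parse(*args, **kwargs) with ArrowParseWarning silenced."""
    with warnings.catch_warnings():
        # The filter is restored automatically when the block exits.
        warnings.simplefilter("ignore", ArrowParseWarning)
        return parse(*args, **kwargs)
```

Usage would look like `quiet_parse(arrow.get, "2013-09-30T15:34:00.000-07:00")`; warnings raised outside that call are unaffected.
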
@kornpow

kornpow commented Jul 29, 2019

I still get warnings when I pass a parser string to .get()
Also here is a full example for disabling warnings if anyone needs:
Using arrow 0.14.3

import arrow

import warnings

from arrow.factory import ArrowParseWarning

astring = "2019-07-29T13:58:44.460381Z"

# Show warning still?

arrow.get(astring,"YYYY-MM-DD[T]HH:mm:ss.S[Z]")

# Filter out the warnings

warnings.simplefilter("ignore", ArrowParseWarning)

# No warnings anymore

arrow.get(astring,"YYYY-MM-DD[T]HH:mm:ss.S[Z]")

@jadchaar
Member Author

jadchaar commented Jul 29, 2019

Hey @sako0938, we made the conscious decision to include the warning on the arrow.get() method with and without a format string since significant changes are coming to both.

@systemcatch we could maybe better target the format string warnings to the timestamp token X?

@systemcatch
Collaborator

@jadchaar @sako0938 that's going to be difficult since many of the changes apply with or without a format string. I think it's easier to just give a general warning, despite it being more annoying.

For example, in current arrow:

>>> arrow.get("garbage2017everywhere", "YYYY")
<Arrow [2017-01-01T00:00:00+00:00]>
>>> arrow.get("garbage2017everywhere")
<Arrow [2017-01-01T00:00:00+00:00]>

arrow 0.15.0 will raise errors for both these cases.
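
The difference between the two behaviours can be illustrated with the standard library alone. The sketch below is not arrow's parser; `YEAR_RE`, `loose_year` and `strict_year` are hypothetical names used only to contrast searching anywhere in the input with matching the whole input.

```python
import re

# Hypothetical pattern for a bare four-digit year (illustration only).
YEAR_RE = re.compile(r"\d{4}")


def loose_year(text):
    """Return the first four-digit run found anywhere in text, or None."""
    match = YEAR_RE.search(text)
    return int(match.group()) if match else None


def strict_year(text):
    """Return the year only if the whole string is a four-digit year."""
    match = YEAR_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"could not parse {text!r} as a year")
    return int(match.group())


print(loose_year("garbage2017everywhere"))  # 2017
print(strict_year("2017"))                  # 2017
```

Calling `strict_year("garbage2017everywhere")` raises `ValueError`, which mirrors the kind of failure described above.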

@kornpow

kornpow commented Jul 30, 2019

Sounds great. I mostly wanted to show how to remove the warnings for others since the issue didn't provide all the context needed at the time.

@davix

davix commented Aug 5, 2019

Hi, I'm a first-time user of arrow with the latest version installed (0.14.4). I still can't understand from the previous answers why the example below still produces a warning. Is something still missing?

arrow.get(astring,"YYYY-MM-DD[T]HH:mm:ss.S[Z]")
 /Users/whuang/Envs/p2/lib/python2.7/site-packages/arrow/factory.py:249: ArrowParseWarning: The .get() parsing method with a format string will parse more strictly in version 0.15.0.See https://github.com/crsmithdev/arrow/issues/612 for more details.

@systemcatch
Collaborator

Hi, I'm a first-time user of arrow with the latest version installed (0.14.4). I still can't understand from the previous answers why the example below still produces a warning. Is something still missing?

arrow.get(astring,"YYYY-MM-DD[T]HH:mm:ss.S[Z]")
/Users/whuang/Envs/p2/lib/python2.7/site-packages/arrow/factory.py:249: ArrowParseWarning: The .get() parsing method with a format string will parse more strictly in version 0.15.0.See #612 for more details.

Hey @davix, on version 0.14.3+ there will always be a warning generated when arrow.get() does str parsing (with and without a format passed). To disable the warning, please use @sako0938's solution or ours.

@davix

davix commented Aug 7, 2019

on version 0.14.3+ there will always be a warning generated when arrow.get() does str parsing (with and without a format passed)

From my understanding, a warning means that a user is doing something incorrectly or at least imperfectly. Why is there a warning even if a format is provided, I'm curious? Or is there a more proper command other than .get()?

@systemcatch
Collaborator

From my understanding, a warning means that a user is doing something incorrectly or at least imperfectly. Why is there a warning even if a format is provided, I'm curious? Or is there a more proper command other than .get()?

Warnings can also be used to notify users of upcoming changes to the package. The changes we've made to the .get() method can apply both with and without a format provided. In 0.15.0 these improvements will take effect and the warnings will disappear.

@joinemm

joinemm commented Aug 7, 2019

The flexible .get() was in my opinion one of the best features of arrow, as it allowed for parsing of unknown or user generated time strings without any extra logic.
I wonder if it's possible you could keep the old functionality of .get() in a new separate function? Maybe like .fuzzy_get() or something.

@systemcatch
Collaborator

@joinemm that functionality will still be there. Indeed, in version 0.15.0 more formats will parse in .get() than do currently, without any other logic being needed.

@petermolnar

petermolnar commented Aug 9, 2019

I started seeing the message today; I'm with @joinemm - .get is one of the main reasons for me to use arrow. I'm very surprised by this move.

EDIT
can at least rfc3339 be included in the default string parsing list without showing that warning, please?

@joinemm

joinemm commented Aug 9, 2019

@systemcatch really? But what about

arrow.get() will no longer parse natural language strings

Timestamp strings are no longer supported in the arrow.get() method

I feel like these two were very important

@petermolnar

Timestamp strings are no longer supported in the arrow.get() method

That is the main reason I'm using arrow: parsing rfc3339 and ISO-8601 without needing to deal with TZ mockery in Python.

I'd settle with a .get_iso8601 method that doesn't need any formatting input, but deals with any variations of ISO-8601.

@jadchaar
Member Author

jadchaar commented Aug 9, 2019

Timestamp strings are no longer supported in the arrow.get() method

Apologies, this statement needs to be tweaked.

We are dropping support for usage like: arrow.get("1565358758") because we are adding support for the basic format in ISO 8601. This would make timestamp strings and the basic format ambiguous.

You can still use get as follows: arrow.get(1565358758) and if you need to parse a timestamp string, arrow.get("1565358758", "X").
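
The ambiguity is easy to see with a digits-only string. The sketch below uses only the standard library (not arrow), and the input value is made up for illustration: the same eight digits are a valid Unix timestamp and a valid basic-format date.

```python
from datetime import datetime, timezone

digits = "20190909"  # made-up input for illustration

# Reading 1: seconds since the Unix epoch.
as_timestamp = datetime.fromtimestamp(int(digits), tz=timezone.utc)

# Reading 2: an ISO 8601 basic-format calendar date (YYYYMMDD).
as_basic_date = datetime.strptime(digits, "%Y%m%d").date()

print(as_timestamp.isoformat())   # a moment in 1970
print(as_basic_date.isoformat())  # 2019-09-09
```

Requiring the "X" token for timestamp strings removes the need to guess between the two readings.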

Let us know if you all still have concerns and feedback about this. I have tweaked the original message to reflect my comments above.

@joinemm

joinemm commented Aug 9, 2019

@jadchaar Oh ok, looks like it was just badly worded then. So something like arrow.get("2002-10-02T10:00:00-05:00") will still work as it used to?

@jadchaar
Member Author

jadchaar commented Aug 9, 2019

@joinemm indeed :). We are not trying to break existing workflows; we are just trying to address core parsing issues that have given arrow a bad name over the years.

arrow.get() will no longer parse natural language strings

As for this, the functionality will still remain intact if you provide a format string like this: arrow.get("Meet me at 2016-05-16T04:05:06.789120 at the stadium", "YYYY-MM-DDTHH:mm:ss.S").

This will no longer work though: arrow.get("Meet me at 2016-05-16T04:05:06.789120 at the stadium"). We made this decision because we wanted get() without a format string to simply be for parsing date and time strings by themselves so that we can prevent any unintentional parsing issues (e.g. accepting blah2016).

@systemcatch
Collaborator

can at least rfc3339 be included in the default string parsing list without showing that warning, please?

@petermolnar the warning is there so people can see the full list of changes in this issue. Thanks to that warning, you and @joinemm showed us that the change log was poorly worded, which should hopefully mean less confusion for others in the future.

The warning will be gone in 0.15.0 which should be released around the end of August.

@KenKundert

The current version of the manual on readthedocs.io says:

Some ISO-8601 compliant strings are recognized and parsed without a format string:
>>> arrow.get('2013-09-30T15:34:00.000-07:00')
<Arrow [2013-09-30T15:34:00-07:00]>

However, that code now generates a warning. That forces an ugly choice. Do I put in the warning suppression code, which just adds clutter to my code and adds risk that some new future warning will be unintentionally suppressed, or do I specify the format string for iso8601, which clutters up my code, seems error prone, and also seems like domain specific knowledge that should be built in to Arrow.

I have instead decided to just forbid the use of versions 0.14.* by adding the following to my setup.py file:
install_requires = ['arrow!=0.14.*']
That eliminates the warning message without requiring me to add clutter to my code.

Hopefully things are better in 0.15.*.

@merriam

merriam commented Aug 14, 2019

This issue has come up a number of times in different date/time libraries; I last ran into it in MomentJS. The progression is surprisingly consistent:

  1. A native library exists, e.g., DateTime. It is found wanting, particularly in parsing dates.
  2. A new library is written, e.g., Arrow, that provides a liberal, intuitive parsing syntax.
  3. The library gains popularity and a large number of additional features. It is used in production.
  4. There are users that want to use the library in production and find a number of times where liberal parsing is too liberal, not raising errors. These users are now maintainers or supporters.
  5. A change is proposed to make strict parsing the default. This is introduced as a breaking change, as the proponents do not plan to ever use liberal parsing. It is, curiously, always introduced in a way that causes extra warnings to be generated.
  6. The change is implemented, causing widespread incompatibilities among less vocal users. When they ask "why do it this way?" they are shouted down.

The proposed solutions are usually:

  1. Provide a global 'strict mode', e.g., Arrow.strict_mode()
  2. Provide an optional strict parameter, e.g., Arrow.get(..., strict=True)

Discussion of the solutions devolves into camps wanting default strict versus default liberal. As maintainers are in the default strict camp, neither solution is implemented, the library becomes too hard to use, and its adoption rate declines until maintainers are no longer supported by their companies to maintain the package.

Is there any way to derail this path?

@systemcatch
Collaborator

The current version of the manual on readthedocs.io says:

Some ISO-8601 compliant strings are recognized and parsed without a format string:

>>> arrow.get('2013-09-30T15:34:00.000-07:00')
<Arrow [2013-09-30T15:34:00-07:00]>

However, that code now generates a warning. That forces an ugly choice. Do I put in the warning suppression code, which just adds clutter to my code and adds risk that some new future warning will be unintentionally suppressed, or do I specify the format string for iso8601, which clutters up my code, seems error prone, and also seems like domain specific knowledge that should be built in to Arrow.

Hello @KenKundert, that string does not need a format to be passed now or in the future. The warning will occur with or without a format string, so that users are aware of the changes coming in 0.15.0, which affect both cases. ArrowParseWarning is specific to these changes and won't be used after 0.15.0.

@systemcatch
Collaborator

@merriam

4. There are users that want to use the library in production and find a number of times where liberal parsing is too liberal, not raising errors.  These users are now maintainers or supporters.

Indeed, take a look at the examples below. They're clearly wrong and introduce errors that are hard to find and debug. I assume you agree these should be fixed?

>>> arrow.get('Jun-2019', ['MMM-YY', 'MMM-YYYY'])
<Arrow [2020-06-01T00:00:00+00:00]>
>>> arrow.get('blabla102015').isoformat()
'1020-01-01T00:00:00+00:00'
>>> arrow.get('13/4/2045')
<Arrow [2045-01-01T00:00:00+00:00]>

5. A change is proposed to make strict parsing the default.  This is introduced as a breaking change, as the proponents do not plan to ever use liberal parsing.  It is, curiously, always introduced in a way that causes extra warnings to be generated.

This change does not make strict parsing the default; it simply corrects obvious errors. Warnings are clearly needed to inform users about these changes. Unfortunately, due to the parser design it is difficult to provide specific warnings, hence the need for a more general one.

jadchaar added a commit that referenced this issue Sep 8, 2019
@jadchaar jadchaar added the note label Sep 8, 2019
@jadchaar jadchaar changed the title Upcoming changes in version 0.15.0❗ v0.15.0 Changes❗ Sep 8, 2019
@jadchaar
Member Author

jadchaar commented Sep 8, 2019

Hey all, v0.15.0 has been released! We have made a huge effort to keep things as compatible as possible with previous versions, but this release may potentially break existing code. Here is the final change log:

0.15.0

  • [NEW] Added support for DDD and DDDD ordinal date tokens. The following functionality is now possible: arrow.get("1998-045"), arrow.get("1998-45", "YYYY-DDD"), arrow.get("1998-045", "YYYY-DDDD").
  • [NEW] ISO 8601 basic format for dates and times is now supported (e.g. YYYYMMDDTHHmmssZ).
  • [NEW] Added humanize week granularity translations for French, Russian and Swiss German locales.
  • [CHANGE] Timestamps of type str are no longer supported without a format string in the arrow.get() method. This change was made to support the ISO 8601 basic format and to address bugs such as #447.
# will NOT work in v0.15.0
arrow.get("1565358758")
arrow.get("1565358758.123413")

# will work in v0.15.0
arrow.get("1565358758", "X")
arrow.get("1565358758.123413", "X")
arrow.get(1565358758)
arrow.get(1565358758.123413)
  • [CHANGE] When a meridian token (a|A) is passed and no meridians are available for the specified locale (e.g. unsupported or untranslated) a ParserError is raised.
  • [CHANGE] The timestamp token (X) will now match float timestamps of type str: arrow.get("1565358758.123415", "X").
  • [CHANGE] Strings with leading and/or trailing whitespace will no longer be parsed without a format string. Please see the docs for ways to handle this.
  • [FIX] The timestamp token (X) will now only match on strings that strictly contain integers and floats, preventing incorrect matches.
  • [FIX] Most instances of arrow.get() returning an incorrect Arrow object from a partial parsing match have been eliminated. The following issues have been addressed: #91, #196, #396, #434, #447, #456, #519, #538, #560.
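
For the whitespace change, one simple option is to strip the input before handing it to the parser. This is a sketch of that idea and not necessarily what the docs recommend; `clean_date_string` is a hypothetical helper and the sample input is made up.

```python
def clean_date_string(raw):
    """Remove leading and trailing whitespace from a date string."""
    return raw.strip()


print(repr(clean_date_string("  2013-05-05T12:30:45 \n")))  # '2013-05-05T12:30:45'
```

The cleaned string can then be passed on, for example as `arrow.get(clean_date_string(raw))`.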

This release has been months in the making and @systemcatch and I have put in a ton of work into it. Please do not hesitate to reach out with any feedback or concerns. Thanks!

@escapewindow

Hi,

First off, thank you for maintaining arrow, and thank you for addressing some long standing bugs.

Second, fallout from several recent arrow fixes has broken us downstream. This is understandable. However, I'm wondering if you would consider switching to semver. Namely:

  • This is a module used in production, so the version should be at least 1.0.0
  • Bug fixes are dot releases (e.g. 1.0.0 -> 1.0.1). Backwards-compatible api changes are minor version bumps (1.0.0 -> 1.1.0). Breaking changes are major version bumps. (1.0.0 -> 2.0.0).
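
The bump rules in the list above can be written down as a small function. This is a minimal sketch; `next_version` and its change labels ("fix", "feature", "breaking") are hypothetical names chosen for illustration.

```python
def next_version(version, change):
    """Return the next semantic version for a 'fix', 'feature' or 'breaking' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(next_version("1.0.0", "fix"))       # 1.0.1
print(next_version("1.0.0", "feature"))   # 1.1.0
print(next_version("1.0.0", "breaking"))  # 2.0.0
```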

We've found that major version bumps are a great way of informing downstream users that something has changed that may merit more investigation before deploying. This would allow us to, in this example, pin to arrow<2, and pick up fixes without automatically picking up breaking changes.

Thanks again!

@jadchaar
Member Author

jadchaar commented Sep 9, 2019

Hi @escapewindow, @systemcatch and I actually discussed this recently. We decided that we would like to release 1.0.0 once we find that arrow has implemented most of the basic features that you'd expect from a date and time library. We are planning on switching to a full semantic versioning scheme once we add daylight saving time improvements (planned for 0.16.0). We wanted to get arrow into a solid feature state before transitioning to semantic versioning. It is definitely on our minds as well :).

@jaapz
Contributor

jaapz commented Sep 16, 2019

Great release, thanks!

@jadchaar
Member Author

jadchaar commented Oct 1, 2019

Closing this issue because the v0.15.0 release seems to have gone well. Feel free to open a new issue if new problems arise. Thanks again everyone for contributing to the conversation and helping to make Arrow even better!

@jmidyet

jmidyet commented Oct 31, 2019

@jadchaar thousands of people/projects are already using the library. By not following standard versioning practices, you're causing headaches for other humans. Please do a major version bump for breaking changes.

@andrewelkins
Contributor

@jmidyet Agreed and we're working on it.

@gsemet

gsemet commented Oct 31, 2019

While I agree with you, I must admit semver says something like « every version in 0.x can break or not, up to the lib ». A lot of packages are 0.x because their maintainers just do not want to deal with backward and forward compatibility. Yes, human error happens, and a breaking change can sneak in between two minor changes, but the purpose of semver is to minimize this. It is a pity some maintainers do not follow it; it makes updating very risky, so we end up freezing the lib at an old version, decreasing security.

Semver is not that hard. Use PBR or any other similar lib to automate version bumps. And a bit of self-discipline.

By the way, pendulum is a very sexy lib, I am not saying it is better but it looks like it follows semver.

@jadchaar
Member Author

jadchaar commented Oct 31, 2019

@gsemet @jmidyet we appreciate your input and apologize if the lack of semver has caused issues. I am a huge fan of semver myself, but Arrow has been in flux the past few years as maintainers have come and gone (I only joined the project back in May). We are heavily discussing a move to semver.

@systemcatch and I discussed the potential move recently, but we wanted to hold off on a 1.0.0 release until after we have complete DST support implemented. That is one of the big lacking features of Arrow that we want to tackle before officially saying this is a non-beta product. 0.15.0 was focused on refactoring and fixing the parsing behavior, and now we are moving focus to DST. If anyone would like to help us implement DST and get the ball rolling on a 1.0.0 release, we'd be happy to help and look over PRs!

Also, 1.0.0 should probably be Python 3+.

@andrewelkins
Contributor

@jadchaar probably worth keeping support for 2.7 for 1.0.0

@gsemet

gsemet commented Nov 2, 2019

Most major libs have dropped support for 2.7, notably pip. It's time to move on and forget about it. A lot of libs also only work on 3.5+, which makes total sense.

@hoIIer

hoIIer commented Mar 8, 2020

is there a way to be strict about the string being parsed? for instance:

now = arrow.utcnow().isoformat()
arrow.get(f'helloworld {now}', 'YYYY-MM-DDTHH:mm:ss.SZ')

returns a valid arrow date, whereas I want it to fail since the string contains more than just the datetime string.

edit:
seems this works:

arrow.get(f'helloworld {now}', r'[^]YYYY-MM-DDTHH:mm:ss.SZ[$]')
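
Another way to get whole-string strictness for ISO 8601 input, independent of arrow's format tokens, is to validate with the standard library first. This is a sketch of an alternative rather than an arrow feature; `parse_iso_strict` is a hypothetical helper and it assumes Python 3.7+ for `datetime.fromisoformat`.

```python
from datetime import datetime, timezone


def parse_iso_strict(text):
    """Parse text as an ISO 8601 datetime, rejecting any surrounding text."""
    # datetime.fromisoformat raises ValueError unless the whole string parses.
    return datetime.fromisoformat(text)


now = datetime.now(timezone.utc).isoformat()
print(parse_iso_strict(now))

try:
    parse_iso_strict(f"helloworld {now}")
except ValueError as exc:
    print(f"rejected: {exc}")
```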
