Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatter supports escape characters #688

Merged
merged 6 commits into from
Nov 2, 2019

Conversation

remeika
Copy link
Contributor

@remeika remeika commented Oct 6, 2019

The DateTimeFormatter now supports brackets as escape characters. Based almost entirely off the work of @bsidhom.

>>> formatter = arrow.formatter.DateTimeFormatter()
>>> dt = datetime(2018, 3, 9, 8, 40)

>>> fmt = "YYYY-MM-DD h [h] m"
>>> formatter.format(dt, fmt)
u'2018-03-09 8 h 40'

>>> fmt = "YYYY-MM-DD h [hello] m"
>>> formatter.format(dt, fmt)
u'2018-03-09 8 hello 40'

>>> fmt = "YYYY-MM-DD h [hello world] m"
>>> formatter.format(dt, fmt)
u'2018-03-09 8 hello world 40'

One note: I could not get coverage to pass on of my lines of code, and had to add a no cover pragma. If anyone has an idea why code coverage is unhappy with formatter.py:31, I'd love to fix it.

Fixes #35

@codecov-io
Copy link

codecov-io commented Oct 6, 2019

Codecov Report

Merging #688 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #688   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           9      9           
  Lines        1563   1565    +2     
  Branches      255    256    +1     
=====================================
+ Hits         1563   1565    +2
Impacted Files Coverage Δ
arrow/formatter.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update efd129d...1c2ac3a. Read the comment docs.

@systemcatch
Copy link
Collaborator

@remeika thanks for the PR, looks good from a quick glance over. Myself and Jad are pretty busy this week but hopefully can find some time to properly review it.

I wouldn't worry too much about the coverage problem since we're moving to pytest in #594 anyway.

Copy link
Collaborator

@systemcatch systemcatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @remeika we just have a few questions for you. We've been playing around with the new patterns at https://regex101.com/r/BTD1Gk/1/ and aren't sure why the last character in every escaped bracket is matching separately.

@@ -12,8 +12,9 @@
class DateTimeFormatter(object):

_FORMAT_RE = re.compile(
r"(YYY?Y?|MM?M?M?|Do|DD?D?D?|d?dd?d?|HH?|hh?|mm?|ss?|SS?S?S?S?S?|ZZ?Z?|a|A|X)"
r"(\[(?:(?=(?P<literal>\\\\\[|\\\\\]|[^]]))(?P=literal))*\]|YYY?Y?|MM?M?M?|Do|DD?D?D?|d?dd?d?|HH?|hh?|mm?|ss?|SS?S?S?S?S?|ZZ?Z?|a|A|X)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you walk us through this new pattern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, what is the purpose of the two pairs of escaped slashes?

)
_ESCAPED_BRACKET_RE = re.compile(r"\\\\(\[|\])")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this one.

@@ -25,6 +26,11 @@ def format(cls, dt, fmt):

def _format_token(self, dt, token):

if token and token.startswith("[") and token.endswith("]"):
return self._ESCAPED_BRACKET_RE.sub(
lambda m: m.group(1), token[1:-1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain exactly what the lambda function is doing here? Maybe adding a few comments to the source code would help.

@jadchaar
Copy link
Member

Hey @remeika, any update on this? We'd love to merge this in, but we have a few clarifying questions.

@remeika
Copy link
Contributor Author

remeika commented Oct 23, 2019

@jadchaar Hey let me review the comments and reply later tonight!

@remeika remeika force-pushed the feature/formatter-escape-chars branch from 7b55e72 to 5c84035 Compare October 26, 2019 22:08
@remeika
Copy link
Contributor Author

remeika commented Oct 26, 2019

Sorry for the late response here @jadchaar @systemcatch ! I've removed all handling of escaped brackets, which I think was motivated by this comment by @bsidhom; as I said, this basic implementation was his.

But I don't think handling escaped brackets is really necessary, so this feature now does not handle them at all, which really simplifies things. I think this code is now ready to go!

@remeika
Copy link
Contributor Author

remeika commented Oct 26, 2019

@systemcatch As to why the last character of every escaped token is matching: I think this is an artifact of the workaround I'm using to emulate atomic capture groups. See the comment I added for details.

Copy link
Collaborator

@systemcatch systemcatch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super solid work @remeika, thanks for the PR!

@systemcatch systemcatch merged commit 3b64e90 into arrow-py:master Nov 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow escaping of characters in format()
4 participants