Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser element custom name isn't appropriate for __repr__ #416

Open
djpohly opened this issue Jun 18, 2022 · 6 comments
Open

Parser element custom name isn't appropriate for __repr__ #416

djpohly opened this issue Jun 18, 2022 · 6 comments

Comments

@djpohly
Copy link
Contributor

djpohly commented Jun 18, 2022

The Python documentation on __repr__ encourages developers to return something that "look[s] like a valid Python expression that could be used to recreate an object with the same value" or that is at least "information-rich and unambiguous".

The custom name from set_name isn't really appropriate for this. Always using the shorthand "default name" for __repr__ would be more information-rich, although something that works like this would really be ideal:

>>> p = ZeroOrMore(one_of(["hello", "world"]) + "foo", stop_on=Keyword('end'))
>>> print(p)  # calls str()
[{hello | world 'foo'}]...
>>> p  # calls repr()
(Regex('hello|world') + 'foo')[...: Keyword('end')]
>>> q = eval(repr(p))  # nice!
>>> q.set_name("custom")  # name doesn't affect repr()
(Regex('hello|world') + 'foo')[...: Keyword('end')]
>>> print(q)  # name does affect str()
custom

All of the above I have working so far for a handful of ParserElement types. Interested in a complete PR?

(And since I seem to tie everything into documentation: Sphinx uses repr() to show the values of documented variables.)

@ptmcg
Copy link
Member

ptmcg commented Jun 30, 2022

This looks pretty cool! I think I looked at doing something like this a few years ago but, it got back-burnered in favor of other things.

Would the repr also include some of the other qualifiers like "set_name" or "set_parse_action"? Or just reconstruct the bare expression?

Yes, a PR to implement this would be very welcome.

@djpohly
Copy link
Contributor Author

djpohly commented Jul 1, 2022

I wasn't planning to show too much more than the required constructor arguments, lest it become unreadable too quickly. (The eval line above was meant only as a cute illustration, not as a proposed alternative to a class-aware copy() method.)

I'll polish something up for a PR.

@djpohly
Copy link
Contributor Author

djpohly commented Jul 1, 2022

PR is up, with a couple of questions:

  1. _PendingSkip makes perfect sense in an And context, but I don't understand what it means to create one with the match-first | operator. (I wasn't able to find it in the documentation or figure it out from the unit tests.) What is X | Y | ... supposed to mean?
  2. The Literal instances in an expression like Opt("A") + "B" are shown explicitly: Opt(Literal('A')) + Literal('B'). However, there are a number of contexts where the class name could be omitted. Cons: it would require introducing a second repr-like method to distinguish when you can and can't abbreviate, and ideally it should consider the current _literalStringClass. Pros: more concise output, and most classes would only need to implement either the regular __repr__() or the alternative.

@ptmcg
Copy link
Member

ptmcg commented Jul 5, 2022

With this change, what is the repr for Word(printables)?

Perhaps in the case of Word, we might use srange and _collapse_string_to_ranges when the initChars or bodyChars are very long strings.

@djpohly
Copy link
Contributor Author

djpohly commented Jul 5, 2022

The repr for printables is

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~

so the repr for Word(printables) is nothing unexpected:

Word('!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~')

Using srange sounds reasonable anywhere we are dealing with a set of characters. (That said, shouldn't it return a set[str]?)

@djpohly
Copy link
Contributor Author

djpohly commented Jul 8, 2022

With the latest commit in the PR, the repr() for elements which are based on sets of characters now determines whether the character string or an srange call will take less space:

>>> Word(printables)
Word(srange('[!-~]'))
>>> Word("abcdefghijkl")
Word('abcdefghijkl')
>>> Word("abcdefghijklm")
Word('abcdefghijklm')
>>> Word("abcdefghijklmn")
Word(srange('[a-n]'))

I forgot to check out _collapse_string_to_ranges, but I'll see if that and the abbrev_charset functions I wrote can be harmonized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants