Parse actions skipped inside delimited_list #345

Closed
kimgr opened this issue Dec 13, 2021 · 7 comments

kimgr commented Dec 13, 2021

Hello,

I'm not sure if this is a bona fide bug, or if I'm misusing pyparsing :-)

I have a huge pyparsing grammar for ASN.1 syntax over in https://github.com/kimgr/asn1ate/blob/master/asn1ate/parser.py. The repro below is not minimal, but it uses some techniques from asn1ate's parser to demonstrate what I'm trying to do:

#!/usr/bin/env python
from pyparsing import *


class AnnotatedToken(object):
    def __init__(self, kind, elements):
        self.kind = kind
        self.elements = elements

    def __str__(self):
        return 'T(%s, %r)' % (self.kind, self.elements)

    __repr__ = __str__
    

def grammar():
    def annotate(name):
        def _(t):
            return AnnotatedToken(name, t.asList())
        return _

    identifier = Word(srange('[a-z0-9]'))
    numeral = Word(nums)

    named_number_value = Suppress('(') + numeral + Suppress(')')
    named_number = identifier + named_number_value

    named_number_list = (Suppress('{') +
                         Group(Optional(delimitedList(named_number))) +
                         Suppress('}'))

    identifier.setParseAction(annotate("id"))
    named_number.setParseAction(annotate("nn"))

    # BUG(?): This parse action is never called after commit
    # 9987004c94ccf7d9b6b3adbcf06d05d2ff197737
    named_number_value.setParseAction(annotate("val"))

    g = OneOrMore(named_number_list)
    return g


g = grammar()
res = g.parseString("""
{ x1(1), x2(2) }
""")
print(res.dump())

The concrete problem I'm seeing downstream is that a parse action intended to decorate the parse result with a type name is never called, and so later stages can't use the annotation to identify what kind of element it is.

I think the issue is that delimited_list now mutates the expression by calling its streamline method: 9987004#diff-daba53cec7bed1be7b180ee5e8378c772408d07afe6f2cad6d62e966993b9e45L38.

I haven't fully gotten my head around what streamline is supposed to do, but it seems to reduce the result to a literal value, skipping over any interim rules and parse actions.

Is that a bug? Or is there a way to phrase the grammar in a way that it works with both old and new pyparsing?

Thanks!

kimgr changed the title from "Parse actions skipped" to "Parse actions skipped inside delimited_list" on Dec 13, 2021

ptmcg commented Dec 13, 2021

streamline() is intended to collapse the nested pyparsing objects, like And(And(And(a, b), c), d), that get created by a + b + c + d into And(a, b, c, d). streamline() is not supposed to collapse out expressions that have custom names or parse actions, but that may have happened in this last update. If that gives you a lead, let me know.
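
As a rough illustration of that flattening, here is a toy model in plain Python. It is not pyparsing's implementation: the Expr, Term, and And classes and the parse_action and name attributes are invented for this sketch, and the real streamline() logic differs in detail.

```python
class Expr:
    """Toy base class; stands in for a parser element (hypothetical)."""

    def __add__(self, other):
        # a + b always builds a new two-element sequence, so chains nest.
        return And([self, other])

    def streamline(self):
        return self


class Term(Expr):
    """Leaf expression, standing in for something like a Literal or Word."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return self.label


class And(Expr):
    """Toy sequence expression; NOT pyparsing's actual And class."""

    def __init__(self, exprs):
        self.exprs = list(exprs)
        self.parse_action = None
        self.name = None

    def streamline(self):
        flat = []
        for e in self.exprs:
            e.streamline()
            # Merge a nested And only if nothing is attached to it.
            if isinstance(e, And) and e.parse_action is None and e.name is None:
                flat.extend(e.exprs)
            else:
                flat.append(e)
        self.exprs = flat
        return self

    def __repr__(self):
        return "And(%s)" % ", ".join(repr(e) for e in self.exprs)


a, b, c, d = (Term(ch) for ch in "abcd")

nested = a + b + c + d
before = repr(nested)
nested.streamline()
print(before, "->", nested)

# A sub-expression with an action attached must survive as its own node.
inner = c + d
inner.parse_action = lambda tokens: tokens
kept = (a + b + inner).streamline()
print(kept)
```

The point of the guard in streamline() is that merging a node discards the node itself, so anything attached to it afterwards can no longer take part in parsing.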

ptmcg commented Dec 13, 2021

Try using the pyparsing debugging decorator traceParseAction to see whether your parse action is getting called:

def annotate(name):
    @traceParseAction   # <-----
    def _(t):
        return AnnotatedToken(name, t.asList())

    return _

kimgr commented Dec 13, 2021

Nice, thanks!

I'm sure the named_number_value parse action isn't called, but traceParseAction shows it even more clearly. It didn't occur to me to include the output earlier; here goes:

# without traceParseAction
~/code/pyparsing$ ./repro.py 
[[T(nn, [T(id, ['x1']), '1']), T(nn, [T(id, ['x2']), '2'])]]
[0]:
  [T(nn, [T(id, ['x1']), '1']), T(nn, [T(id, ['x2']), '2'])]

# with traceParseAction
~/code/pyparsing$ ./repro.py 
>>entering _(line: '{ x1(1), x2(2) }', 3, ParseResults(['x1'], {}))
<<leaving _ (ret: T(id, ['x1']))
>>entering _(line: '{ x1(1), x2(2) }', 3, ParseResults([T(id, ['x1']), '1'], {}))
<<leaving _ (ret: T(nn, [T(id, ['x1']), '1']))
>>entering _(line: '{ x1(1), x2(2) }', 10, ParseResults(['x2'], {}))
<<leaving _ (ret: T(id, ['x2']))
>>entering _(line: '{ x1(1), x2(2) }', 10, ParseResults([T(id, ['x2']), '2'], {}))
<<leaving _ (ret: T(nn, [T(id, ['x2']), '2']))
[[T(nn, [T(id, ['x1']), '1']), T(nn, [T(id, ['x2']), '2'])]]
[0]:
  [T(nn, [T(id, ['x1']), '1']), T(nn, [T(id, ['x2']), '2'])]

ptmcg commented Dec 13, 2021

Great! Lastly, could you post the output from before the bug occurred? So I can see the expected results and how they differ.

kimgr commented Dec 14, 2021

Sure, this is from the pyparsing_3.0.4 tag:

~/code/pyparsing$ ./repro.py 
>>entering _(line: '{ x1(1), x2(2) }', 3, ParseResults(['x1'], {}))
<<leaving _ (ret: T(id, ['x1']))
>>entering _(line: '{ x1(1), x2(2) }', 5, ParseResults(['1'], {}))
<<leaving _ (ret: T(val, ['1']))
>>entering _(line: '{ x1(1), x2(2) }', 3, ParseResults([T(id, ['x1']), T(val, ['1'])], {}))
<<leaving _ (ret: T(nn, [T(id, ['x1']), T(val, ['1'])]))
>>entering _(line: '{ x1(1), x2(2) }', 10, ParseResults(['x2'], {}))
<<leaving _ (ret: T(id, ['x2']))
>>entering _(line: '{ x1(1), x2(2) }', 12, ParseResults(['2'], {}))
<<leaving _ (ret: T(val, ['2']))
>>entering _(line: '{ x1(1), x2(2) }', 10, ParseResults([T(id, ['x2']), T(val, ['2'])], {}))
<<leaving _ (ret: T(nn, [T(id, ['x2']), T(val, ['2'])]))
[[T(nn, [T(id, ['x1']), T(val, ['1'])]), T(nn, [T(id, ['x2']), T(val, ['2'])])]]
[0]:
  [T(nn, [T(id, ['x1']), T(val, ['1'])]), T(nn, [T(id, ['x2']), T(val, ['2'])])]

Note the trace actions for

>>entering _(line: '{ x1(1), x2(2) }', 12, ParseResults(['2'], {}))
<<leaving _ (ret: T(val, ['2']))

and how the literal values are wrapped in T(val, ['2']) in the resulting tree.

ptmcg commented Dec 15, 2021

Yes, the culprit is the embedded call to streamline(), which can modify the contents of an And expression. I will fix this in 3.0.7.

In the meantime, you can work around this bug by moving the named_number_value.setParseAction(annotate("val")) statement to immediately follow the line where named_number_value is defined (which is prior to the call to delimitedList). This workaround will work in the released code as well.
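
A sketch of that reordering, adapted from the repro at the top of this issue, might look like the following. It is an illustration rather than code posted in the thread: the helper name action and the module-level layout are choices made for this sketch, and it sticks to the camelCase names (setParseAction, delimitedList, parseString, asList) on the assumption that they remain available in the pyparsing versions of interest.

```python
from pyparsing import (Group, Optional, Suppress, Word, delimitedList,
                       nums, srange)


class AnnotatedToken(object):
    def __init__(self, kind, elements):
        self.kind = kind
        self.elements = elements

    def __repr__(self):
        return 'T(%s, %r)' % (self.kind, self.elements)


def annotate(name):
    def action(t):
        return AnnotatedToken(name, t.asList())
    return action


identifier = Word(srange('[a-z0-9]'))
numeral = Word(nums)

named_number_value = Suppress('(') + numeral + Suppress(')')
# Workaround: attach the action right away, before delimitedList()
# gets a chance to streamline the expression that contains this one.
named_number_value.setParseAction(annotate("val"))

named_number = identifier + named_number_value

named_number_list = (Suppress('{') +
                     Group(Optional(delimitedList(named_number))) +
                     Suppress('}'))

# These two were already being called in the failing run, so they
# are left where the original repro had them.
identifier.setParseAction(annotate("id"))
named_number.setParseAction(annotate("nn"))

result = named_number_list.parseString("{ x1(1), x2(2) }")
print(result.asList())
```

With the action attached up front, each numeral should come back wrapped in a "val" token nested inside its "nn" token, as in the 3.0.4 output above.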

ptmcg closed this as completed in 2f633f4 on Dec 15, 2021
kimgr added a commit to kimgr/asn1ate that referenced this issue Dec 15, 2021
We ran into the following pyparsing bug:
pyparsing/pyparsing#345

After some bisecting it turned out it's only present in pyparsing 3.0.5-6,
and the pyparsing folks were quick to fix it for 3.0.7.

Exclude the broken versions from install_requires.
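
For illustration only, an exclusion like the one described in that commit message could be expressed roughly as below. The requirement string is an assumption built from the version numbers mentioned above (the actual asn1ate change is not shown in this thread), and PYPARSING_REQUIREMENT, AFFECTED_VERSIONS, and is_affected are hypothetical names.

```python
# Hypothetical requirement string: it only excludes the two affected
# releases named in the commit message. Any lower bound that asn1ate
# actually uses is not shown in this thread.
PYPARSING_REQUIREMENT = "pyparsing!=3.0.5,!=3.0.6"

# Releases reported as affected by pyparsing/pyparsing#345.
AFFECTED_VERSIONS = frozenset(["3.0.5", "3.0.6"])


def is_affected(version):
    """Return True if the given pyparsing version string is a known-bad release."""
    return version.strip() in AFFECTED_VERSIONS


# In a setup.py this might be passed along as:
#     setup(..., install_requires=[PYPARSING_REQUIREMENT])
print(is_affected("3.0.6"))
print(is_affected("3.0.7"))
```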

kimgr commented Dec 15, 2021

Thanks for the quick resolution!
