Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.leave_whitespace() may make unnecessary copies #525

Open
InSyncWithFoo opened this issue Nov 14, 2023 · 3 comments · May be fixed by #526
Open

.leave_whitespace() may make unnecessary copies #525

InSyncWithFoo opened this issue Nov 14, 2023 · 3 comments · May be fixed by #526

Comments

@InSyncWithFoo
Copy link
Contributor

InSyncWithFoo commented Nov 14, 2023

A module of mine has tens of (nested) element definitions, all have .leave_whitespace() called; it takes several seconds to run. It took me quite some time to realize that pyparsing were making copies of nested elements, which is unnecessary for my use case:

pyparsing/pyparsing/core.py

Lines 3738 to 3749 in fad68f4

def leave_whitespace(self, recursive: bool = True) -> ParserElement:
"""
Extends ``leave_whitespace`` defined in base class, and also invokes ``leave_whitespace`` on
all contained expressions.
"""
super().leave_whitespace(recursive)
if recursive:
self.exprs = [e.copy() for e in self.exprs]
for e in self.exprs:
e.leave_whitespace(recursive)
return self

pyparsing/pyparsing/core.py

Lines 4524 to 4531 in fad68f4

def leave_whitespace(self, recursive: bool = True) -> ParserElement:
super().leave_whitespace(recursive)
if recursive:
if self.expr is not None:
self.expr = self.expr.copy()
self.expr.leave_whitespace(recursive)
return self

In the same spirit with recursive (#219), I think another parameter should also be introduced to configure the behavior.

@InSyncWithFoo
Copy link
Contributor Author

InSyncWithFoo commented Nov 14, 2023

Currently I have this as a workaround:

# @cache won't work
_seen = set()

def _leave_whitespace(element: ParserElement) -> ParserElement:
    if expression in _seen:
        return element

    _seen.add(element)

    if hasattr(element, 'expr'):
        _leave_whitespace(element.expr)
    elif hasattr(element, 'exprs'):
        for epxr in element.exprs:
            _leave_whitespace(expr)
    else:
        element.leave_whitespace()

    return element

@ptmcg
Copy link
Member

ptmcg commented Nov 20, 2023

Here is an alternative implementation of _leave_whitespace for you. It makes use of expr.recurse() which returns a list of the embedded expr or exprs (so you don't need the hasattr if/elif/else stuff). Also, it uses a deque to implement the recursion without having to actually write a recursive function. In my non-pyparsing Python work, I've shifted over to this style of recursion instead of actual recursive function calling.

from collections import deque

def _leave_whitespace(element: ParserElement) -> ParserElement:

    seen = set()

    # initialize queue of expressions to visit
    to_visit = deque([element])
    
    while to_visit:
        expr = to_visit.popleft()
        if expr in seen:
            continue
        seen.add(expr)

        # make the call on the current expr
        expr.leave_whitespace()

        # add child exprs to todo list
        to_visit.extend(expr.recurse())

    return element

@InSyncWithFoo
Copy link
Contributor Author

InSyncWithFoo commented Nov 20, 2023

@ptmcg Thanks. It would be nice if you document methods like this one so that we users know which ones are implementation details and which are not. Also, the method call should have recursive = False.

@InSyncWithFoo InSyncWithFoo changed the title .leave_whitespace() may makes unnecessary copies .leave_whitespace() may make unnecessary copies Jan 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants