Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate gives infinite recursion for recursive grammars #3072

Open
sylee957 opened this issue Nov 16, 2022 · 5 comments · May be fixed by #3145
Open

generate gives infinite recursion for recursive grammars #3072

sylee957 opened this issue Nov 16, 2022 · 5 comments · May be fixed by #3145

Comments

@sylee957
Copy link

The following code for generating simple recursive grammar does not work

from nltk.grammar import Nonterminal, Production, CFG
from nltk.parse.generate import generate


S = Nonterminal('S')
P = [Production(S, ['a', S]), Production(S, [])]
G = CFG(S, P)

gen = generate(G)
print(next(gen))

gives RuntimeError: The grammar has rule(s) that yield infinite recursion!

I think that the problem should not be an issue if the algorithm uses different searching technique
and iterators should be capable of generating infinite sequences.
Without the capability to generate recursive grammars, it loses many practical capability because it can only handle finite grammars.

@ekaf
Copy link
Contributor

ekaf commented Feb 20, 2023

@sylee957, with an infinite grammar, you can use the depth parameter to limit the recursion depth:

gen = generate(G, depth=10)
print(next(gen))

['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']

@sylee957
Copy link
Author

I think that it also has the problem if it doesn't reach 'aa', 'aaa', 'aaaa', ...

@ekaf
Copy link
Contributor

ekaf commented Feb 20, 2023

@sylee957 actually, given a finite value for 'depth' lower than the recursion limit, the generate function does reach all the allowable productions. Since it concatenates lists and not strings, it outputs forms like ['a', 'a'] instead of 'aa'. And since it applies the grammar depth-first, the first output is the deepest tree, while the breadth-first strategy would start with ['a'].

@ekaf
Copy link
Contributor

ekaf commented Feb 22, 2023

For example:

n = 5
gen = generate(G, depth=n)
for _i in range(n):
    print(next(gen))

['a', 'a', 'a', 'a']
['a', 'a', 'a']
['a', 'a']
['a']
[]

@ekaf
Copy link
Contributor

ekaf commented Feb 22, 2023

So the real problem seems to be that, at line 29, generate.py sets a default depth that is much higher than the recursion limit:

    if depth is None:
        depth = sys.maxsize

However, using sys.getrecursionlimit() instead, which is 1000, is also still too high.
With the above grammar, the highest usable depth was 331 for me. Maybe the reason for this could be that the indirect recursions between _generate_all() and _generate_one() are adding up towards the recursion limit. If that is the reason, the maximum possible depth could be a little less than a third of the recursion limit.

Then, the solution could be, on one hand, to lower the default depth and, on the other hand, to output a more informative warning when the recursion limit is reached,
and mention the possibility of raising that value, using sys.setrecursionlimit().

@ekaf ekaf linked a pull request Apr 21, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants