Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in AST recursion depth tracking change of gh-95185 #106905

Closed
mmohrhard opened this issue Jul 20, 2023 · 15 comments · Fixed by #113035
Closed

Error in AST recursion depth tracking change of gh-95185 #106905

mmohrhard opened this issue Jul 20, 2023 · 15 comments · Fixed by #113035
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@mmohrhard
Copy link
Contributor

mmohrhard commented Jul 20, 2023

Bug report

The change in 0047447 seems to miss the recusrion depth adjustment in case of an error. As an example for some of the generated code:

PyObject*
ast2obj_mod(struct ast_state *state, void* _o)
{
    mod_ty o = (mod_ty)_o;
    PyObject *result = NULL, *value = NULL;
    PyTypeObject *tp;
    if (!o) {
        Py_RETURN_NONE;
    }
    if (++state->recursion_depth > state->recursion_limit) {
        PyErr_SetString(PyExc_RecursionError,
            "maximum recursion depth exceeded during ast construction");
        return 0;
    }
    switch (o->kind) {
    case Module_kind:
        tp = (PyTypeObject *)state->Module_type;
        result = PyType_GenericNew(tp, NULL, NULL);
        if (!result) goto failed;
        value = ast2obj_list(state, (asdl_seq*)o->v.Module.body, ast2obj_stmt);
        if (!value) goto failed;
        if (PyObject_SetAttr(result, state->body, value) == -1)
            goto failed;
...
    }
    state->recursion_depth--;
    return result;
failed:
    Py_XDECREF(value);        
    Py_XDECREF(result);        
    return NULL;
}

Note that the failed code path is missing the state->recursion_depth--; statement.

I found this as I'm trying to track down where spurious SystemError: AST constructor recursion depth mismatch errors in Python 3.11 are coming from. E.g.

  File "/env/lib/python3.11/site-packages/bloscpack/numpy_io.py", line 358, in unpack_ndarray_from_bytes
    return unpack_ndarray(source)
      ^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/bloscpack/numpy_io.py", line 305, in unpack_ndarray
    sink = PlainNumpySink(source.metadata)
      ^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/bloscpack/numpy_io.py", line 136, in __init__
    dtype_ = ast.literal_eval(metadata['dtype'])
    ^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/ast.py", line 64, in literal_eval
    node_or_string = parse(node_or_string.lstrip(" \t"), mode='eval')
    ^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
      ^^^^^^^^^^^^^^^^^
SystemError: AST constructor recursion depth mismatch (before=48, after=47)

Your environment

Reproduced in Python 3.11 but the code in main looks the same.

Linked PRs

@romuald
Copy link
Contributor

romuald commented Jul 24, 2023

I can confirm this issue.

It causes a bunch of INTERNALERROR in some specific case with py.test when it tries to report errors traces

Reverting 0047447 locally solves the problem (actual error is properly reported)
(edit: typo)

@MaZderMind
Copy link

Might this be the root-cause for sqlalchemy/mako#378, pytest-dev/pytest#10874, home-assistant/core#95364, bieniu/ha-shellies-discovery#448 and other projects reporting SystemError: AST constructor recursion depth mismatch since python 3.11? If so, while the issue might not happen too often, it still hits lots and projects and people.

@malthe
Copy link

malthe commented Nov 13, 2023

@serhiy-storchaka do you perhaps have some information to help this issue along?

pablogsal pushed a commit that referenced this issue Nov 13, 2023
…#106906)

* gh-106905: avoid incorrect SystemError about recursion depth mismatch

* Update Misc/NEWS.d/next/Core and Builtins/2023-07-20-11-41-16.gh-issue-106905.AyZpuB.rst

---------

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 13, 2023
…smatch (pythonGH-106906)

* pythongh-106905: avoid incorrect SystemError about recursion depth mismatch

* Update Misc/NEWS.d/next/Core and Builtins/2023-07-20-11-41-16.gh-issue-106905.AyZpuB.rst

---------

(cherry picked from commit 1447af7)

Co-authored-by: Markus Mohrhard <markusm@dug.com>
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 26, 2023
@yilei
Copy link
Contributor

yilei commented Dec 7, 2023

Thanks @mmohrhard for the fix and @pablogsal for reviewing and merging the fix.

Any thoughts on what conditions this bug will likely happen? We are also seeing the "recursion depth mismatch" error but only in production. So we are trying to construct a reproducible example so we could be more confident that bringing #106906 would fix our issues in production.

@yilei

This comment was marked as outdated.

gpshead pushed a commit to gpshead/cpython that referenced this issue Dec 7, 2023
…epth mismatch (python#106906)

Backport of 1447af7 from python#106906.

* pythongh-106905: avoid incorrect SystemError about recursion depth mismatch
* Update Misc/NEWS.d/next/Core and Builtins/2023-07-20-11-41-16.gh-issue-106905.AyZpuB.rst

---------

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@gpshead
Copy link
Member

gpshead commented Dec 7, 2023

Agreed that a regression test for this would be good to have.

@gpshead gpshead changed the title Error in recursion depth tracking change of gh-95185 Error in Python-ast recursion depth tracking change of gh-95185 Dec 7, 2023
@gpshead gpshead changed the title Error in Python-ast recursion depth tracking change of gh-95185 Error in AST recursion depth tracking change of gh-95185 Dec 7, 2023
@yilei
Copy link
Contributor

yilei commented Dec 12, 2023

#106906 didn't fix all errors in our case. We found a very large test that could reproduce about 1~2% of the time.

Upon investigation, I discovered that during the lifetime of the PyAST_mod2obj function, the GIL could be released, e.g. due to a GC callback:

    @     0x564bce01709e         64  drop_gil
    @     0x564bce018250        368  _PyEval_EvalFrameDefault
    @     0x564bce017d10         48  _PyEval_Vector
    @     0x564bcdf129d4         96  _PyObject_CallFunctionVa
    @     0x564bcdf12855        224  PyObject_CallFunction
    @     0x564bcdd89901         64  invoke_gc_callback
    @     0x564bcdd88586         64  _PyObject_GC_Link
    @     0x564bcdd88606         48  _PyObject_GC_New
    @     0x564bcdf3f88f         48  PyList_New
    @     0x564bcdff6082         64  ast2obj_stmt
    @     0x564bcdff53ee         64  ast2obj_list
    @     0x564bcdff597a         64  ast2obj_stmt
    @     0x564bcdfefc94        176  PyAST_mod2obj

This means it's not safe to use the shared ast_state struct to track the recursion depth.

@yilei
Copy link
Contributor

yilei commented Dec 12, 2023

I constructed a reliable reproducible example:

import ast
import difflib
import gc
import pathlib
import random
import threading
import unittest


def gc_callback(phase, info):
  del phase, info
  seq_a = [str(random.randint(0, 10)) + "\n" for _ in range(100)]
  seq_b = [str(random.randint(0, 10)) + "\n" for _ in range(100)]
  udiff = list(difflib.unified_diff(seq_a, seq_b))
  udiff.append("THE END")


gc.callbacks.append(gc_callback)


NUM_RUNS = 100


class AstTest(unittest.TestCase):

  def setUp(self):
    super().setUp()
    self.error_happened = None

  def ast_parse(self, content):
    try:
      for _ in range(100):
        ast.parse(content, "source")
    except SystemError as e:
      self.error_happened = e

  def test_ast(self):
    content = pathlib.Path(__file__).read_text()

    for i in range(NUM_RUNS):
      print(f"Run: {i}/{NUM_RUNS}", flush=True)
      ts = []
      for _ in range(100):
        t = threading.Thread(target=self.ast_parse, args=(content,))
        t.start()
        ts.append(t)

      for t in ts:
        t.join()

      self.assertIsNone(self.error_happened)


if __name__ == "__main__":
  unittest.main()

I have a patch to make PyAST_mod2obj use a separate struct allocated on the stack to track the recursion (defined the same as the struct validator in ast.c, and the test passed.

I can send a PR, but I'm not sure what's the best way to construct the test / where to add the test. Any pointers?

@yilei
Copy link
Contributor

yilei commented Dec 12, 2023

Hmm, this is only reproducible on Python 3.11, but not 3.12. The bug reports referenced here are all for Python 3.11 too.

Looks like #97922 is relevant? Any other chances the GIL might still get released during PyAST_mod2obj for Python 3.12+?

@gpshead
Copy link
Member

gpshead commented Dec 12, 2023

Agreed, I see this on 3.11 builds not 3.12.1+.

This is one of those situations where a reliable regression test is hard to construct given failure requires triggering a threading related race condition who's exact conditions are difficult to setup.

If we want a regression test checked in at all (we may not)... I'd add such a test class within Lib/test/test_ast.py but include some conditional logic around how it runs by configuring how long it'll run for within setUp as such:

self.num_runs = 20 if support.resource_enabled("cpu") else 1

Reproducing race conditions is hard and highly environment dependent. I can reproduce it reliably within the first iteration in a couple of seconds on a few different non-cloud-VM low spec machines, but it only reproduces some of the time within 100 iterations my modern high spec cloud VM (after spending 3 minutes trying).

If it ever pops up as failing it'll do so in a flaky manner but would still serve as a sign that something needs investigation.

@gpshead
Copy link
Member

gpshead commented Dec 12, 2023

That GC behavior change in 3.12+ does sound related to it not happening there. I suspect this is still worth fixing in main to not rely on the GIL for this depth tracking state regardless as it'd possibly show up again in free-threaded pep-703 builds?

@yilei
Copy link
Contributor

yilei commented Dec 13, 2023

I created a draft PR: #113035 with the implementation.

Will wait for guidance on whether we need this in main for free-threaded builds.

@pablogsal
Copy link
Member

Agreed, I see this on 3.11 builds not 3.12.1+.

In 3.12 I changed the GC to run only on the eval breaker so it doesn't run now on object allocation. This may be the reason this doesn't reproduce (If I understand correctly) in 3.12.

@gpshead
Copy link
Member

gpshead commented Dec 13, 2023

I've confirmed that the draft PR #113035 backported to 3.11 fixes the problem on a machine that immediately reproduces the problem with the above test code otherwise. 3.12 "got lucky" with that GC change and likely avoided this bug being possible (or at least far more rare if there are any other paths that could similarly enable it).

yilei is also testing the PR on our 3.11 runtime at work to ensure it fixes our users problems.

@yilei
Copy link
Contributor

yilei commented Dec 15, 2023

After patching our 3.11 runtime two days ago at work, I can confirm #113035 fixed the issue for us.

serhiy-storchaka pushed a commit that referenced this issue Dec 25, 2023
…T_mod2obj call. (GH-113035)


Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
serhiy-storchaka pushed a commit to serhiy-storchaka/cpython that referenced this issue Dec 25, 2023
… in each PyAST_mod2obj call. (pythonGH-113035)

(cherry picked from commit 48c4973)

Co-authored-by: Yilei Yang <yileiyang@google.com>
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
serhiy-storchaka added a commit that referenced this issue Dec 25, 2023
…ch PyAST_mod2obj call. (GH-113035) (GH-113472)

(cherry picked from commit 48c4973)

Co-authored-by: Yilei Yang <yileiyang@google.com>
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Dec 25, 2023
…n depth in each PyAST_mod2obj call. (pythonGH-113035) (pythonGH-113472)

(cherry picked from commit 48c4973)

(cherry picked from commit d58a5f4)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Yilei Yang <yileiyang@google.com>
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
serhiy-storchaka added a commit that referenced this issue Dec 25, 2023
…ch PyAST_mod2obj call. (GH-113035) (GH-113472) (GH-113476)

(cherry picked from commit 48c4973)
(cherry picked from commit d58a5f4)

Co-authored-by: Yilei Yang <yileiyang@google.com>
Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
ryan-duve pushed a commit to ryan-duve/cpython that referenced this issue Dec 26, 2023
…h PyAST_mod2obj call. (pythonGH-113035)


Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
kulikjak pushed a commit to kulikjak/cpython that referenced this issue Jan 22, 2024
…h PyAST_mod2obj call. (pythonGH-113035)


Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…smatch (python#106906)

* pythongh-106905: avoid incorrect SystemError about recursion depth mismatch

* Update Misc/NEWS.d/next/Core and Builtins/2023-07-20-11-41-16.gh-issue-106905.AyZpuB.rst

---------

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…h PyAST_mod2obj call. (pythonGH-113035)


Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants