Cannot parse code with "match / case" #79

Closed
rabernat opened this issue Nov 8, 2022 · 6 comments
Labels
type: bug Something isn't working

Comments

@rabernat

rabernat commented Nov 8, 2022

Thanks for maintaining this fantastic project! 🙏 We are using it to integrate our Python API docs with a Docusaurus site.

Describe the bug
Python 3.10 introduced structural pattern matching with match / case syntax. I have found that pydoc-markdown cannot parse code with this syntax. I am filing the bug report here rather than in pydoc-markdown because the stack trace indicates that the error comes from docspec_python.

To Reproduce
Steps to reproduce the behavior:

Create the following python module

def function_with_match():
    """A function that can't be parsed with pydoc-markdown."""

    foo = "a"
    match foo:
        case "a":
            pass

Create a pydoc-markdown configuration to parse it. Mine looks like this

loaders:
  - type: python
    search_path: [../pydoc-markdown-bug]
processors:
  - type: filter
    skip_empty_modules: true
  - type: smart
  - type: crossref
renderer:
  type: docusaurus
  docs_base_path: docs
  relative_output_path: reference
  relative_sidebar_path: sidebar.json
  sidebar_top_level_label: 'Reference'

Then run pydoc-markdown. My stack trace looks like this

Traceback (most recent call last):
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/parser.py", line 88, in parse_to_ast
    return RefactoringTool([], options).refactor_string(code + '\n', filename)
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/refactor.py", line 364, in refactor_string
    self.log_error("Can't parse %s: %s: %s",
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/refactor.py", line 362, in refactor_string
    tree = self.driver.parse_string(data)
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/pgen2/driver.py", line 103, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/Users/rabernat/mambaforge/lib/python3.10/lib2to3/pgen2/parse.py", line 162, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=1, value='foo', context=(' ', (5, 10))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/bin/pydoc-markdown", line 8, in <module>
    sys.exit(cli())
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/pydoc_markdown/main.py", line 344, in cli
    session.render(pydocmd)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/pydoc_markdown/main.py", line 136, in render
    modules = config.load_modules()
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/pydoc_markdown/__init__.py", line 154, in load_modules
    modules.extend(loader.load())
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/__init__.py", line 87, in load_python_modules
    yield parse_python_module(filename, module_name=module_name, options=options, encoding=encoding)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/__init__.py", line 128, in parse_python_module
    return parse_python_module(fpobj, fp, module_name, options, encoding)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/__init__.py", line 132, in parse_python_module
    ast = parser.parse_to_ast(fp.read(), filename)
  File "/Users/rabernat/Library/Caches/pypoetry/virtualenvs/icechunk-client-6qaybCGJ-py3.10/lib/python3.10/site-packages/docspec_python/parser.py", line 90, in parse_to_ast
    raise ParseError(exc.msg, exc.type, exc.value, tuple(exc.context) + (filename,))
lib2to3.pgen2.parse.ParseError: bad input: type=1, value='foo', context=(' ', (5, 10), '/Users/rabernat/gh/earth-mover/pydoc-markdown-bug/match_bug.py')
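For readers decoding the error message: type=1 is the numeric code for a NAME token, and the context tuple points at line 5, column 10 of the module, which is where `foo` starts in `match foo:`. In other words, the parser treated `match` as an ordinary name and then rejected the second name that followed it. The sketch below checks that reading with the standard-library tokenizer; `token_at` is a hypothetical helper written for this illustration, not part of docspec-python or lib2to3.

```python
import io
import token
import tokenize

# The module from the reproduction steps above.
SOURCE = '''def function_with_match():
    """A function that can't be parsed with pydoc-markdown."""

    foo = "a"
    match foo:
        case "a":
            pass
'''


def token_at(source, position):
    """Return the token starting at (row, col), or None if there is none."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start == position:
            return tok
    return None


# (5, 10) is the position reported in the ParseError context.
tok = token_at(SOURCE, (5, 10))
print(token.tok_name[1], token.tok_name[tok.type], repr(tok.string))
# prints: NAME NAME 'foo'
```

The tokenizer only splits the text into tokens and does not apply the grammar, so it accepts `match` on any Python 3 version; the failure in the traceback happens one stage later, in the lib2to3 parser.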

Expected behavior
Given that docspec-python supports Python >=3.7, I would expect it to be able to parse all valid python 3.10 syntax.
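As a quick sanity check that the module really is valid Python 3.10 and that the limitation sits in the lib2to3-based parser, one can feed the same source to the interpreter's own parser through the standard-library `ast` module. This is only a sketch: `parses_with_builtin_parser` is a hypothetical helper, and `ast` is used here for comparison, not as a description of how docspec-python works.

```python
import ast
import sys

# The module from the reproduction steps above.
SOURCE = '''def function_with_match():
    """A function that can't be parsed with pydoc-markdown."""

    foo = "a"
    match foo:
        case "a":
            pass
'''


def parses_with_builtin_parser(source):
    """Return True if the running interpreter's parser accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


accepted = parses_with_builtin_parser(SOURCE)
print(sys.version_info[:2], accepted)
```

On Python 3.10 or newer `accepted` should be True; on older interpreters it should be False, because match / case is not part of their grammar.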

Versions
pydoc-markdown, version 4.6.3
docspec-python, version 2.0.2

@rabernat rabernat added the type: bug Something isn't working label Nov 8, 2022
@NiklasRosenstein
Owner

Hi @rabernat, thanks for raising this issue! Unfortunately I won't have time to look into this in the short term, but I'd be happy to accept a PR if you're up for it.

@NiklasRosenstein
Owner

Side note: I'm not sure if lib2to3 was updated to support match/case.

@nrser
Contributor

nrser commented Feb 22, 2023

Looks like lib2to3 is incapable of parsing match (due to its parser being LL(1)) and is deprecated / scheduled to be removed from the standard library:

https://docs.python.org/3.11/library/2to3.html#module-lib2to3

These packages are recommended as alternatives:

  1. LibCST
  2. parso

I'm taking a look at it now as I just ran into this issue and getting rid of match isn't really an option. No promises I'll get anywhere with it but if I do I'll share the code.

@nrser
Contributor

nrser commented Feb 22, 2023

@rabernat @NiklasRosenstein This passes the docspec-python tests, including a new one for the match statement:

https://github.com/nrser/docspec/tree/blib2to3

It's a single commit:

nrser@1a08d2a

It seems the Black maintainers have their own fork/extension of lib2to3 called blib2to3 that is bundled with the black package. They managed to get it to parse at least some match forms.

I added black as a dependency and swapped blib2to3 in. This is totally a "quick fix", and I have no idea how well it will work, but I wanted to share it now in case I never end up getting any further with it.

@nrser
Contributor

nrser commented Feb 22, 2023

Just a heads up, tried that code on source from an actual project and there are a bunch of issues. Looks like relatively minor stuff involving the AST being slightly different, but it's gonna take some time to grind through.

@NiklasRosenstein
Owner

Merged #80, thanks @nrser!
