This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Pydocstyle crashes on literal strings in pyproject.toml #600

Closed
codejedi365 opened this issue Jul 1, 2022 · 0 comments · Fixed by #608

codejedi365 commented Jul 1, 2022

Problem

When an unrelated tool's configuration (e.g. semantic_release) contains a multi-line literal string delimited by ''', such as a regular expression, pydocstyle raises a toml.decoder.TomlDecodeError for an unterminated string. This likely does not happen with every literal string, but it does when there is a single quote inside the regular expression.

My offending config:

# pyproject.toml

[tool.semantic_release]
version_pattern = [
    # regular expression to find version value in `_version.py` file
    '''src/pkg1/_version.py:__version__[ ]*[:=][ ]*["'](\d+\.\d+\.\d+)["']'''
]

[tool.pydocstyle]
convention = 'pep257'

Log

(venv) $ pydocstyle scripts/prepare.py

Traceback (most recent call last):
  File "/workspaces/py-rpm/venv/bin/pydocstyle", line 8, in <module>
    sys.exit(main())
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/cli.py", line 75, in main
    sys.exit(run_pydocstyle())
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/cli.py", line 41, in run_pydocstyle
    for (
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 288, in get_files_to_check
    config = self._get_config(os.path.abspath(name))
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 369, in _get_config
    config = self._get_config_by_discovery(node)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 318, in _get_config_by_discovery
    config = self._get_config(parent_dir)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 369, in _get_config
    config = self._get_config_by_discovery(node)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 312, in _get_config_by_discovery
    config_file = self._get_config_file_in_folder(path)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 555, in _get_config_file_in_folder
    if config.read(full_path) and cls._get_section_name(config):
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 70, in read
    self._config.update(toml.load(fp))
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/toml/decoder.py", line 156, in load
    return loads(f.read(), _dict, decoder)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/toml/decoder.py", line 362, in loads
    raise TomlDecodeError("Unterminated string found."
toml.decoder.TomlDecodeError: Unterminated string found. Reached end of file. (line 121 column 1 char 2619)

Investigation

This seems to be a limitation of the parser implementation and of the version of the TOML standard it targets. I looked at the dependency tree of semantic_release and found that it uses the library tomlkit instead of toml because tomlkit supports v1.0.0 of the TOML standard rather than v0.5.0. Under the hood, the parser in the toml package (which implements TOML v0.5.0) seems to have a few flaws: varying the regular expression produces different results that are neither obvious nor expected. One such oddity: two double quotes " anywhere inside the triple single quotes ''' cause an Unterminated string error, but a single double quote is fine. Another variation that should not work but does is escaping the double quotes (i.e. \").

I also found that the toml library itself is stale and has not received any updates since Oct 2020, whereas tomlkit and its competitor tomli both received updates in the first half of 2022. Furthermore, the Python 3.11 docs highlight these two frontrunners as the libraries of choice for reading and writing TOML. Perhaps in a year or so you could use the tomllib module built into Python 3.11, but relying on it alone would leave older Python versions unsupported for a few years.

Additional discussion on TOML support for raw/literal strings: toml-lang/toml#80

Recommendation

Switch toml dependency to tomlkit or tomli.

I have tested both tomli==2.0.1 and tomlkit==0.10.2, and both parse my pyproject.toml configuration file (as provided above), regex included, without error. tomlkit seems to be leading in popularity, but the tomli documentation is a bit better. Also of note: tomli.load() requires the file to be opened in binary mode ("rb") rather than in text mode with a specified encoding.
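As a rough illustration of the test described above, the sketch below writes the configuration from this report to a temporary file and loads it in binary mode. It assumes either Python 3.11+ (tomllib) or an installed tomli; the PYPROJECT constant and the temporary path are illustrative only and are not part of pydocstyle.

```python
import os
import tempfile

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party package with the same load()/loads() API

# The configuration from this report (PYPROJECT is an illustrative name).
PYPROJECT = r"""
[tool.semantic_release]
version_pattern = [
    # regular expression to find version value in `_version.py` file
    '''src/pkg1/_version.py:__version__[ ]*[:=][ ]*["'](\d+\.\d+\.\d+)["']'''
]

[tool.pydocstyle]
convention = 'pep257'
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pyproject.toml")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(PYPROJECT)
    # load() expects a binary file object, hence "rb" and no encoding argument.
    with open(path, "rb") as fp:
        config = tomllib.load(fp)

pattern = config["tool"]["semantic_release"]["version_pattern"][0]
convention = config["tool"]["pydocstyle"]["convention"]
print(pattern)
print(convention)
```

Opening the same file in text mode ("r") and passing it to load() should fail with a TypeError, which is the behavioural difference from toml.load() mentioned above.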

Related: #599

mgorny added a commit to mgorny/pydocstyle that referenced this issue Oct 12, 2022
Use the built-in `tomllib` module in Python 3.11 and the modern `tomli`
package in older Python versions to read .toml configs instead of
the unmaintained and broken `toml` package.

Fixes PyCQA#599
Fixes PyCQA#600
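
The approach named in the commit message could look roughly like the sketch below. This is not the code from the referenced commit: the helper read_tool_section is hypothetical, and the snippet assumes tomli is installed on Python versions older than 3.11.

```python
import sys

# Prefer the standard-library parser where it exists, else fall back to tomli.
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib


def read_tool_section(text, tool="pydocstyle"):
    """Hypothetical helper: return the [tool.<tool>] table, or {} if absent."""
    data = tomllib.loads(text)
    return data.get("tool", {}).get(tool, {})


section = read_tool_section("[tool.pydocstyle]\nconvention = 'pep257'\n")
print(section)
```

Aliasing tomli as tomllib keeps the rest of the code identical on every Python version, since both modules expose the same load() and loads() functions.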
mgorny added a commit to mgorny/pydocstyle that referenced this issue Jan 3, 2023
Use the built-in `tomllib` module in Python 3.11 and the modern `tomli`
package in older Python versions to read .toml configs instead of
the unmaintained and broken `toml` package.

Fixes PyCQA#599
Fixes PyCQA#600