Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReDoS vulnerability in svnurl.py #287

Open
SCH227 opened this issue Sep 22, 2022 · 34 comments
Open

ReDoS vulnerability in svnurl.py #287

SCH227 opened this issue Sep 22, 2022 · 34 comments

Comments

@SCH227
Copy link

SCH227 commented Sep 22, 2022

Good night!

I found that this regex is vulnerable to Regular Expression Denial of Service.

PoC:

>>> from py._path.svnurl import InfoSvnCommand
>>> payl = "   2256      hpk        165 Nov 24 17:55 __init__.py" + " " * 5000
>>> InfoSvnCommand(payl)

Attack vector:

An user accessing a (possibly remote) subversion repository that provides malicious "info" data.
Or an attacker injecting 'svn ls http://...' output (less realistic).

Fix:

Use a pattern with non-overlapping groups. I can help in finding a better regex and testing if needed.

@The-Compiler
Copy link
Member

Related: #256

@bluetech
Copy link
Member

I doubt there's anyone using this code, so I don't think it would warrant any sort of security notification that users should bother with, but if you prepare a PR for this I'll merge & release it.

@skialpine
Copy link

GHSA-w596-4wvx-j9j6
https://nvd.nist.gov/vuln/detail/CVE-2022-42969

@jenstroeger
Copy link

As @skialpine mentions, the advisory now triggers pip-audit and thus can fail CI runs (like this one).

Is there a fix on the horizon?

@RonnyPfannschmidt
Copy link
Member

The issue is not considered critical, im not aware of anyone working towards a fix

@The-Compiler
Copy link
Member

The-Compiler commented Oct 19, 2022

Well, congratulations to whoever it is that decided that the right path of action here is getting a CVE for...

  • Something that seems very questionable to be titled a "vulnerability" in the first place (if I can control an SVN repo/server, I might as well just Slowloris the initial info request, I suppose... though admittedly I didn't try that.)
  • For some ~18 year old code...
  • ...which is only there for historical reasons and discouraged to use
  • ...which, to the best of our knowledge, is not used anywhere in the wild outside of some old PyPy development scripts nobody probably uses anymore (and certainly not against random SVN servers). Note the search results seem to be copies of those PyPy scripts, as far as I can tell.
  • ...which, given the above, nobody is terribly interested in maintaining
  • ...yet you ended up generating nothing but noise for hundreds of thousands of pytest users, which for historical reasons depends on pylib (since it came from the PyPy project, like pylib does). Yes, not everyone of the half quarter a million projects there will monitor CVEs against dependencies, but at the same time, lots of people that do (in companies and such) are probably not in that list on GitHub.

Given that it will still take some time for pytest to get fully rid of it's py internals, and people seem to like getting CVEs for this project (which just happens to be very popular via pytest, but pretty much unused outside of pytest), but without really making an attempt to understanding the context... can we please consider:

  • Either vendoring the remaining code (py.path I assume) in pytest, and archiving pylib...
  • ...or releasing a pylib 2.0.0 which simply drops all the code that isn't used in pytest?

Sorry if I sound frustrated. But requesting a CVE without understanding the context of the project/issue you're reporting, and then generating false reports for hundreds of thousands of pytest users is doing a major disservice to both pytest/pylib maintainers and all those users.

@RonnyPfannschmidt
Copy link
Member

I'd go as far as to label this cve report a supply chain attack

Pytest should keep dropping pylib as it does

The cve should be added as common false positive

@The-Compiler
Copy link
Member

I'd go as far as to label this cve report a supply chain attack

I would not, but I'd certainly add it to my "examples of behavior causing open source maintainer burnout" list, given that we're now getting the first pytest issue about this, and it almost certainly won't be the last one.

@bluetech
Copy link
Member

bluetech commented Oct 19, 2022 via email

@RonnyPfannschmidt
Copy link
Member

@bluetech @The-Compiler i woudnt mind to cut a release that drops the svn wc stuff alltogether

@The-Compiler
Copy link
Member

@RonnyPfannschmidt agreed, and perhaps a bit more even - I opened #288 with some overview on that.

@The-Compiler
Copy link
Member

FWIW, I've also proposed adding a note to the GitHub advisory (github/advisory-database#761) and tweeted a PSA.

@The-Compiler
Copy link
Member

GitHub have now reacted and amended the advisory:

The particular codepath in question is the regular expression at py._path.svnurl.InfoSvnCommand.lspattern and is only relevant when dealing with subversion (svn) projects. Notablely the codepath is not used in the popular pytest project.

and apparently also added that information for Dependabot alerts:

I've also added the codepath to the advisory so that dependabot can more intelligently target alerts.

@woodruffw
Copy link

pip-audit maintainer here. I have no say in it, but it's a shame (in my personal opinion) that this kind of low-quality finding was assigned a CVE ID without any significant cross-checking.

I've filed a CVE rejection request with MITRE, since they're the CVE CNA in this case. If they successfully reject it, getting it removed from GHSA should also be easy.

@woodruffw
Copy link

Actually, looks like I'll be able to propose a GHSA withdrawal even if the CVE itself hasn't been retracted. I'll open a PR for that in a moment.

@woodruffw
Copy link

I've filed github/advisory-database#762 to mark the GHSA report as withdrawn.

@jenstroeger
Copy link

jenstroeger commented Oct 20, 2022

@The-Compiler agreeable and funny #287 (comment) 😁

@woodruffw thanks for following up on the advisory! That means that pip-audit won’t pick up that CVE any longer once pypa/pip-audit#385 has merged, correct?

@woodruffw
Copy link

@woodruffw thanks for following up on the advisory! That means that pip-audit won’t pick up that CVE any longer once pypa/pip-audit#385 has merged, correct?

Once that's merged and the OSV entry is marked as withdrawn, yeah.

@Denelvo
Copy link

Denelvo commented Oct 21, 2022

@The-Compiler @bluetech @woodruffw & other maintainers: don't sweat it! Even though 18 years ago, someone wrote a regex, not thinking of 2022-levels of security paranoia, your efforts are still very much appreciated and your contributions are incredibly productive.

The CVE did trigger some alarms in builds of valuable systems used by financial institutions. But that's good. It doesn't necessarily mean that pytest is suddenly unusable. It just forces downstream developers in highly secured places to stop and think. Alerts can be ignored, assessed, categorized, accepted,...

Please look at CVEs as mostly just a catalyst. Having the CVE removed is a harsh reaction, but the right one if we're absolutely sure that that particular regex is not exploitable. But try to think of it from the point of view of a security officer, with full paranoia goggles on (and no intimate knowledge of the module). The regex is present in the code. It could be executed. A hacker could penetrate, set up a SVN repo, craft some commands, take advantage of other existing vulnerabilities on the system,... You have to think worst case and then some.

In the end, I think a correctly worded CVE with a low severity score which clearly includes all the IFs, would have been the best option here.

Thanks!

@The-Compiler
Copy link
Member

Taking the freedom to quote what I wrote over at github/advisory-database#762 (comment):

but to my eye the advisory is in fact talking about a real redos vulnerability. I do understand the annoyance with what you call CVE spam, but if the advisory is valid then we want to include it in our data set.

In theory, when viewed in a vacuum: indeed.

However, even a very broad search on GitHub for the affected code yields 92 results. I've looked through them all, and from what I can tell, all the matches fall into one of those categories:

  • Copies of py using it (in svnurl(), which the search captures as well)
  • Copies of pypy (where the py library originates from historically), in it's development tools directory. However, PyPy has moved away from SVN in 2010, so they're probably just bitrotting. In any case, they use the repository the current file is in, or are hardcoded to run on a specific old PyPy team server (codespeak.net). If someone has control over that, they might as well just change those scripts. Even if there are certain scripts which actually run on arbitrary SVN servers, the context they are run in is unlikely to make a DoS a real problem. But given that they won't actually work anymore for their intended original purpose, the chance of anyone running them is pretty much zero.

On the other side of the coin, you have at least half a million projects depending on pylib via pytest, which got noise in their inbox.

I believe the main point of a CVE and security advisories is to make people aware of problems which have a real (even if small) chance of affecting them. An advisory which just adds noise to what I believe will be 100.0% of the receivers is just going to hurt the whole system and community.

In this particular case, and when viewed in context, the sheer amount of noise the advisory generates vs. its real usage is so high that the word "spam" is unfortunately very fitting. I can't help but think that "oh, I get a shiny CVE in a popular project" was the only motivation behind it.

@woodruffw
Copy link

Just to make sure there's no confusion: I'm not a maintainer of this project or of pytest, so my opinions are not those of the maintainers. I'm the maintainer of a separate dependency scanning project, so I have an interest in dependency feeds having a high signal-to-noise ratio. After this comment, I'll step out of this thread now (since I'm not a maintainer and I only stepped in to coordinate on the pip-audit side).

With that being said: I disagree that it's good that a CVE was filed for this behavior. The fact that one was filed and published seemingly without any review represents a series of communication and authority breakdowns; the fact that the project's own maintainers have done more investigation into exploitability potential than the original reporter seems to have is an indicator of this.

As for why that matters: not everybody is a bank, with roles dedicated to reviewing a constant deluge of security reports. In the context of more limited resources, managing security fatigue is far more important than reporting weakness classes like ReDoS, which don't manifest as exploitable vulnerabilities in the overwhelming majority of cases. When users get tired of useless reports, they disable their security tools entirely.

@SCH227
Copy link
Author

SCH227 commented Oct 24, 2022

First of all, I did fill for a CVE, but I didn't publish it. Someone else did it. I find illogical it was you, but the GH advisory says "Credits - @The-Compiler".
The other thing I can think of, is that GH made it automatically because this is described in a public issue. But I find it strange, because I already reported other security issues to MITRE and they never got published before I informed they were made public

pdodds added a commit to kodexa-ai/kodexa that referenced this issue Jan 13, 2023
…onduct a ReDoS (Regular expression Denial of Service) attack via a Subversion repository with crafted info data, because the InfoSvnCommand argument is mishandled.

The particular codepath in question is the regular expression at py._path.svnurl.InfoSvnCommand.lspattern and is only relevant when dealing with subversion (svn) projects. Notably the codepath is not used in the popular pytest project. The developers of the pytest package have released version 7.2.0 which removes their dependency on py. Users of pytest seeing alerts relating to this advisory may update to version 7.2.0 of pytest to resolve this issue. See pytest-dev/py#287 (comment) for additional context.
picciama added a commit to wilhelm-lab/spectrum_fundamentals that referenced this issue Jan 22, 2023
@ret2libc
Copy link

Is the POC in the description actually working for anyone?

>>> from py._path.svnurl import InfoSvnCommand
>>> payl = "   2256      hpk        165 Nov 24 17:55 __init__.py" + " " * 5000
>>> InfoSvnCommand(payl)

For me it executes everything in a fraction of a second.

@RonnyPfannschmidt
Copy link
Member

It's thinkable that recent python versions have complexity enhancements

sybrenstuvel added a commit to sybrenstuvel/python-rsa that referenced this issue Apr 23, 2023
uJhin added a commit to uJhin/upbit-client that referenced this issue May 21, 2023
Update Python Package Version
- find issue: pytest-dev/py#287
rvelaVenafi added a commit to Venafi/vcert-python that referenced this issue Jun 1, 2023
Ignore Safety ID 51457 dues to a false-positive reported here: pytest-dev/py#287
rvelaVenafi added a commit to Venafi/vcert-python that referenced this issue Jun 1, 2023
Upgrade plugin dependencies to cover security risks
Ignore Safety ID 51457 dues to a false-positive reported here: pytest-dev/py#287
Ignore bandit B113 request timeout issue
stefpiatek added a commit to stefpiatek/bbc_meet_spotify that referenced this issue Jul 26, 2023
beepscore added a commit to beepscore/websearcher that referenced this issue Sep 5, 2023
hanskuhn added a commit to routeviews/google-cloud-storage that referenced this issue Oct 2, 2023
Dependabot flagged 'py' as being vulnerable to a DoS. It likely doesn't affect us, but the recommendation is that 'pytest' removed dependency on 'py' in 7.2.0 so let's do that. 

This is discussed in a thread found here: pytest-dev/py#287 (comment)
hanskuhn added a commit to routeviews/google-cloud-storage that referenced this issue Oct 2, 2023
* Update pytest version to remove py dependency

Dependabot flagged 'py' as being vulnerable to a DoS. It likely doesn't affect us, but the recommendation is that 'pytest' removed dependency on 'py' in 7.2.0 so let's do that. 

This is discussed in a thread found here: pytest-dev/py#287 (comment)

* Remove 'py'

py is a dependency of pytest so can be removed as of pytest version==7.2.0.
pablospe added a commit to pablospe/pybind11 that referenced this issue Oct 11, 2023
The py library through 1.11.0 for Python allows remote attackers to conduct a ReDoS (Regular expression Denial of Service) attack via a Subversion repository with crafted info data, because the InfoSvnCommand argument is mishandled.

The particular codepath in question is the regular expression at py._path.svnurl.InfoSvnCommand.lspattern and is only relevant when dealing with subversion (svn) projects. Notably the codepath is not used in the popular pytest project. The developers of the pytest package have released version 7.2.0 which removes their dependency on py. Users of pytest seeing alerts relating to this advisory may update to version 7.2.0 of pytest to resolve this issue. See pytest-dev/py#287 (comment) for additional context.
rwgk pushed a commit to pybind/pybind11 that referenced this issue Oct 16, 2023
…#4880)

* Update pytest (which removes their dependency on py)

The py library through 1.11.0 for Python allows remote attackers to conduct a ReDoS (Regular expression Denial of Service) attack via a Subversion repository with crafted info data, because the InfoSvnCommand argument is mishandled.

The particular codepath in question is the regular expression at py._path.svnurl.InfoSvnCommand.lspattern and is only relevant when dealing with subversion (svn) projects. Notably the codepath is not used in the popular pytest project. The developers of the pytest package have released version 7.2.0 which removes their dependency on py. Users of pytest seeing alerts relating to this advisory may update to version 7.2.0 of pytest to resolve this issue. See pytest-dev/py#287 (comment) for additional context.

* Added conditions so that we keep using 7.0.0 on python 3.6
mariuz referenced this issue in maxirobaina/django-firebird Nov 2, 2023
fix security issues pytest upgrade to 7.2.0
mattstern31 added a commit to mattstern31/datasets-server-storage-admin that referenced this issue Nov 11, 2023
### Generic implementation of a processing graph

Remove explicit mentions to /splits or /first-rows from code, and move them to the "processing graph":

```json
{
  "/splits": {"input_type": "dataset", "required_by_dataset_viewer": true},
  "/first-rows": {"input_type": "split", "requires": "/splits", "required_by_dataset_viewer": true},
}
```

This JSON (see libcommon.config) defines the *processing steps* (here /splits and /first-rows) and their dependency relationship (here /first-rows depends on /splits). It also defines if a processing step is required by the Hub dataset viewer (used to fill /valid and /is-valid).
A processing step is defined by the endpoint (/splits, /first-rows), where the result of the processing step can be downloaded. The endpoint value is also used as the cache key and the job type.

After this change, adding a new processing step should consist in:
- creating a new worker in the `workers/` directory
- update the processing graph
- update the CI, tests, docs and deployment (docker-compose files, e2e tests, docs, openapi, helm chart)

This also means that the services (API, admin) don't contain any code mentioning directly splits or first-rows. And the splits worker does not contain direct reference to first-rows.

### Other changes

- code: the libcache and libqueue libraries have been merged into libcommon
- the code to check if a dataset is supported (exists, is not private, access can be programmatically obtained if gated) has been factorized and is now used before every processing step and before even accepting to create a new job (through the webhook or through the /admin/force-refresh endpoint).
- add a new endpoint: /admin/cancel-jobs, which replaces the last admin scripts. It's easier to send a POST request than to call a remote script.
- simplify the code of the workers by factorizing some code into libcommon:
  - the code to test if a job should be skipped, based on the versions of the git repository and the worker
  - the logic to catch errors and to write to the cache
  This way, the code for every worker now only contains what is specific to that worker.
 
### Breaking changes

- env vars `QUEUE_MAX_LOAD_PCT`, `QUEUE_MAX_MEMORY_PCT` and `QUEUE_SLEEP_SECONDS` are renamed as `WORKER_MAX_LOAD_PCT`, `WORKER_MAX_MEMORY_PCT` and `WORKER_SLEEP_SECONDS`.

---

* feat: 🎸 add /cache-reports/parquet endpoint and parquet reports

* feat: 🎸 add the /parquet endpoint

* feat: 🎸 add parquet worker

Note that it will not pass the CI because
- the CI token is not allowed to push to refs/convert/parquet (should be
  in the "datasets-maintainers" org)
- the refs/convert/parquet does not exist and cannot be created for now

* ci: 🎡 add CI for the worker

* feat: 🎸 remove the hffs dependency

we don't use it, and it's private for now

* feat: 🎸 change the response format

associate each parquet file with a split and a config (based on path
parsing)

* fix: 🐛 handle the fact that "SSSSS-of-NNNNN" is "optional"

thanks @lhoestq

* fix: 🐛 fill two fields to known versions in case of error

* feat: 🎸 upgrade datasets to 2.7.0

* ci: 🎡 fix action

* feat: 🎸 create ref/convert/parquet if it does not exist

* feat: 🎸 update pytest

See pytest-dev/py#287 (comment)

* feat: 🎸 unlock access to the gated datasets

Gated datasets with extra fields are not supported. Note also that only
one token is used now.

* feat: 🎸 check if the dataset is supported only for existing one

* fix: 🐛 fix config

* fix: 🐛 fix the branch argument + fix case where ref is created

* fix: 🐛 fix logic of the worker, to ensure we get the git sha

Also fix the tests, and disable gated+private for now

* fix: 🐛 fix gated datasets and update the tests

* test: 💍 assert that gated with extra fields are not supported

* fix: 🐛 add controls on the dataset_git_revision

* feat: 🎸 upgrade datasets

* feat: 🎸 add script to refresh parquet response

* fix: 🐛 fix the condition to test if the split exists in list

also: rename functions to be more accurate

* refactor: 💡 use exceptions to make the flow clearer

* feat: 🎸 add processing_steps

* fix: 🐛 fix signature

* chore: 🤖 adapt to poetry 1.2, use pip-audit

* feat: 🎸 use ProcessingStep in api service

* feat: 🎸 use ProcessingStep in admin service

and replace the last scripts with the /cancel-jobs/xxx endpoints.

* style: 💄 fix style

* feat: 🎸 update libcommon (use processing_step)

* refactor: 💡 merge libcache and libqueue into libcommon

* feat: 🎸 upgrade to libcommon 0.4

* feat: 🎸 upgrade to libcommon 0.4

* fix: 🐛 upgrade poetry

* feat: 🎸 use processing_step in workers

* feat: 🎸 implement should_skip_job and process in generic Worker

this will make the code of workers simpler

* feat: 🎸 handle CustomError from the workers, with specific code

* feat: 🎸 simplify compute method

* refactor: 💡 fix typing

* fix: 🐛 remove erroneous control

* feat: 🎸 update libcommon to 0.4.2

* feat: 🎸 update to libcommon 0.4.2

* ci: 🎡 fix ci

* docs: ✏️ fix docstring

* feat: 🎸 update to libcommon 0.4.2

* refactor: 💡 use Mapping instead of Dict

* feat: 🎸 update to libcommon 0.4.2

also: replace Dict with Mapping

* fix: 🐛 use Dict because it must be mutable

* fix: 🐛 missing import

* feat: 🎸 remplace dependency with previous_step and next_steps

* feat: 🎸 define the processing graph in the configuration

* feat: 🎸 upgrade to libcommon 0.5

* feat: 🎸 upgrade to libcommon 0.5

* feat: 🎸 upgrade to libcommon 0.5

* feat: 🎸 upgrade to libcommon 0.5.0

* feat: 🎸 upgrade to libcommon 0.5

* feat: 🎸 upgrade to libcommon 0.5

* refactor: 💡 add logic methods to simplify services and workers

* feat: 🎸 upgrade to libcommon 0.5.1

some tests have been moved (commented yet) to e2e, since it becomes hard
to simulate all the Hub endpoints -> better to test the scenari against
the real Hub instead

* feat: 🎸 upgrade to libcommon 0.5.1

* feat: 🎸 remove parquet processing step

since it's not the scope of this PR

* style: 💄 fix stykle

* ci: 🎡 remove parquet ci

* feat: 🎸 upgrade docker images

* test: 💍 add some tests for the webhook

* test: 💍 update e2e tests (and error messages in openapi)

* style: 💄 fix style

* feat: 🎸 remove parquet code
pjaos pushed a commit to pjaos/ct6_meter_os that referenced this issue Jan 24, 2024
I've removed this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests