Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-95778: CVE-2020-10735: Prevent DoS by very large int() #96499

Merged
merged 45 commits into from Sep 2, 2022

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Sep 2, 2022

Integer to and from text conversions via CPython's bignum int type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

Signed-off-by: Christian Heimes [Red Hat] christian@python.org
Tons-of-polishing-up-by: Gregory P. Smith [Google] greg@krypto.org
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

I wrote up a one pager for the release managers. Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.

Further Discussion

... is taking place in discuss.python.org threads

remaining TODOs (aka project management)

@tiran
Copy link
Member

tiran commented Sep 2, 2022

Closing and re-opening ticket. CI isn't starting.

@tiran tiran closed this Sep 2, 2022
@tiran tiran reopened this Sep 2, 2022
tiran and others added 18 commits September 1, 2022 22:21
The text to int parser of Python's int type is not safe against
malicious input. Very large input strings with hundred thousands of
digits can consume several seconds.

The int() now limit the maximum amount of an input string to
5,000 digits.

For comparison total amount of protons in the observable universe is
known as Eddington number. That number has 80 digits.

Signed-off-by: Christian Heimes <christian@python.org>
…e json module.

"It's advised to limit the input to a sensible length." isn't very helpful for the JSON module as, while technically true, the limit needed to avoid hitting things like the int<->str base 10 conversion this issue is about would make a significant percentage of actual JSON used in the real world impractical.

we're limiting the int/str conversion length, that is the best that can be done here - unless someone wants to implement a feature request for JSON to reject int looking fields at a much smaller base10 length limit.
The lumps way too many changes together in one commit, but extricating
them into a series of individual pieces doesn't seem worthwhile at this
stage, it could be done by picking and choosing into a new branch if
there is a reason to do so.  Summarizing off the top of my head after
rereading diffs too many times:

- Renamed the environment variable, -X flag, and names of the new APIs.
- Cleaned up the documentation.
- Refactored and improved some tests.
- Moved the definitions of the default and threshold to a non-public
  header.
- Set the default to 2000 instead of disabled as the Steering Council
  agreed having a default limit made the most sense for our users.
- Lowered the minimum threshold because there seems no harm in allowing
  people to have a lower limit.
- Fixed the other-bases base-10 digits equivalent estimate to be
  consistent with the actual potential base-10 value instead of wildly
  off. Aimed for consistency of number size here rather than any attempt
  to match the performance between bases as that is easier for users to
  understand and doesn't change if our algorithm implementations change.
- Left a couple of notes about what changes look good or not good for a
  security fix backported into a stable release.  It appears the PyConfig
  struct is a public API so we cannot change its definition mid-release.
  Backporting is going to involve something outside of the normal code
  path wise to plumb the new value through.  Ugh.

Questions:
 * [ ] `test_embed` is failing in an odd manner... what's up with that?
 * [ ] Is `_pydecimal` important?  This change makes some of its APIs
   not work based on the values being used.  The unittest suite that
   tests both the C and Python Decimal implementations revealed this
   and needed a higher limit set.  If it relies on ints for arbitrary
   precision Decimal numbers, limits are frequently going to make it
   practically unusuable.
 * [ ] Backporting... we should produce what we believe a 3.11/3.10/3.9
   backport will look like in its own PR branch before making a final
   decision. The most annoying difficulty being the `struct PyConfig`
   issue.
 * [ ] I pondered having values >0 but <threshold just silently be
   rounded up to the threshold value instead of causing an error at
   Python startup time, but we do have other environment variables and
   flags that cause startup time errors based on their value so it
   doesn't seem to matter either way.  If we did that, the setter
   should also do that instead of raising.
Renamed to `int_max_str_digits` and simplified the logic per @Y1hgs's
comments on the earlier revision.  Less code, less awkward, and simpler
to explain.

Underscores and the sign are uncounted because that makes for the
easiest implementation.
Playing it safe, if this lands in 3.11 before 3.11.0 these can be
updated to 3.11.
Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with typo fix (as pointed out by Nathaniel on one of the backports)

Doc/library/stdtypes.rst Outdated Show resolved Hide resolved
Doc/library/stdtypes.rst Outdated Show resolved Hide resolved
Doc/library/stdtypes.rst Outdated Show resolved Hide resolved
Doc/whatsnew/3.12.rst Outdated Show resolved Hide resolved
Lib/test/test_ast.py Outdated Show resolved Hide resolved
Lib/test/test_compile.py Outdated Show resolved Hide resolved
Parser/pegen.c Outdated Show resolved Hide resolved
Doc/library/stdtypes.rst Outdated Show resolved Hide resolved
tiran and others added 2 commits September 2, 2022 16:55
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
@gpshead
Copy link
Member Author

gpshead commented Sep 2, 2022

LOL at least I managed a consistent misspelling?

@gpshead gpshead merged commit 511ca94 into python:main Sep 2, 2022
gpshead added a commit that referenced this pull request Sep 2, 2022
)

Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports #96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
gpshead added a commit that referenced this pull request Sep 2, 2022
)

Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports #96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
Copy link

@light-bringer light-bringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good changes

@septatrix
Copy link

@gpshead could you maybe link to the respective discussion thread (https://discuss.python.org/t/int-str-conversions-broken-in-latest-python-bugfix-releases/18889/45) directly in the issue? It is already locked and you referred to https://discuss.python.org but given that the linked thread seems to be where the main discussion is happening it might be a bit more accessible and reduce duplicate threads about the same topic.

@gpshead
Copy link
Member Author

gpshead commented Sep 13, 2022

good idea, done, link added.

@ngie-eign
Copy link
Contributor

For others that might be curious, the testcases provided in the PR don't repro the issue on 2.7. I'm not sure about older 3.x versions.
PS I know python 2.7 is unsupported by python.org, but some OS distributions are still using python 2.7 in the field in a supported capacity.

CAM-Gerlach added a commit to CAM-Gerlach/cpython that referenced this pull request Oct 16, 2022
CAM-Gerlach added a commit to CAM-Gerlach/cpython that referenced this pull request Oct 16, 2022
 / pythongh-93306

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
CAM-Gerlach added a commit to CAM-Gerlach/cpython that referenced this pull request Oct 16, 2022
 / pythongh-93306

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
gpshead pushed a commit that referenced this pull request Oct 17, 2022
…main (#98344)

Add int/str security change from issue gh-95778 PRs gh-96499 / gh-95800

Co-authored-by: Gregory P. Smith <greg@krypto.org> [Google]
facebook-github-bot pushed a commit to facebookincubator/cinder that referenced this pull request Jan 20, 2023
Summary:
cherry-picked the upstream 3.10 backport

```
git cherry-pick 8f0fa4b eace09e
```

this one is python/cpython#96537

original commit message below

--------

Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds.

This PR comes fresh from a pile of work done in our private PSRT security response team repo.

This backports python/cpython#96499 aka 511ca94

Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

<!-- gh-issue-number: gh-95778 -->
* Issue: gh-95778
<!-- /gh-issue-number -->

I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).

Reviewed By: alexmalyshev

Differential Revision: D39369518

fbshipit-source-id: 6ed2def
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet