Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding rule and test case of python PyYaml FULL_LOAD #2352

Closed
wants to merge 2 commits into from
Closed

Adding rule and test case of python PyYaml FULL_LOAD #2352

wants to merge 2 commits into from

Conversation

shivankar-madaan
Copy link
Contributor

@shivankar-madaan shivankar-madaan commented Aug 25, 2022

Hello Team,
The pyyaml FULL_LOAD is vulnerable to code execution as seen here - yaml/pyyaml#386
I did not find this rule in Semgrep hence added it

Hello Team,
   The pyyaml is vulnerable to code execution as seen here - yaml/pyyaml#386
I did not find this rule in Semgrep hence added it
@CLAassistant
Copy link

CLAassistant commented Aug 25, 2022

CLA assistant check
All committers have signed the CLA.

@minusworld
Copy link
Member

Hey @shivankar-madaan! 👋 Thanks for the PR. 😄

This PR actually prompted me to investigate which exploit paths are still relevant today. The link you provided looks like the vulnerability was patched, so I looked around for the cases which are still exploitable.

I found this blog which very helpfully spells out many of the still-valid exploit cases. Here's me trying them out:

Python 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> x = "!!python/object/new:os.system [echo EXPLOIT!]"
>>> yaml.load(x)
<stdin>:1: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/constructor.py", line 55, in construct_document
    data = self.construct_object(node)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/new:os.system'
  in "<unicode string>", line 1, column 1:
    !!python/object/new:os.system [e ...
    ^
>>> yaml.unsafe_load(x)
EXPLOIT!
0
>>> yaml.load(x, Loader=yaml.Loader)
EXPLOIT!
0
>>> yaml.load(x, Loader=yaml.UnsafeLoader)
EXPLOIT!
0
>>> yaml.load(x, Loader=yaml.FullLoader)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/constructor.py", line 55, in construct_document
    data = self.construct_object(node)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yaml/constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/new:os.system'
  in "<unicode string>", line 1, column 1:
    !!python/object/new:os.system [e ...
    ^
>>> yaml.load_all(x, Loader=yaml.UnsafeLoader)
<generator object load_all at 0x7fecbe1d5820>
>>> for o in _:
...  print(o)
...
EXPLOIT!
0
>>>

@minusworld
Copy link
Member

It looks like things are much safer now, and the exploitable cases include:

  • using unsafe_load()
  • using Loader or UnsafeLoader

I'll have to update the rule accordingly.

@shivankar-madaan
Copy link
Contributor Author

Thanks @minusworld for sharing this technique and investigation. Out of curiosity, so in case somebody was using an old version of PyYAML and using the vulnerable full_load, Semgrep does not take that into consideration right?

@minusworld
Copy link
Member

🤔 It's kinda tricky, because we want to make sure we're up-to-date with the latest implementation of PyYAML so that we don't report false positives. But also, a quick search on GitHub shows that many projects are not up-to-date.

Looks like the safe default changes were added in version 5.4, and the search shows many projects not up to 5.4 yet.

@minusworld
Copy link
Member

Let me talk with some of my team to identify a good path forward. Some options include:

  • Releasing a companion rule which checks for out-of-date PyYAML versions in requirements.txt
  • Having two rules with version specifier clauses for old versions of PyYAML and the new ones.

@minusworld
Copy link
Member

Hmm, maybe it was actually deprecated on version 5.1. I'm curious to do some more testing on versions.

@shivankar-madaan
Copy link
Contributor Author

complicated scenario haha....
Interested in what the decision would be from the team

let me know if I can help in the PR. More than happy to contribute back

@minusworld
Copy link
Member

minusworld commented Aug 31, 2022

Here's my analysis of which APIs are vulnerable in which versions.
✅ - safe
❌ - NOT safe
⬛ - unavailable in this version

image

I'm thinking of writing this up in a short blog post, actually. It will include details on how the tests were done alongside the matrix above. @shivankar-madaan, would you like to be mentioned? Would you like to be a reviewer? 😄

As for this rule, we think we will update this rule to only alert on vulnerable cases after PyYAML v5.4. Further analysis of the usage of PyYAML as a dependency showed that most people are on v5.4 or later. In addition, v5.4 was released 1.5 years ago, which is probably enough time for many projects to upgrade. Lastly, the API differences between unsafe and safe versions are pretty minimal.

Therefore, this rule would detect unsafe_load, Loader, UnsafeLoader, and their C variants. Alongside the blog, we think this will encourage people to upgrade to 5.4 or later, which is definitely the best option.

@shivankar-madaan
Copy link
Contributor Author

Hello @minusworld

First of all really nice analysis and a good decision and big thanks for sharing this. For sure I would love to mentioned in the blog and hopefully can help reviewing.Let me know how I can be involved in the process. Also I'll get back with the updated PR for the following variants you mentioned.

@shivankar-madaan
Copy link
Contributor Author

Hello @minusworld

Could you guide me on how we can

Releasing a companion rule which checks for out-of-date PyYAML versions in requirements.txt

Is this possible, I would really like to have it in this way. If not, I will go by the above discussion as decided.

@minusworld
Copy link
Member

There are two things I can think of:

  1. You can try using https://semgrep.dev/docs/experiments/r2c-internal-project-depends-on/, but be aware that this only works if a Pipfile.lock is present. It doesn't work on arbitrary requirements.txt.
  2. You can try to write a rule checking for out-of-date dependencies in requirements.txt using the generic language or a regular expression. If you write this rule, we'll try to scan at scale. If it produces too many false positives, we'll have to consider not adding it to the public registry. (That said, you can always host it yourself somewhere if you want to! :D )

@shivankar-madaan shivankar-madaan closed this by deleting the head repository Sep 9, 2022
@minusworld
Copy link
Member

@shivankar-madaan If you email me at my r2c email, I can send you a draft of the blog for review :D . It's in the git history

@shivankar-madaan
Copy link
Contributor Author

@minusworld just emailed you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants