Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of yaml.load instead of yaml.safe_load #1529

Closed
tonybaloney opened this issue Jul 3, 2018 · 4 comments
Closed

Use of yaml.load instead of yaml.safe_load #1529

tonybaloney opened this issue Jul 3, 2018 · 4 comments
Milestone

Comments

@tonybaloney
Copy link

Hi,

I noticed that the yaml module is using load instead of safe_load. It's generally recommended to use safe_load as otherwise, attackers can run arbitrary code through configuration files.
See this OpenStack file
https://security.openstack.org/guidelines/dg_avoid-dangerous-input-parsing-libraries.html

Happy to raise a PR and swap those with yaml.safe_load in

return yaml.load(source, Loader)

@waylan
Copy link
Member

waylan commented Jul 3, 2018

Your reference states:

You always want to use the safe functions to load input.

I have to disagree with that blanket statement. Yes, many (most?) times that is true, But sometimes you might actually want/need the so-called "unsafe" option (otherwise, why would the library offer it).

In any event, there are at least two questions we need to answer:

  1. Is there a legitimate need for not using safe_load with MkDocs?

    For some users, yes, See this comment to Non ascii headers do not slugify nicely. #212 as an example. Although, as mentioned elsewhere in that discussion, we may not actually want to allow that. And it is likely possible to get the same functionality using a plugin rather than pointing to a Python object directly from the YAML config file.

    So, in the end, no there is no "need." In fact, we don't officially support doing that. But for some advanced users it is "nice to have."

  2. As MkDocs is a static site generator, what danger, if any, really exists?

    Again this was discussed in Non ascii headers do not slugify nicely. #212 and we don't see it as a high concern. As a reminder, the "server" that we include with MkDocs is a dev server and not intended for use in a server open to the world. All our users ever upload to a server is a bunch of flat files. What exactly would the attack vector be here?

    Worst case, an attacker adds some code calling option into the config file and that code adds some evil JavaScript code to the generated site. But if the attacker can do that, then they can just add the JavaScript code into the source files for the site directly. No need to use the config file. At most, using the config file adds a level of indirection if the attacker is trying to hide their activities, but things are not actually any more or less safe than if we use safe_load.

    Yes, I realize that the attacker could simply use this as a method to get arbitrary code to run on the the developer's machine (when she runs mkdocs build). But even if we did use safe_load, we still support plugins. So any plugin referenced in the config will call code which could be from an unknown source. How is that any different that not using safe_load?

@tonybaloney
Copy link
Author

ok, I thought it worth raising the issue. I see your point with the plugins

I was thinking along the lines of shell access being gained through the configuration file, but as you state you need shell access anyway to compile it.

I was thinking of a scenario like this:

  • MkDocs being run on a cron/hooks on a server, linked to a CI/CD process. Developers change pages and MkDocs builds the content via source-control. This is a pretty common deployment scenario
  • Someone in the team had weak credentials (again really common), attacker pushes code in mkdocs.yaml to create a shell account and pops server.
!!python/object/apply:os.system\nargs: ['useradd bad_user ...']

That scenario is completely the fault of the administrator for doing this and for the dev for having bad creds, but they might not be aware what harm can be done simply by deserialising a yaml file.

I'll close the issue anyway.

@waylan
Copy link
Member

waylan commented Feb 28, 2019

With the upcoming release of PyYAML 5.1, we should update to no longer call the soon-to-be deprecated yaml.load(). See yaml/pyyaml#257 (and https://msg.pyyaml.org/load) for details and the options available. I expect we will want to use FullLoader. But I need to confirm that we don't loose anything we need from UnsafeLoader.

@waylan waylan reopened this Feb 28, 2019
@waylan waylan added this to the 1.1 milestone Feb 28, 2019
@waylan
Copy link
Member

waylan commented Mar 1, 2019

I have reviewed the changes to PyYAML and MkDocs use of the library and there are not any changes that we need to make. PyYAML 5.1 raises a deprecation warning only when a loader isn't explicitly selected. However, we use our own custom loader so we always explicitly select a loader. The yaml.load function itself is not deprecated; in fact the loader argument of that function is how you select your Loader of choice.

The only possible change to the code we could make is to have our custom loader subclass yaml.FullLoader rather than yaml.Loader. But that seems unnecessary as the two are aliases of each other in 5.1. And yaml.FullLoader didn't exist in earlier versions. Therefore, such a change would require us to upgrade our minimum dependency requirement to version 5.1.

For the reasons stated in an earlier comment, the benefits of not calling the so-called "safe" version of PyYAML outweigh the risks for MkDocs. Therefore, there is no need or benefit to requiring version 5.1 at this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants