Implementing Hashicorp Vault in a transparent manner #625

Open
alexandernst opened this issue Oct 28, 2018 · 8 comments

@alexandernst

I'm currently discussing with other devs at my company whether to use Hashicorp Vault to protect the files we upload to S3. The way Vault works makes it perfect for our needs, and probably for anybody else looking for reliable, secure file storage.

The problem is that there is no simple way to implement it on top of django-storages without patching the particular classes in storages.backends that we use (S3 in our case).

We'd like to discuss and contribute a series of patches that would let django-storages provide some sort of callback mechanism to "intercept" files that are about to be written or read and modify them (that is, encrypt/decrypt them). Think of it as the pre_* and post_* signals in Django.

Background info:

So, what is that Hashicorp thing and why is it any different from S3?

The main difference is that S3's encryption protects the files only from offline ("cold") attacks, i.e. somebody going inside AWS facilities and physically getting away with your disk. Hashicorp Vault, on the other hand, encrypts files before they are uploaded to S3.

Ok, so this is just like any other encryption that can be used with Django, right?

No, the encryption (and decryption) is done inside Vault, instead of being handled by Python/Django, which means that the decryption key can't be stolen and that the attacker must have constant access to the Vault in order to decrypt all files.

How does that Vault work?

First you create a secret key using the Transit engine, then you use Vault as a service: you call two API endpoints, one to encrypt data and the other to decrypt it.
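
For readers who have not used Transit, the two calls could look roughly like the sketch below. It only builds the requests and parses the responses; the server address, the key name `file-key`, and all helper names are assumptions made for illustration, and `transit` is assumed to be the mount path of the engine.

```python
import base64
import json
import urllib.request

VAULT_ADDR = "http://127.0.0.1:8200"  # assumption: a local Vault server
KEY_NAME = "file-key"  # hypothetical Transit key name


def _post(path: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to Vault."""
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/transit/{path}/{KEY_NAME}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def build_encrypt_request(plaintext: bytes, token: str) -> urllib.request.Request:
    # Transit expects the plaintext to be base64-encoded.
    encoded = base64.b64encode(plaintext).decode("ascii")
    return _post("encrypt", {"plaintext": encoded}, token)


def build_decrypt_request(ciphertext: str, token: str) -> urllib.request.Request:
    # The ciphertext is the "vault:v1:..." string returned by the encrypt call.
    return _post("decrypt", {"ciphertext": ciphertext}, token)


def parse_encrypt_response(raw: bytes) -> str:
    """Pull the ciphertext string out of an encrypt response body."""
    return json.loads(raw)["data"]["ciphertext"]


def parse_decrypt_response(raw: bytes) -> bytes:
    """Pull the plaintext out of a decrypt response body and base64-decode it."""
    return base64.b64decode(json.loads(raw)["data"]["plaintext"])
```

Sending a request would then be something like `urllib.request.urlopen(build_encrypt_request(data, token)).read()`, with the result passed to the matching parse helper.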

So how would all that integrate inside django-storages?

The idea is to provide some sort of mechanism that would allow all storage implementations (Azure, GCP, S3, etc.) to call an optional callback/signal when files are about to be written or read, and let that callback/signal handle the encryption/decryption.
That callback/signal would then make a GET/POST request to Vault and make sure the files that are being processed are safely encrypted/decrypted.
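
To make the idea a bit more concrete, here is a minimal sketch of such a callback mechanism. It is not django-storages code: `HookedStorage` is a toy in-memory class, the `pre_write` and `post_read` lists are hypothetical names, and the byte-reversing functions are placeholders standing in for the calls to Vault (they provide no security at all).

```python
from typing import Callable, Dict, List

# A hook receives the file name and its bytes and returns the new bytes.
Transform = Callable[[str, bytes], bytes]


class HookedStorage:
    """Toy in-memory storage that runs registered callbacks around I/O."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self.pre_write: List[Transform] = []  # run before content is stored
        self.post_read: List[Transform] = []  # run after content is fetched

    def save(self, name: str, content: bytes) -> str:
        for hook in self.pre_write:
            content = hook(name, content)
        self._blobs[name] = content
        return name

    def open(self, name: str) -> bytes:
        content = self._blobs[name]
        for hook in self.post_read:
            content = hook(name, content)
        return content


def fake_encrypt(name: str, content: bytes) -> bytes:
    # Placeholder for the Vault encrypt call; NOT real encryption.
    return content[::-1]


def fake_decrypt(name: str, content: bytes) -> bytes:
    # Placeholder for the Vault decrypt call.
    return content[::-1]


storage = HookedStorage()
storage.pre_write.append(fake_encrypt)
storage.post_read.append(fake_decrypt)
storage.save("report.txt", b"secret")
```

With no hooks registered the class behaves like a plain storage, which is what would keep the mechanism optional for existing users.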

I hope that this makes sense, but if it doesn't, I'll happily answer any questions that you might have. @jschneier @jdufresne

@alexandernst
Author

Anybody?

@jdufresne
Contributor

Is it possible to inherit from one of the existing storage backends and then override the necessary methods? I guess I'm wondering if a new mechanism is required at all. Perhaps it would be simpler to add an empty method to act as a hook for child classes.

It will be easier for me to analyze the proposal with some code to look at.
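
As a starting point for that, the "empty hook method" variant might look like the sketch below. `BaseStorage` is a toy stand-in for an existing backend, and the method names `transform_on_write` and `transform_on_read` are hypothetical, not part of django-storages. The `enc:` prefix is a placeholder for real encryption.

```python
class BaseStorage:
    """Toy stand-in for an existing backend, with two no-op hooks."""

    def __init__(self):
        self._blobs = {}

    def transform_on_write(self, name, content):
        # Empty hook: child classes may override this.
        return content

    def transform_on_read(self, name, content):
        # Empty hook: child classes may override this.
        return content

    def save(self, name, content):
        self._blobs[name] = self.transform_on_write(name, content)
        return name

    def open(self, name):
        return self.transform_on_read(name, self._blobs[name])


class EncryptingStorage(BaseStorage):
    """Child class that overrides only the hooks."""

    PREFIX = b"enc:"  # placeholder marker, NOT real encryption

    def transform_on_write(self, name, content):
        return self.PREFIX + content

    def transform_on_read(self, name, content):
        return content[len(self.PREFIX):]
```

The base class keeps its current behaviour, and a project that wants encryption only has to subclass and override the two hooks.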

@sww314
Contributor

sww314 commented Oct 31, 2018

How about a Proxy-based design where you build a Vault based storage that just hands off the files to S3 (or any of the storages)?
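
Roughly, such a proxy could be sketched as follows. Everything here is illustrative: `InMemoryStorage` stands in for S3 or any other backend, `EncryptingProxyStorage` is a hypothetical name, and the `encrypt`/`decrypt` callables are placeholders for whatever talks to Vault.

```python
class InMemoryStorage:
    """Toy stand-in for the real backend (S3, GCS, ...)."""

    def __init__(self):
        self._blobs = {}

    def save(self, name, content):
        self._blobs[name] = content
        return name

    def open(self, name):
        return self._blobs[name]

    def exists(self, name):
        return name in self._blobs


class EncryptingProxyStorage:
    """Wraps another storage and transforms content on the way in and out."""

    def __init__(self, inner, encrypt, decrypt):
        self._inner = inner
        self._encrypt = encrypt
        self._decrypt = decrypt

    def save(self, name, content):
        return self._inner.save(name, self._encrypt(content))

    def open(self, name):
        return self._decrypt(self._inner.open(name))

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself,
        # so everything else (exists, delete, ...) goes to the inner storage.
        return getattr(self._inner, attr)


backend = InMemoryStorage()
proxy = EncryptingProxyStorage(
    backend,
    encrypt=lambda data: data[::-1],  # placeholder, NOT real encryption
    decrypt=lambda data: data[::-1],
)
proxy.save("photo.bin", b"abc")
```

The wrapped backend never sees the original bytes, and the proxy does not need to know which backend it is wrapping.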

My question would be how and where are you going to handle the decryption? For the most part, the storages return a url for the file to be accessed in the normal file handling flow. The file is not streamed from S3 thru the Django server. Would the URL actually be from the Vault server?

As a side note, if you are not doing client-side encryption I am curious how this is better than what Amazon or Google provide. Google offers the ability to totally manage the keys. https://cloud.google.com/storage/docs/encryption/
Your files are still moving around unencrypted - just to and from a different server.

@alexandernst
Author

@jdufresne I was thinking about actually using Django's signals, but perhaps you're right and I should just submit a few patches and then we can discuss if they are feasible.

@sww314 So, your approach is to set DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.HashicorpVaultStorage' and then use another setting to define which class/storage the files should be proxied to?

About your second question: both the encryption and decryption are done inside Hashicorp Vault. The Vault acts like a black box. You send it data along with the name of the key to encrypt it with (a key you previously generated inside the Vault), and the Vault encrypts it. The same goes for the other way around.

About being better than AWS/Google's own solutions: it's not better or worse, it's different.
I have Hashicorp Vault's code and I can read it whenever I want, and I know exactly what is going on with my secret keys. With AWS/Google, I'm literally using a black box. I'm told it's secure, but how do I know? How do I know they are not making copies of my keys?
There is also the price tag. I host Vault on my own, and I pay just for the EC2 instance itself.

@sww314
Contributor

sww314 commented Oct 31, 2018

@alexandernst Yes, exactly. Are you still planning on using S3 (or similar), or are you going to store the files on your own EC2 instance? From the first description, I thought it was a proxy to encrypt/decrypt files as they enter/leave S3, but the comment about hosting your own made me wonder.

Side note: You might not want to use this as the DEFAULT_FILE_STORAGE, because if you use collectstatic your static files will be uploaded to the storage provider. This type of setup is probably overkill for storing your logo. Alternatively, choose a different solution for your static files.

@alexandernst
Author

@sww314 I'm ok hosting my encrypted files in S3, as long as they are encrypted before entering S3.

Good point about the DEFAULT_FILE_STORAGE. Let me rethink how this might be done without affecting static files.

@alexandernst
Author

I just created a new PR with a skeleton implementation of my proposal. Can we move the discussion there? @sww314 @jdufresne

@alexandernst
Author

Sorry, I forgot to mention that the PR is #627
