Add a blog post about privacy preserving ML, with sklearn, federated … #173
💖 Thanks for opening this pull request! 💖
…learning and fully homomorphic encryption.
FYI: reviewing this blog post is on my TODO list. I might be a bit busy next week, but I'll do my best.
Happy new year, everyone! Do you know when this blog post may be merged, please? Thank you.
So I had a read yesterday on the post. I think that in general this is looking good. We need to figure a couple of things out before the merge:
@bcm-at-zama Just to let you know, the PR is not dead, but we have been a bit busy with the end-of-year vacation and the ongoing scikit-learn release.
Thanks @glemaitre, great to hear. So, if you want some modifications to our blog post (including us trying to package it a bit differently, for example as a single notebook), just tell @andrei-stoian-zama, who is the real author of the content. Anyway, thanks a lot, guys!
Hello. It would be awesome if we could merge this blog post sooner rather than later. We are about to present this work at the Flower event, https://flower.dev/conf/flower-summit-2024/, for example, and so we're certainly going to do some (social network) promotion beforehand. If we have the blog post, we can link to it; otherwise, we'll go with the notebook in our repo, but it's much less readable for newcomers.
I understand your point. Unfortunately, we currently don't have much bandwidth given other priorities (such as the upcoming 1.4.1 release). I personally don't want to merge something just for the sake of merging it.
A few thoughts on this post:
- We should first explain what FHE is, briefly and not in depth, but better than:
  "FHE is a technology that enables application providers to build cloud-based applications that preserve user privacy."
  FHE and the cloud don't have much to do with one another. You can use FHE in the cloud, but you could also use it in a local private network between two stakeholders.
- We shouldn't link to Zama's example; that example should be included here. This is not a sponsored post, and we don't do sponsored posts. Links to the libraries and an explanation of what's happening are okay and needed, of course.
- I would like a better explanation of what `.compile` does. I don't mind the link to the repo since it's BSD, but we need more context here.
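To make the first point concrete, here is a minimal sketch of what "computing on encrypted data" means. It is an assumption-laden toy: it uses a textbook Paillier scheme, which is only additively homomorphic (not fully homomorphic), with tiny hard-coded primes that offer no security. None of the names below come from the blog post or from any FHE library; they are hypothetical and for illustration only. Note that nothing here involves the cloud: the party holding the ciphertexts could just as well sit on a local private network.

```python
import math
import random

# Toy Paillier keys. The primes are tiny and hard-coded: insecure, illustration only.
P, Q = 1789, 1861
N = P * Q                 # public modulus
N_SQ = N * N
G = N + 1                 # standard simple choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private
MU = pow(LAMBDA, -1, N)   # private; this formula is valid because G = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key (N, G)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key (LAMBDA, MU)."""
    u = pow(c, LAMBDA, N_SQ)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Ciphertext of m1 + m2, computed without ever decrypting."""
    return (c1 * c2) % N_SQ


def scale_encrypted(c: int, k: int) -> int:
    """Ciphertext of k * m for a plaintext integer k >= 0."""
    return pow(c, k, N_SQ)


# One party encrypts its private integer features...
features = [5, 7, 11]
encrypted_features = [encrypt(x) for x in features]

# ...another party applies plaintext integer weights without seeing the features.
weights = [3, 1, 2]
encrypted_score = encrypt(0)
for c, w in zip(encrypted_features, weights):
    encrypted_score = add_encrypted(encrypted_score, scale_encrypted(c, w))

# Only the key holder can read the result: 3*5 + 1*7 + 2*11.
score = decrypt(encrypted_score)
```

A fully homomorphic scheme additionally supports multiplying two ciphertexts together, which is what allows more general models to be evaluated; that part is not shown here.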
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

mnist_dataset = fetch_openml("mnist_784")
Is there a reason to use MNIST for this demo? It feels like a very general dataset and I'm wondering if there's a better dataset for the point that you're making. Maybe something where it is clear that privacy could be at stake?
this one is for the author, @andrei-stoian-zama
We moved from MNIST, which you didn't like, to Breast Cancer, which is indeed a better dataset for PPML.
Fine with you to resolve this conversation?
model = LogisticRegression(penalty="l2")
model.fit(X=x_train, y=y_train)
model.compile(x_train)
I'm curious about the internals of this `.compile` step. Could you share a diagram with a brief explainer of what might go wrong if one doesn't run that line?
ok we'll explain: if one does not compile, the model stays in the clear, no FHE :)
Is it sufficient now (in particular, with an image)? Otherwise, we can link to https://docs.zama.ai/concrete-ml/advanced-topics/compilation#compilation-to-fhe, but it's more complicated and maybe a bit out of scope for this blog post. Your choice; just tell me.
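For readers following this thread, one way to picture why a compile step would take sample data such as `x_train` is quantization: FHE schemes typically compute on integers, so a float model needs an integer representation calibrated on representative inputs. The sketch below is a conceptual illustration under that assumption only. It does not reproduce Concrete ML's actual pipeline (see the documentation link above), and every function name in it is hypothetical.

```python
def calibrate(samples, n_bits=8):
    """Pick a uniform input quantization range from representative samples."""
    lo = min(min(row) for row in samples)
    hi = max(max(row) for row in samples)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels
    return lo, (scale if scale > 0 else 1.0)


def quantize_inputs(row, lo, scale, n_bits=8):
    """Map float features to unsigned integers in [0, 2**n_bits - 1]."""
    levels = 2 ** n_bits - 1
    return [min(levels, max(0, round((v - lo) / scale))) for v in row]


def quantize_weights(weights, n_bits=8):
    """Map float weights to signed integers with a symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    w_scale = (max_abs / (2 ** (n_bits - 1) - 1)) if max_abs > 0 else 1.0
    return [round(w / w_scale) for w in weights], w_scale


def integer_score(x_q, w_q):
    """The integer-only dot product: the part that could run on encrypted data."""
    return sum(w * x for w, x in zip(w_q, x_q))


def dequantize_score(acc, w_q, lo, x_scale, w_scale, bias):
    """Recover an approximate float score in the clear, after decryption."""
    return w_scale * x_scale * acc + w_scale * lo * sum(w_q) + bias


# Hypothetical calibration data and linear model, for illustration only.
calibration = [[0.0, 10.0], [2.0, 6.0], [4.0, 8.0]]
weights, bias = [0.5, -0.25], 0.1
sample = [3.0, 7.0]

lo, x_scale = calibrate(calibration)
w_q, w_scale = quantize_weights(weights)
x_q = quantize_inputs(sample, lo, x_scale)

exact = sum(w * x for w, x in zip(weights, sample)) + bias
approx = dequantize_score(integer_score(x_q, w_q), w_q, lo, x_scale, w_scale, bias)
```

In this picture, skipping the calibration step leaves only the float model, which has no integer form to evaluate under encryption; that is consistent with the comment above that an uncompiled model stays in the clear.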
Thanks for the reviews / comments, we'll soon have a look. For this one in particular, @adrinjalali:
Do you mean we need to add a copy of the notebook (currently in the Concrete ML repo) to the scikit-learn repository? If so, sure; could you tell us exactly where we need to add the notebook?
And I understand your point. It's your repo and your blog, in the end. We discussed the interest of this blog post for both Concrete ML and scikit-learn (and Flower, by the way) a long time ago with @francoisgoupil, but of course, I understand you have other priorities as well.
Ideally, the blog post is self-contained in terms of content. So the code can be added here.
I see. We'll see what we can do, but I'm afraid it might be a lot of lines. As a reader, I like to have ready-to-go code to download when I read a blog post, instead of having to concatenate all of its code blocks (and fix whatever was actually missing).
A new version has been pushed, with changes that hopefully address the comments. (I'm going to reply to the individual comments.)
Now:
@adrinjalali, I hope it's better for you now. If not, and you would prefer that we have a call about it, that's doable; I'm in France.
Hello, we are presenting this work tomorrow at the Flower event, https://flower.ai/conf/flower-ai-summit-2024/. Do you have an ETA for merging this blog post? I understand you have other priorities, but it has really been a long time since we submitted it.
This is a blog post that we have discussed with Francois Goupil (@francoisgoupil I imagine), about privacy, and how scikit-learn can be used in a privacy-preserving way with federated learning and fully homomorphic encryption. More precisely, we had agreed on the abstract with Francois, and here is the full version.
Author of the blog is https://github.com/andrei-stoian-zama @andrei-stoian-zama
PS: by the way, I love what you're doing at scikit-learn. It's so easy to use (and still, powerful), your APIs are really well done. Cheers to the team!