Add a blog post about privacy preserving ML, with sklearn, federated learning and fully homomorphic encryption #173

Open: wants to merge 2 commits into main

@bcm-at-zama commented Nov 23, 2023

This is a blog post that we discussed with Francois Goupil (@francoisgoupil, I imagine), about privacy and how scikit-learn can be used in a privacy-preserving way with federated learning and fully homomorphic encryption. More precisely, we had agreed on the abstract with Francois, and here is the full version.

The author of the blog post is @andrei-stoian-zama (https://github.com/andrei-stoian-zama).

PS: by the way, I love what you're doing with scikit-learn. It's so easy to use (and still powerful), and your APIs are really well done. Cheers to the team!

welcome bot commented Nov 23, 2023

💖 Thanks for opening this pull request! 💖
scikit-learn community really appreciates your time and effort to contribute to the project.
Please make sure you have read our Contributing Guidelines and filled in our pull request template to the best of your ability.

@glemaitre (Member)

FYI: reviewing the blog post is on my TODO list. I might be a bit busy next week, but I'll do my best.

@bcm-at-zama (Author)

Happy New Year, everyone! Do you know when this blog post might be merged? Thank you.

@glemaitre (Member)

I read the post yesterday. In general, I think it is looking good. We need to figure out a couple of things before merging:

  • I would like the blog post to be close to a notebook, and thus to show the outputs. I have to try building the website locally to be sure of the rendering and to check that we can include static outputs that look OK.
  • I think that we should have, from the start, something that allows reproducing the experiment via a requirements or environment file. I want to find more time to be able to repeat the experiment. I saw that you already link to a folder in a GitHub repository, but I feel the content in the repository diverged a bit too much. I am wondering if we could centralize everything in some sort of single notebook. I don't think that anything needs to be done right now. I first want to play with the code to get a better grasp of what one would expect when reading and trying to reproduce it.
  • While reading, I thought that the part using the Flower server was missing some content to give context to someone who has never heard of federated learning. I would like us to make it explicit within the example that the person developing the model should not see the data. I don't have a good proposal yet.

@bcm-at-zama This is just to let you know that the PR is not dead; we have been a bit busy with the end-of-year vacation and the ongoing scikit-learn release.
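The federated-learning point raised above (the person developing the model should not see the data) can be made concrete with a small sketch. It is an assumption-laden illustration in plain scikit-learn and NumPy, not the Flower API and not the post's code: three hypothetical "clients" each hold a private slice of the breast cancer data, share only aggregate statistics and fitted parameters, and a "server" performs a single round of sample-weighted parameter averaging in the spirit of FedAvg. In a real deployment, a framework such as Flower would carry these messages over the network and usually run several rounds.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: three clients (e.g. hospitals) each hold a private slice of the data.
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
rng = np.random.default_rng(0)
client_indices = np.array_split(rng.permutation(len(x_train)), 3)
clients = [(x_train[idx], y_train[idx]) for idx in client_indices]

# Step 1: clients share only aggregates (count, per-feature sum, sum of squares),
# so the server can derive a common standardization without seeing any raw row.
stats = [(len(x), x.sum(axis=0), (x**2).sum(axis=0)) for x, _ in clients]
n_total = sum(n for n, _, _ in stats)
mean = sum(s for _, s, _ in stats) / n_total
std = np.sqrt(sum(sq for _, _, sq in stats) / n_total - mean**2)

# Step 2: each client trains locally and sends back only its model parameters.
local_params = []
for x, y_local in clients:
    local_model = LogisticRegression(penalty="l2").fit((x - mean) / std, y_local)
    local_params.append((len(x), local_model.coef_.ravel(), local_model.intercept_[0]))

# Step 3: the server averages the parameters, weighted by each client's sample count.
coef = sum(n * w for n, w, _ in local_params) / n_total
intercept = sum(n * b for n, _, b in local_params) / n_total

def predict(x):
    """Apply the aggregated linear model: label 1 when the logit is positive."""
    return ((((x - mean) / std) @ coef + intercept) > 0).astype(int)

accuracy = np.mean(predict(x_test) == y_test)
print(f"aggregated model accuracy: {accuracy:.3f}")
```

Averaging coefficients is only meaningful because every client uses the same standardization, which is why step 1 exists; the server still never sees an individual patient record.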

@bcm-at-zama (Author)

Thanks @glemaitre, great to hear. If you want some modifications to our blog post (including packaging it a bit differently, for example as a single notebook), please tell @andrei-stoian-zama, who is the real author of the content. Anyway, thanks a lot, guys!

@bcm-at-zama (Author)

Hello. It would be awesome if we could merge this blog post sooner rather than later. For example, we are about to present this work at a Flower event, https://flower.dev/conf/flower-summit-2024/, so we're certainly going to do some (social network) promotion beforehand. If we have the blog, we can link to it; else, we'll go for the notebook on our repo, but it's much less readable for newcomers.

@glemaitre (Member)

"If we have the blog, we can link to it; else, we'll go for the notebook on our repo, but it's much less readable for newcomers."

I understand your point. Unfortunately, we currently don't have too much bandwidth with other priorities (such as the upcoming 1.4.1 release). I personally don't want to merge something just for the sake of merging it.

@adrinjalali (Member) left a comment

A few thoughts on this post:

  • We should first talk a tiny bit about what FHE is (not in depth), but in a better way than:

"FHE is a technology that enables application providers to build cloud-based applications that preserve user privacy"

FHE and the cloud don't have much to do with one another. You can use FHE in the cloud, but you could also use it in a local private network between two stakeholders.

  • We shouldn't link to Zama's example, that example with the should be here. This is not a sponsored post, and we don't do sponsored posts. Links to the libraries and an explanation of what's happening are okay and, of course, needed.

  • I would like a better explanation of what .compile does. I don't mind the link to the repo since it's BSD, but we need more context here.
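To illustrate the core idea behind the first point (one party computes on data it cannot read, whether that party sits in the cloud or on a local network), here is a toy sketch. It deliberately uses the Paillier scheme, which is only additively homomorphic and therefore not FHE: it supports adding ciphertexts and multiplying them by plaintext integers, which happens to be enough to score a linear model. Concrete ML relies on a different, TFHE-based fully homomorphic scheme. The primes, features, weights and bias below are made-up values, and the key size is far too small to be secure.

```python
import math
import random

# Toy Paillier keys from tiny textbook primes: insecure, for illustration only.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse; valid shortcut because G = N + 1

def encrypt(m):
    """Encrypt an integer with the public key (negative values are encoded modulo N)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c):
    """Decrypt with the private key; results above N // 2 are read as negative."""
    m = ((pow(c, LAMBDA, N_SQ) - 1) // N) * MU % N
    return m - N if m > N // 2 else m

def add_encrypted(c1, c2):
    # Multiplying two ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % N_SQ

def scale_encrypted(c, k):
    # Raising a ciphertext to a plaintext power k multiplies the plaintext by k.
    return pow(c, k % N, N_SQ)

# Client side: integer features are encrypted before leaving the client's machine.
features = [3, 7, 1, 4]
encrypted_features = [encrypt(v) for v in features]

# Server side: plaintext integer weights of a hypothetical quantized linear model.
weights = [2, -5, 11, 1]
bias = 6
encrypted_score = encrypt(bias)
for c, w in zip(encrypted_features, weights):
    encrypted_score = add_encrypted(encrypted_score, scale_encrypted(c, w))

# Back on the client: only the private-key holder can read the score.
score = decrypt(encrypted_score)
print(score)  # 3*2 + 7*(-5) + 1*11 + 4*1 + 6 = -8
```

The server never sees the features or the score; a fully homomorphic scheme additionally allows multiplying ciphertexts together and evaluating non-linear functions, which a partially homomorphic scheme like this one cannot do.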

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

mnist_dataset = fetch_openml("mnist_784")

Is there a reason to use MNIST for this demo? It feels like a very general dataset and I'm wondering if there's a better dataset for the point that you're making. Maybe something where it is clear that privacy could be at stake?

@bcm-at-zama (Author)

This one is for the author, @andrei-stoian-zama.

@bcm-at-zama (Author)

We moved from MNIST, which you didn't like, to the Breast Cancer dataset, which is indeed a better dataset for privacy-preserving ML (PPML).
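For readers who want to follow along, a minimal loading sketch, assuming the updated post uses scikit-learn's built-in copy of the Wisconsin breast cancer dataset. The 20% test split, the stratification and the `random_state` value are arbitrary choices for illustration, not necessarily those used in the post. Unlike MNIST, no download is needed, and each row holds measurements of a patient's tumor, which makes the privacy stake explicit.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# scikit-learn ships this dataset, so no network access is required.
X, y = load_breast_cancer(return_X_y=True)

# Class balance: label 0 is malignant, label 1 is benign.
print(np.bincount(y))

# Hold out 20% of the patients; stratify to keep the class ratio in both splits.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X.shape, x_train.shape, x_test.shape)
```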

@bcm-at-zama (Author)

Are you fine with resolving this conversation?


model = LogisticRegression(penalty="l2")
model.fit(X=x_train, y=y_train)
model.compile(x_train)

I'm curious about the internals of this .compile step. Could you share a diagram with a brief explanation of what might go wrong if one doesn't run that line?

@bcm-at-zama (Author)

OK, we'll explain: if one does not compile, the model stays in the clear, with no FHE :)

@bcm-at-zama (Author)

Is it sufficient now (in particular, with the image)? Otherwise, we can link to https://docs.zama.ai/concrete-ml/advanced-topics/compilation#compilation-to-fhe, but it's more complicated and maybe a bit out of scope for this blog post. Your choice; just tell me.
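As we understand the Concrete ML documentation, FHE circuits operate on integers, so the model is quantized, and `model.compile(x_train)` uses the given representative inputs to determine value ranges before building the FHE circuit; without that step the model simply runs in the clear. The sketch below reproduces only the calibration and integer-arithmetic part of that idea, in plain scikit-learn and NumPy with no encryption at all. The 8-bit symmetric quantization scheme and the helpers `calibrate_scale`, `quantize` and `integer_predict` are our own assumptions for illustration, not Concrete ML's implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

N_BITS = 8  # assumed bit-width for this toy
Q_MAX = 2 ** (N_BITS - 1) - 1  # largest signed integer value: 127 for 8 bits

def calibrate_scale(values):
    """Symmetric scale so that the largest observed magnitude maps to Q_MAX."""
    return np.abs(values).max() / Q_MAX

def quantize(values, scale):
    """Round to integers; values outside the calibrated range are clipped."""
    return np.clip(np.round(values / scale), -Q_MAX, Q_MAX).astype(np.int64)

X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(x_train)
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

model = LogisticRegression(penalty="l2").fit(x_train, y_train)

# Compile-like step: ranges are calibrated on a representative input set (x_train).
x_scale = calibrate_scale(x_train)
w_scale = calibrate_scale(model.coef_)
q_weights = quantize(model.coef_.ravel(), w_scale)

def integer_predict(x):
    """Inference where the dot product runs purely on integers."""
    q_x = quantize(x, x_scale)
    accumulator = q_x @ q_weights  # integer-only: the kind of operation an FHE circuit evaluates
    # For simplicity, the final rescaling and the bias stay in floating point here.
    logits = accumulator * (x_scale * w_scale) + model.intercept_[0]
    return (logits > 0).astype(int)

agreement = np.mean(integer_predict(x_test) == model.predict(x_test))
print(f"integer vs float prediction agreement: {agreement:.3f}")
```

The calibration set matters: test values outside the range seen in `x_train` get clipped, which is one reason the input set passed to the compile step should be representative of the data the model will see.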

@bcm-at-zama (Author)

Thanks for the reviews and comments; we'll have a look soon.

For this one in particular, @adrinjalali:

"We shouldn't link to Zama's example, that example with the should be here."

Do you mean we need to add a copy (of the notebook currently in the Concrete ML repo) to the scikit-learn repository? If yes, sure: could you tell us exactly where we need to add the notebook?

@bcm-at-zama (Author)

"If we have the blog, we can link to it; else, we'll go for the notebook on our repo, but it's much less readable for newcomers."

"I understand your point. Unfortunately, we currently don't have too much bandwidth with other priorities (such as the upcoming 1.4.1 release). I personally don't want to merge something just for the sake of merging it."

And I understand your point. It's your repo and your blog, in the end. A long time ago, we discussed with @francoisgoupil the value of this blog post for both Concrete ML and scikit-learn (and Flower, by the way), but of course I understand you have other priorities as well.

@adrinjalali (Member)

"Do you mean we need to add a copy (of the notebook currently in the Concrete ML repo) to the scikit-learn repository? If yes, sure: could you tell us exactly where we need to add the notebook?"

Ideally, the blog post is self-contained in terms of content, so the code can be added here.

@bcm-at-zama (Author)

"Do you mean we need to add a copy (of the notebook currently in the Concrete ML repo) to the scikit-learn repository? If yes, sure: could you tell us exactly where we need to add the notebook?"

"Ideally, the blog post is self-contained in terms of content, so the code can be added here."

I see. We'll see what we can do, but I'm afraid it might be a lot of lines. Personally, as a reader, I like to have ready-to-go code to download when I read a blog post, instead of having to concatenate all its code blocks (and then fix whatever was actually missing).

@bcm-at-zama (Author)

A new version has been pushed, with changes that hopefully address the comments. (I'm going to reply to the individual comments.)

@bcm-at-zama (Author) commented Feb 6, 2024

Now:

@bcm-at-zama (Author)

@adrinjalali, I hope it's better for you now. If not, and you would prefer a call about it, that's doable; I'm in France.

@bcm-at-zama (Author)

Hello, we are presenting this work tomorrow at the Flower event, https://flower.ai/conf/flower-ai-summit-2024/.

Do you have an ETA for merging this blog post? I understand you have other priorities, but it has really been a long time since we submitted it.


4 participants