Demo of federated learning using NVFlare #7879
Conversation
Something weird happened when I was trying to install nvflare:
$ pip install nvflare
Collecting nvflare
Using cached nvflare-2.0.16-py3-none-any.whl (797 kB)
Requirement already satisfied: numpy in /home/jiaming/.anaconda/envs/xgboost_dev/lib/python3.10/site-packages (from nvflare) (1.21.6)
Collecting grpcio
Using cached grpcio-1.46.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)
Collecting nvflare
Using cached nvflare-2.0.15-py3-none-any.whl (797 kB)
Using cached nvflare-2.0.14-py3-none-any.whl (814 kB)
Using cached nvflare-2.0.13-py3-none-any.whl (801 kB)
Using cached nvflare-2.0.12-py3-none-any.whl (799 kB)
Using cached nvflare-2.0.11-py3-none-any.whl (788 kB)
Using cached nvflare-2.0.10-py3-none-any.whl (781 kB)
Using cached nvflare-2.0.9-py3-none-any.whl (781 kB)
Using cached nvflare-2.0.8-py3-none-any.whl (776 kB)
Using cached nvflare-2.0.7-py3-none-any.whl (776 kB)
Using cached nvflare-2.0.6-py3-none-any.whl (776 kB)
Using cached nvflare-2.0.5-py3-none-any.whl (771 kB)
Using cached nvflare-2.0.4-py3-none-any.whl (767 kB)
Using cached nvflare-2.0.3-py3-none-any.whl (762 kB)
Using cached nvflare-2.0.2-py3-none-any.whl (753 kB)
Using cached nvflare-2.0.1-py3-none-any.whl (418 kB)
Using cached nvflare-2.0.0-py3-none-any.whl (418 kB)
Using cached nvflare-1.0.2-py3-none-any.whl (510 kB)
Downloading nvflare-1.0.1-py3-none-any.whl (510 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━ 481.3/510.5 KB 551.6 kB/s eta 0:00:01
And then when I try to install a specific version:
$ pip install nvflare==2.0.15
Collecting nvflare==2.0.15
Using cached nvflare-2.0.15-py3-none-any.whl (797 kB)
ERROR: Could not find a version that satisfies the requirement tenseal==0.3.0 (from nvflare) (from versions: none)
ERROR: No matching distribution found for tenseal==0.3.0
So, I haven't tried the demo yet and don't know how it actually works. The code looks good to me.
What is the benefit of using nvflare here vs. the previous example?
Just trying to wrap my head around what each component is doing.
def start_controller(self, fl_ctx: FLContext):
    self._server = multiprocessing.Process(
        target=xgboost.federated.run_federated_server,
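For context, the controller launches the federated server in a separate process so it does not block the controller itself. A minimal sketch of that pattern, using a stand-in target function since `xgboost.federated.run_federated_server` requires a federated build of XGBoost (the class and port below are placeholders):

```python
import multiprocessing


def run_stub_server(port):
    # Stand-in for xgboost.federated.run_federated_server; a real
    # server would block here serving gRPC requests on this port.
    print(f"serving on port {port}")


class StubController:
    def start_controller(self):
        # Run the server in its own process so the controller stays
        # responsive while the server blocks.
        self._server = multiprocessing.Process(
            target=run_stub_server, args=(9091,))
        self._server.start()

    def stop_controller(self):
        # Terminate and reap the server process on shutdown.
        self._server.terminate()
        self._server.join()


if __name__ == "__main__":
    controller = StubController()
    controller.start_controller()
    controller.stop_controller()
```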
Is it possible to create a gRPC server at the Python layer?
Yes, we could write the gRPC server in Python, but it might have some limitations when it comes to threading. We are still talking with the NVFlare team to figure out the details, so this could change in the future.
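For background on the threading concern: a Python-layer server typically dispatches each request to a thread pool, so CPU-bound handlers contend on the GIL. The dispatch pattern can be illustrated with the standard library's threaded TCP server (used here instead of gRPC so the sketch stays dependency-free):

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    # Each connection is served on its own thread, mirroring how a
    # Python-layer RPC server hands requests to a thread pool.
    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data)


def serve_once():
    # Port 0 lets the OS pick a free port.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler) as srv:
        port = srv.server_address[1]
        t = threading.Thread(target=srv.serve_forever, daemon=True)
        t.start()
        with socket.create_connection(("127.0.0.1", port)) as conn:
            conn.sendall(b"ping")
            reply = conn.recv(1024)
        srv.shutdown()
        return reply


print(serve_once())  # b'ping'
```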
demo/nvflare/custom/trainer.py (Outdated)

    xgb.rabit.init([e.encode() for e in rabit_env])

    # Load file, file will not be sharded in federated mode.
    dtrain = xgb.DMatrix('agaricus.txt.train')
So this is not using the split data?
In prepare_data.sh we copy each split into the site-specific directory, so each site only sees its own shard. This probably looks more like a real federated environment.
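The staging step can be sketched in Python (file names, the two-site layout, and the `stage_site_data` helper are illustrative, not the actual contents of prepare_data.sh):

```python
import pathlib
import tempfile


def stage_site_data(data_dir, num_sites=2):
    # Copy each pre-split shard into its own site directory, so every
    # federated client only sees its local portion of the data.
    data_dir = pathlib.Path(data_dir)
    for i in range(num_sites):
        site = data_dir / f"site-{i + 1}"
        site.mkdir(exist_ok=True)
        shard = data_dir / f"train.part{i}"
        (site / "agaricus.txt.train").write_text(shard.read_text())


# Demo with throwaway shards in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "train.part0").write_text("row-a\nrow-b\n")
    (root / "train.part1").write_text("row-c\n")
    stage_site_data(root)
    print(sorted(p.name for p in root.iterdir()))
```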
@trivialfis NVFlare seems to have some issue with Python 3.9/3.10. I had to specify 3.8 for it to work. Added a note to the readme.
@RAMitchell for now this is a pretty "shallow" integration, but NVFlare can still provide support for managing the federated environment. We are working with the NVFlare team to figure out how to get tighter integration for better privacy etc.
demo/nvflare/custom/trainer.py (Outdated)

        f'federated_client_key={self._client_key_path}',
        f'federated_client_cert={self._client_cert_path}'
    ]
    xgb.rabit.init([e.encode() for e in rabit_env])
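For readers following along: the trainer assembles a list of `key=value` strings and encodes each to bytes before handing the list to `xgb.rabit.init`. A sketch of that assembly step — the server address, rank/world-size keys, and certificate paths below are placeholder values, not taken from the actual diff beyond the two client-certificate entries shown above:

```python
# Placeholder values standing in for the trainer's real configuration.
server_address = "localhost:9091"
client_key_path = "client-key.pem"
client_cert_path = "client-cert.pem"

rabit_env = [
    f"federated_server_address={server_address}",
    f"federated_client_key={client_key_path}",
    f"federated_client_cert={client_cert_path}",
]
# rabit expects bytes, so encode every entry before init.
encoded = [e.encode() for e in rabit_env]
print(encoded[0])  # b'federated_server_address=localhost:9091'
```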
Can we move the RabitContext from the dask module to the rabit module and reuse it here?
Done. But this changes the class name in dask. Is that what we want? Maybe we can keep the same name, RabitContext, in the dask module?
Sounds good. I made the change to your PR.
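For anyone unfamiliar with the helper being moved: RabitContext is a context manager that pairs collective init with finalize so cleanup happens even if training raises. A stand-in sketch of the pattern, with dummy state flips in place of the real `xgb.rabit.init` / `xgb.rabit.finalize` calls (which need an XGBoost install):

```python
class RabitContext:
    """Context manager pairing collective init with finalize,
    mirroring the helper moved from the dask module to rabit."""

    def __init__(self, args):
        self.args = args
        self.active = False

    def __enter__(self):
        # The real implementation calls xgb.rabit.init(self.args) here.
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # The real implementation calls xgb.rabit.finalize(); doing it
        # in __exit__ guarantees cleanup even when training raises.
        self.active = False
        return False


with RabitContext([b"federated_rank=0"]) as ctx:
    print(ctx.active)  # True
print(ctx.active)  # False
```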
A simple demo of federated learning using NVFlare.
Part of #7778