
Demo of federated learning using NVFlare #7879

Merged: 12 commits merged into dmlc:master from nvflare-demo on May 14, 2022

Conversation

@rongou (Contributor) commented May 9, 2022

A simple demo of federated learning using NVFlare.

Part of #7778

@rongou (Contributor, Author) commented May 9, 2022

@RAMitchell @trivialfis

@trivialfis (Member) left a comment

Something weird happened when I was trying to install nvflare:

$ pip install nvflare
Collecting nvflare
  Using cached nvflare-2.0.16-py3-none-any.whl (797 kB)
Requirement already satisfied: numpy in /home/jiaming/.anaconda/envs/xgboost_dev/lib/python3.10/site-packages (from nvflare) (1.21.6)
Collecting grpcio
  Using cached grpcio-1.46.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)
Collecting nvflare
  Using cached nvflare-2.0.15-py3-none-any.whl (797 kB)
  Using cached nvflare-2.0.14-py3-none-any.whl (814 kB)
  Using cached nvflare-2.0.13-py3-none-any.whl (801 kB)
  Using cached nvflare-2.0.12-py3-none-any.whl (799 kB)
  Using cached nvflare-2.0.11-py3-none-any.whl (788 kB)
  Using cached nvflare-2.0.10-py3-none-any.whl (781 kB)
  Using cached nvflare-2.0.9-py3-none-any.whl (781 kB)
  Using cached nvflare-2.0.8-py3-none-any.whl (776 kB)
  Using cached nvflare-2.0.7-py3-none-any.whl (776 kB)
  Using cached nvflare-2.0.6-py3-none-any.whl (776 kB)
  Using cached nvflare-2.0.5-py3-none-any.whl (771 kB)
  Using cached nvflare-2.0.4-py3-none-any.whl (767 kB)
  Using cached nvflare-2.0.3-py3-none-any.whl (762 kB)
  Using cached nvflare-2.0.2-py3-none-any.whl (753 kB)
  Using cached nvflare-2.0.1-py3-none-any.whl (418 kB)
  Using cached nvflare-2.0.0-py3-none-any.whl (418 kB)
  Using cached nvflare-1.0.2-py3-none-any.whl (510 kB)
  Downloading nvflare-1.0.1-py3-none-any.whl (510 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━ 481.3/510.5 KB 551.6 kB/s eta 0:00:01

And then when I try to install a specific version:

$ pip install nvflare==2.0.15
Collecting nvflare==2.0.15
  Using cached nvflare-2.0.15-py3-none-any.whl (797 kB)
ERROR: Could not find a version that satisfies the requirement tenseal==0.3.0 (from nvflare) (from versions: none)
ERROR: No matching distribution found for tenseal==0.3.0

So I haven't tried the demo yet and don't know how it actually works, but the code itself looks good to me.

@RAMitchell (Member) left a comment

What is the benefit of using nvflare here vs. the previous example?

Just trying to wrap my head around what each component is doing.


def start_controller(self, fl_ctx: FLContext):
    self._server = multiprocessing.Process(
        target=xgboost.federated.run_federated_server,
Member:

Is it possible to create a grpc server at the python layer?

Contributor (Author):

Yes, we could write the gRPC server in Python, but it might have some limitations when it comes to threading. We're still working out the details with the NVFlare team, so this could change in the future.
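The pattern in the snippet above (launching the gRPC federated server from Python in a separate process) can be sketched as follows. This is a minimal stand-in, not the demo's actual code: `run_stub_server` and `StubController` are hypothetical substitutes for `xgboost.federated.run_federated_server` and the NVFlare controller class.

```python
import multiprocessing
import time


def run_stub_server(port: int) -> None:
    """Stand-in for xgboost.federated.run_federated_server: blocks until terminated."""
    while True:
        time.sleep(0.1)


class StubController:
    """Minimal sketch of the controller's start/stop pattern.

    Running the server in a separate process keeps the server's threads
    out of the controller's own Python interpreter.
    """

    def __init__(self, port: int = 9091) -> None:
        self._port = port
        self._server = None

    def start_controller(self) -> None:
        # Launch the (stub) federated server in its own process.
        self._server = multiprocessing.Process(
            target=run_stub_server, args=(self._port,), daemon=True)
        self._server.start()

    def stop_controller(self) -> None:
        # Tear the server process down when the FL run ends.
        if self._server is not None:
            self._server.terminate()
            self._server.join()


if __name__ == "__main__":
    controller = StubController()
    controller.start_controller()
    controller.stop_controller()
```

The real demo passes the server's port, world size, and TLS certificate paths to the server process as well.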

xgb.rabit.init([e.encode() for e in rabit_env])

# Load file, file will not be sharded in federated mode.
dtrain = xgb.DMatrix('agaricus.txt.train')
Member:

So this is not using the split data?

Contributor (Author):

In prepare_data.sh we copy each data split into a site-specific directory, which more closely resembles a real federated environment.
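As a rough sketch of that preparation step (the actual script is prepare_data.sh in the demo; the stand-in file contents and the two-site layout here are assumptions):

```shell
# Create a stand-in training file, split it into two shards, and place each
# shard in a site-specific directory, mimicking a federated data layout.
set -e
printf 'row%s\n' 1 2 3 4 > agaricus.txt.train
split -n l/2 agaricus.txt.train shard.   # GNU split: two line-based shards
mkdir -p site-1 site-2
mv shard.aa site-1/agaricus.txt.train
mv shard.ab site-2/agaricus.txt.train
wc -l site-1/agaricus.txt.train site-2/agaricus.txt.train
```

Each site then loads `agaricus.txt.train` from its own directory, so no single participant ever sees the full dataset.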

@rongou (Contributor, Author) commented May 11, 2022

@trivialfis NVFlare seems to have issues with Python 3.9/3.10. I had to pin Python 3.8 for it to work. Added a note to the README.
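A small guard for the version constraint mentioned above might look like this; the `<= 3.8` cutoff is inferred from the tenseal resolver failure on Python 3.10 earlier in this thread, not from NVFlare's own documentation:

```shell
# Warn before installing nvflare 2.0.x on an unsupported Python version.
if python3 -c 'import sys; sys.exit(0 if sys.version_info[:2] <= (3, 8) else 1)'; then
  echo "Python version OK for nvflare 2.0.x"
else
  echo "Use Python 3.8 (e.g. a dedicated conda env) before installing nvflare 2.0.x"
fi
```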

@rongou (Contributor, Author) left a comment

@RAMitchell for now this is a fairly "shallow" integration, but NVFlare still provides support for managing the federated environment. We're working with the NVFlare team on a tighter integration for better privacy guarantees, etc.


f'federated_client_key={self._client_key_path}',
f'federated_client_cert={self._client_cert_path}'
]
xgb.rabit.init([e.encode() for e in rabit_env])
Member:

Can we move RabitContext from the dask module to the rabit module and reuse it here?

Contributor (Author):

Done. However, this changes the class name in the dask module. Is that what we want? Maybe we can keep the same RabitContext name in the dask module?

Member:

Sounds good. I made the change to your PR.

@trivialfis trivialfis added this to 2.0 In Progress in 2.0 Roadmap via automation May 12, 2022
@trivialfis trivialfis merged commit af907e2 into dmlc:master May 14, 2022
2.0 Roadmap automation moved this from 2.0 In Progress to 2.0 Done May 14, 2022
@trivialfis trivialfis removed this from 2.0 Done in 2.0 Roadmap Sep 28, 2022
@trivialfis trivialfis added this to In progress in 1.7 Roadmap via automation Sep 28, 2022
@trivialfis trivialfis moved this from In progress to Done in 1.7 Roadmap Sep 28, 2022
@rongou rongou deleted the nvflare-demo branch November 18, 2022 19:01