
Implement secure boost scheme - secure evaluation and validation (during training) without local feature leakage #10079

Open
wants to merge 38 commits into base: vertical-federated-learning

Conversation


@ZiyueXu77 ZiyueXu77 commented Feb 27, 2024

For implementing Vertical Federated Learning with Secure Features, as discussed in
#9987
This part is independent of the encryption work and the alternative vertical pipeline. Its purpose is to avoid leaking the real cut-value information from participants; hence it is added as a separate PR.
This PR is based on #10037, which should be reviewed and merged first.

ZiyueXu77 and others added 29 commits January 31, 2024 10:48
…lobal best split, but need to further apply split correctly
@ZiyueXu77 ZiyueXu77 marked this pull request as draft March 4, 2024 16:07
@ZiyueXu77 ZiyueXu77 marked this pull request as ready for review March 4, 2024 19:00
@ZiyueXu77 (Author)

Hi @trivialfis , the method implementation for secure inference is ready. I added detailed information to our RFC under the section "Design for Secure Inference - avoid leakage of feature cut value". @YuanTingHsieh will add / modify the unit tests. Thanks!

@ZiyueXu77 ZiyueXu77 changed the title Implement secure boost scheme - secure inference without local feature leakage Implement secure boost scheme - secure evaluation and validation without local feature leakage Mar 5, 2024
@ZiyueXu77 ZiyueXu77 changed the title Implement secure boost scheme - secure evaluation and validation without local feature leakage Implement secure boost scheme - secure evaluation and validation (during training) without local feature leakage Mar 5, 2024
@@ -445,29 +449,27 @@ void SketchContainerImpl<WQSketch>::MakeCuts(Context const *ctx, MetaInfo const
max_cat = std::max(max_cat, AddCategories(categories_.at(fid), p_cuts));
Member
Based on my understanding, categorical features are not yet supported right?

@ZiyueXu77 (Author) Mar 5, 2024

Right, we will need to find a proper use case / testing data with categorical features to add the support. Categorical features seem to be "experimental" according to some of last year's release notes; is that still the case? Maybe we can add the support later when we find it really necessary.

src/common/quantile.cc (outdated; resolved)
src/tree/common_row_partitioner.h (outdated; resolved)
src/tree/common_row_partitioner.h (outdated; resolved)
Comment on lines +307 to 315
if (!is_secure_) {
split_pt = cut_val[i]; // not used for partition based
best.Update(loss_chg, fidx, split_pt, d_step == -1, false, left_sum, right_sum);
} else {
// secure mode: record the best split point, rather than the actual value
// since it is not accessible at this point (active party finding best-split)
best.Update(loss_chg, fidx, i, d_step == -1, false, left_sum, right_sum);
}
} else {
@trivialfis (Member) Mar 5, 2024

Do you think a policy class might help here? Or maybe there are other efficient ways to handle these conditions? I'm losing track of these conditions, considering that we have three enumeration functions:

  • numeric
  • partition
  • one hot

Then we have three split modes:

  • column
  • row
  • column + secure

So, in combination, there are 9 potential cases, and we haven't counted vector leaf yet. We need to find a better way to manage these conditions.

@ZiyueXu77 (Author) Mar 5, 2024

It can be tricky to consolidate, since the 9 cases overlap heavily (e.g. the same enumeration logic applies to all split modes except secure + passive party), and some further processing applies only to col_split (with or without secure) but is irrelevant to enumeration.

@ZiyueXu77 (Author)

Another thing regarding these mode combinations: with the upcoming processor interface we will potentially be able to enable encrypted horizontal training. Shall we add a row + secure mode, i.e. a 4th entry to
enum class DataSplitMode : int { kRow = 0, kCol = 1, kColSecure = 2 };? (Or maybe there are better options?)

Member

My preference would be to put whether the channel is encrypted in the CommunicatorContext configuration.

@trivialfis (Member)

Hi, could you please share how to run some high level tests?

@ZiyueXu77 (Author)

Hi, could you please share how to run some high level tests?

Sure, this is what I am using for testing:
https://github.com/ZiyueXu77/NVFlare/tree/secureboost/examples/advanced/xgboost_secure
I will share the data link with you.

@ZiyueXu77 (Author)

Another general challenge for any vertical pipeline: at inference time, all parties need to be online, and since our model records the "global feature index", the "order" of the clients needs to remain the same. We may need some mechanism to ensure this order.

@trivialfis (Member)

the "order" of the clients need to remain the same. We may need some mechanisms to ensure this order.

I will leave that to nvflare.

@trivialfis (Member) left a comment

The code looks good to me overall. We can merge it once we have some basic unittests.

As for integration tests in Python with nvflare (in future PRs), we can assert that

  • models are different for different workers
  • predictions are the same
  • evaluation results are the same
  • the pipeline only works if the 0th worker has the label

I highly recommend using the hypothesis test framework (see python tests in xgboost and search the term hypothesis).

@ZiyueXu77 (Author)


Thanks! @YuanTingHsieh, could you add the unit tests according to @trivialfis's suggestions?

@trivialfis (Member)

could you add the unit tests according to @trivialfis 's suggestions

Those points are all for integration tests, not for small unit tests. I think the integration tests in Python with nvflare will take more effort; we don't need to rush them in this PR.

@trivialfis (Member)

Hi, is there any update?

@ZiyueXu77 (Author)

Hi, is there any update?

Thanks for asking! :) @YuanTingHsieh has been busy with a related NVFlare release for the past two weeks; now that the release is close to finished, he will have time to work on this soon.

@ZiyueXu77 (Author) commented Apr 15, 2024

@trivialfis Yuanting just added some unit tests. There is a failed R test, but I am not sure whether it is related to our modifications; the error message is:

* checking package namespace information ... OK
* checking package dependencies ... ERROR
Packages suggested but not available: 'ggplot2', 'DiagrammeR', 'igraph'
.........
Ncpus: 4
Traceback (most recent call last):
  File "/__w/xgboost/xgboost/tests/ci_build/test_r_package.py", line 359, in <module>
    main(args)
  File "/__w/xgboost/xgboost/tests/ci_build/test_utils.py", line 52, in inner
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/__w/xgboost/xgboost/tests/ci_build/test_r_package.py", line 307, in main
    check_rpackage(tarball)
  File "/__w/xgboost/xgboost/tests/ci_build/test_utils.py", line 31, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/__w/xgboost/xgboost/tests/ci_build/test_utils.py", line 52, in inner
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/__w/xgboost/xgboost/tests/ci_build/test_r_package.py", line 166, in check_rpackage
    with open(rcheck_dir / "00install.out", "r") as fd:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'xgboost.Rcheck/00install.out'

@trivialfis (Member)

That should be unrelated; I will look into this PR today.

@ZiyueXu77 (Author)

Hi @trivialfis , thanks for the updates, just merged it.
Everything passed except an error for R regarding "Matrix":
2024-04-29T13:40:51.6394593Z make: *** [Makefile:291: Matrix.ts] Error 1

@trivialfis (Member)

Triggered the rest of the CI.

@ZiyueXu77 (Author)

Hi @trivialfis , there are 3 failed checks, but I think they align with the rebase merge. Shall we just merge this? Thanks!

3 participants