XGBoost is a machine learning algorithm widely used for tabular data modeling. To expand XGBoost from single-site learning to multisite collaborative training, NVIDIA has developed Federated XGBoost, an XGBoost plugin for federated learning. It covers vertical collaboration settings for jointly training XGBoost models across decentralized data sources, as well as horizontal histogram-based and tree-based federated learning.
Under the vertical setting, each party holds part of the features for the entire population, and only one party holds the label. The label-owner is referred to as the active party, while all other parties are passive parties. Under the horizontal setting, each party holds all features and label information, but only for part of the whole population.
Further, NVIDIA FLARE, a domain-agnostic, open-source, and extensible SDK for federated learning, has enhanced the real-world federated learning experience by adding capabilities to handle communication challenges, including multiple concurrent training jobs and potential job disruptions due to network conditions.
Currently, Federated XGBoost is built with the assumption of full mutual trust, meaning that no party intends to learn more information than is needed for model training. In practice, though, honest-but-curious is a more realistic setting for federated collaborations. For instance, in vertical federated XGBoost, passive parties may be interested in recovering the label information from the gradients sent by the active party. In horizontal federated learning, the server or other clients can access each client's gradient histograms and learn their data characteristics.
NVIDIA FLARE 2.5.2 and federated-secure XGBoost expand the scope of Federated XGBoost by addressing these potential information leakage concerns. Specifically:
- The secure federated algorithms, both horizontal and vertical, are implemented and added to the federated schemes supported by the XGBoost library, addressing data security patterns under different assumptions.
- Homomorphic encryption (HE) features are added to the secure federated XGBoost pipelines through a plugin and processor interface system designed to robustly and efficiently bridge the computation performed by XGBoost and the communication handled by NVIDIA FLARE, with proper encryption and decryption in between.
- HE plugins, both CPU-based and CUDA-accelerated, are provided for versatile adaptation depending on hardware and efficiency requirements. The CUDA plugin is shown to be much faster than current third-party solutions.
With the help of HE, key federated computation steps are performed over ciphertexts, and the relevant assets (gradients and partial histograms) are encrypted and will not be learned by other parties during computation. This gives users assurance of their data security, which is one of the fundamental benefits of federated learning.
As explained in this post, CUDA-accelerated homomorphic encryption with Federated XGBoost adds protection for data privacy and delivers up to 30x speedups for vertical federated XGBoost compared to third-party solutions.
Collaboration modes and secure patterns
For vertical XGBoost, the active party holds the label, which can be considered "the most valuable asset" in the whole process and should not be accessed by passive parties. The active party is therefore the "major contributor" from a model training perspective, with a concern of leaking this label information to passive clients. In this case, the security protection is mainly about shielding the label information from the passive clients.
To protect label information in vertical collaboration, in every XGBoost round the active party computes the gradients for each sample and encrypts them before sending them to the passive parties (Figure 1). Upon receiving the encrypted gradients (ciphertexts), each passive party accumulates them into histograms according to its own feature distribution. The resulting cumulative histograms are returned to the active party, decrypted, and used for tree building by the active party.
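The sketch below illustrates this flow with the open-source python-paillier (phe) package on toy data. It is not the actual Federated XGBoost plugin code; the array sizes, short key length, and variable names are illustrative assumptions only.

```python
# Illustrative sketch of the secure vertical flow using python-paillier (phe).
# Toy sizes and a short key are used for speed; production keys should be
# at least 2048 bits. Not the actual plugin implementation.
import numpy as np
from phe import paillier

rng = np.random.default_rng(0)
M, K = 200, 8                          # samples, histogram bins (toy sizes)

# Active party: per-sample gradients/hessians for the current boosting round.
g = rng.normal(size=M)
h = rng.uniform(0.1, 1.0, size=M)
pub, priv = paillier.generate_paillier_keypair(n_length=1024)  # toy key size
enc_g = [pub.encrypt(float(v)) for v in g]
enc_h = [pub.encrypt(float(v)) for v in h]

# Passive party: knows only its own feature binning of the samples and
# accumulates ciphertexts per bin, without ever decrypting.
bins = rng.integers(0, K, size=M)
hist_g = [pub.encrypt(0.0) for _ in range(K)]
hist_h = [pub.encrypt(0.0) for _ in range(K)]
for i in range(M):
    hist_g[bins[i]] += enc_g[i]        # additively homomorphic: ct + ct
    hist_h[bins[i]] += enc_h[i]

# Active party: decrypt the aggregated histograms and continue tree building.
G = [priv.decrypt(c) for c in hist_g]
H = [priv.decrypt(c) for c in hist_h]
assert np.allclose(G, np.bincount(bins, weights=g, minlength=K))
```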
For horizontal XGBoost, each party holds "equal status" (the whole feature set and label for part of the population), while the federated server performs aggregation without owning any data. In this case, clients are concerned about leaking information to the server and to each other, so the information to be protected is each client's local histograms.
To protect the local histograms in horizontal collaboration, the histograms are encrypted before being sent to the federated server for aggregation. The aggregation is then performed over ciphertexts, and the encrypted global histograms are returned to the clients, where they are decrypted and used for tree building. In this way, the server has no access to the plaintext histograms, while each client only learns the global histogram after aggregation, rather than the individual local histograms.
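As a rough illustration of this pattern, the following sketch uses the TenSEAL CKKS wrapper. The actual NVIDIA FLARE plugin differs in key management, serialization, and transport; the sizes simply mirror the example in the next section (30 features, 256 bins).

```python
# Illustrative sketch of the secure horizontal flow using TenSEAL's CKKS
# vectors. Key handling is simplified: here one context holds the secret key;
# in practice the server only receives a public (key-less) copy of the context.
import numpy as np
import tenseal as ts

rng = np.random.default_rng(0)
N, J, K = 5, 30, 256                             # clients, features, bins
hist_len = J * K                                 # flattened G (or H) histogram

ctx = ts.context(ts.SCHEME_TYPE.CKKS,
                 poly_modulus_degree=16384,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40

# Each client encrypts its flattened local gradient histogram.
local_hists = [rng.normal(size=hist_len) for _ in range(N)]
enc_hists = [ts.ckks_vector(ctx, h.tolist()) for h in local_hists]

# The server adds ciphertexts only and never sees plaintext histograms.
enc_global = enc_hists[0]
for enc in enc_hists[1:]:
    enc_global = enc_global + enc                # homomorphic vector addition

# Clients decrypt the aggregated global histogram for tree building.
global_hist = np.array(enc_global.decrypt())
assert np.allclose(global_hist, np.sum(local_hists, axis=0), atol=1e-3)
```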
Encryption with proper HE schemes
With multiple libraries covering various HE schemes, both with and without GPU support, it is important to choose the most efficient scheme for the specific needs of a particular federated XGBoost setting. As an example, assume N=5 participants, M=200K total data samples, J=30 total features, and K=256 slots per feature histogram. Depending on the type of federated learning application (vertical or horizontal), different schemes are preferable, as the back-of-the-envelope counts below illustrate.
For the vertical application, the encryption targets are the individual g/h numbers, and the computation is to add the encrypted numbers according to the histogram slots they fall into. Since the number of g/h values equals the number of samples, for each boosting round in theory:
- The total encryptions needed will be M * 2 = 400K (g and h), each encrypting a single number
- The total encrypted additions needed will be (M – K) * 2 * J ≈ 12M
In this case, an optimal scheme choice would be Paillier because the encryption is performed over a single number. Using schemes that target vectors, like CKKS, would waste significant space.
For the horizontal application, on the other hand, the encryption targets are the local histograms G/H, and the computation is to add the local histograms together to form the global histogram. For each boosting round:
- The total encryptions needed will be N * 2 = 10 (G and H), each encrypting a vector of length J * K = 7,680
- The total encrypted additions needed will be (N – 1) * 2 = 8
In this case, an optimal scheme choice would be CKKS because it can handle a whole histogram vector (of length 7,680, for example) in one shot.
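The short tally below reproduces these counts in code; the figures are per boosting round and ignore constant-factor bookkeeping.

```python
# Back-of-the-envelope HE workload for the example above (per boosting round).
N, M, J, K = 5, 200_000, 30, 256

# Vertical: encrypt every per-sample g and h, then add ciphertexts into bins.
vertical_encryptions = M * 2                     # 400,000 single-number encryptions
vertical_additions = (M - K) * 2 * J             # ~12M ciphertext additions

# Horizontal: encrypt one G and one H vector per client, then add them up.
horizontal_encryptions = N * 2                   # 10 vector encryptions
horizontal_vector_len = J * K                    # 7,680 slots per vector
horizontal_additions = (N - 1) * 2               # 8 ciphertext additions

print(vertical_encryptions, vertical_additions)  # 400000 11984640
print(horizontal_encryptions, horizontal_vector_len, horizontal_additions)  # 10 7680 8
```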
We provide encryption solutions both for CPU-only environments and with efficient GPU acceleration.
Example results
With the pipeline previously described implemented on both XGBoost and NVIDIA FLARE, we tested our secure federated pipelines with a credit card fraud detection dataset. The results are as follows:
The AUC of vertical learning (both secure and non-secure):
[0] eval-auc:0.90515 train-auc:0.92747
[1] eval-auc:0.90516 train-auc:0.92748
[2] eval-auc:0.90518 train-auc:0.92749
The AUC of horizontal learning (both secure and non-secure):
[0] eval-auc:0.89789 train-auc:0.92732
[1] eval-auc:0.89791 train-auc:0.92733
[2] eval-auc:0.89791 train-auc:0.92733
Comparing the tree models with a centralized baseline, we made the following observations:
- Vertical federated learning (non-secure) produces exactly the same tree model as the centralized baseline.
- Vertical federated learning (secure) produces the same tree structures as the centralized baseline. However, each party holds a different tree record because it owns a different feature subset and should not learn the cut information for features owned by others.
- Horizontal federated learning (both secure and non-secure) produces tree models that differ from the centralized baseline. This is due to the initial feature quantile computation, which is performed over either the global data (centralized) or the local data (horizontal).
For more details, refer to the NVIDIA FLARE Secure XGBoost example.
Efficiency of encryption methods
To benchmark our solutions, we conducted experiments using a diverse range of datasets with varying characteristics, including differences in size (from small to large) and feature dimensions (from few to many). These benchmarks aim to demonstrate the robustness of our algorithms and highlight significant performance improvements in terms of speed and efficiency.
Dataset and data splits
We used three datasets, covering different data sizes and feature sizes, to illustrate their impact on the efficiency of encryption methods. The data characteristics are summarized in Table 1. The credit card fraud detection dataset is labeled as CreditCard, the Epsilon dataset as Epsilon, and a subset of the HIGGS dataset as HIGGS.
| | CreditCard | HIGGS | Epsilon |
| --- | --- | --- | --- |
| Data records | 284,807 | 6,200,000 | 400,000 |
| Features | 28 | 28 | 2,000 |
| Training set size | 227,845 | 4,000,000 | 320,000 |
| Validation set size | 56,962 | 2,200,000 | 80,000 |
For vertical federated learning, we split the training dataset into two clients, with each client holding different features of the same data records (Table 2).
| Features | CreditCard | HIGGS | Epsilon |
| --- | --- | --- | --- |
| Label client | 10 | 10 | 799 |
| Non-label client | 18 | 18 | 1,201 |
For horizontal federated learning, we split the training set evenly across three clients (Table 3); a minimal sketch of both the vertical and horizontal splits follows the table.
| Data records | CreditCard | HIGGS | Epsilon |
| --- | --- | --- | --- |
| Client 1 | 75,948 | 1,333,333 | 106,666 |
| Client 2 | 75,948 | 1,333,333 | 106,666 |
| Client 3 | 75,949 | 1,333,334 | 106,668 |
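As referenced above, the following is a minimal sketch of how such splits can be simulated; the DataFrame, column names, and helper functions are illustrative assumptions rather than the actual benchmark preprocessing.

```python
# Hypothetical helpers for simulating the vertical (Table 2) and horizontal
# (Table 3) splits; column names and sizes are illustrative only.
import numpy as np
import pandas as pd

def vertical_split(df: pd.DataFrame, n_label_features: int):
    """Label client keeps the label plus the first features; the remaining
    columns (same rows) go to the non-label client."""
    feature_cols = [c for c in df.columns if c != "label"]
    label_client = df[["label"] + feature_cols[:n_label_features]]
    non_label_client = df[feature_cols[n_label_features:]]
    return label_client, non_label_client

def horizontal_split(df: pd.DataFrame, n_clients: int = 3):
    """Each client keeps all columns for a roughly equal share of the rows."""
    idx_chunks = np.array_split(np.arange(len(df)), n_clients)
    return [df.iloc[idx] for idx in idx_chunks]

# Toy usage with synthetic data.
df = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 5)),
                  columns=["f0", "f1", "f2", "f3", "label"])
label_part, non_label_part = vertical_split(df, n_label_features=2)
clients = horizontal_split(df, n_clients=3)
```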
Experiment results
End-to-end XGBoost training was performed with the following parameters: num_trees = 10, max_depth = 5, max_bin = 256. Testing was performed using an NVIDIA Tesla V100 GPU and an Intel Xeon E5-2698 v4 CPU. Figures 3 and 4 show the time comparisons. Note that the simulation was run on a single machine, so federated communication cost is negligible.
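For reference, a minimal standalone (centralized) booster configuration matching these parameters is sketched below on synthetic data; the binary objective is an assumption based on the datasets used, and the federated runs apply the same booster parameters through the NVIDIA FLARE pipeline.

```python
# Minimal centralized XGBoost setup matching num_trees=10, max_depth=5,
# max_bin=256; the synthetic data and binary objective are assumptions.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 28)), rng.integers(0, 2, size=1000)
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",   # histogram-based training, as in the federated runs
    "max_depth": 5,
    "max_bin": 256,
}
bst = xgb.train(params, dtrain, num_boost_round=10,     # num_trees = 10
                evals=[(dvalid, "eval"), (dtrain, "train")])
```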
Secure vertical Federated XGBoost
We compare the time cost of the NVIDIA FLARE pipeline with the CUDA-accelerated Paillier plugin (denoted the GPU plugin) against an existing third-party open-source solution for secure vertical federated XGBoost. Both are HE-encrypted. Figure 3 shows that our solution is 4.6x to 36x faster, depending on the combination of data and feature sizes. Note that the third-party solution supports only CPU.
Secure horizontal Federated XGBoost
For secure horizontal federated XGBoost, third-party offerings do not provide an HE-based secure solution. Therefore, we compare the time cost of the NVIDIA FLARE pipeline without encryption against the pipeline with the CPU-based CKKS encryption plugin (denoted the CPU plugin) to gauge the overhead of encryption for data protection.
As shown in Figure 4, the computation in this case is notably faster than in the vertical scenario (orders of magnitude lower), so GPU acceleration may not be required given the reasonable overhead. Only for datasets with very wide histograms (Epsilon, for example) does the encryption overhead become more significant, and even then it is only about 5% of that in the vertical setting.
Summary
In this post, we demonstrated how GPU-accelerated homomorphic encryption enhances the security of Federated XGBoost, enabling privacy-preserving horizontal and vertical federated learning through NVIDIA FLARE. Compared with existing federated XGBoost work, the new functionality provides 1) a secure federated XGBoost pipeline that ensures data safety at the algorithm level, and 2) an efficient CUDA-accelerated solution, enabled by GPU computation, that is much faster than current alternatives on the market. This should encourage adoption in fields with high requirements for both data security and learning efficiency where XGBoost is commonly used, such as fraud detection model training in the financial industry.
For more information and an end-to-end example, visit NVIDIA/NVFlare on GitHub. Reach out to us at federatedlearning@nvidia.com with questions or comments.