GTC 2020: Democratized ML pipelines and Spark RAPIDS-based hyperparameter tuning
Faraz Waseem, Verizon Media | Abhishek Bhardwaj, Verizon Media
XGBoost is an optimized, distributed gradient-boosting library that has been applied to a variety of data science problems through its classification and regression capabilities, and it is frequently found in winning Kaggle solutions. We build high-dimensional churn prediction models using XGBoost on Spark, pushing the limits of how much data and how many dimensions it can handle within a reasonable time. XGBoost was a natural choice because of its accuracy and its out-of-the-box support for unbalanced datasets. With GPU-based distributed XGBoost, we surpassed the limits of the CPU-based XGBoost solution and took it to the next level. It expanded our capacity to run hyperparameter searches for optimized models and maximum return.
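The hyperparameter search the speakers describe can be illustrated in miniature. The sketch below is a stdlib-only grid search in which a toy `evaluate` function stands in for training and validating a distributed GPU XGBoost model; the function, the grid values, and all names are illustrative assumptions, not the speakers' actual pipeline.

```python
import itertools

def evaluate(params):
    """Hypothetical stand-in for one XGBoost training run.

    In the real pipeline each call would train a distributed GPU XGBoost
    model on Spark and return a validation score; here a toy formula
    (illustrative only) prefers max_depth=6 and eta=0.1.
    """
    return -abs(params["max_depth"] - 6) - abs(params["eta"] - 0.1) * 10

# Candidate hyperparameters of the kind typically tuned for XGBoost.
grid = {
    "max_depth": [4, 6, 8],
    "eta": [0.05, 0.1, 0.3],
}

def grid_search(grid, score_fn):
    """Score every combination in the grid and return the best one."""
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = grid_search(grid, evaluate)
print(best)  # the combination with the highest toy score
```

A GPU-backed search follows the same loop; the win described in the talk is that each `evaluate` call finishes much faster, so many more combinations can be explored in the same time budget.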