Democratized ML pipelines and Spark RAPIDS-based hyperparameter tuning
Faraz Waseem, Verizon Media | Abhishek Bhardwaj, Verizon Media
GTC 2020
XGBoost is an optimized, distributed gradient-boosting library that has been applied to a wide variety of data science problems through its classification and regression capabilities, and it is ranked among the best algorithms for winning Kaggle competitions. We build high-dimensional churn-prediction models using XGBoost on Spark, and challenge its limits on the amount of data and the number of dimensions it can handle within a reasonable time. XGBoost was a natural choice because of its accuracy and its out-of-the-box support for unbalanced datasets. With GPU-based distributed XGBoost, we crossed the limits of the CPU-based solution and took it to the next level, enhancing our ability to run hyperparameter searches that optimize models for maximum return.