Creating More Engaging Sports Broadcasts With AI

To broaden what is possible for augmented reality applications and player performance tracking in professional sports, researchers at Stats Perform, a sports analytics company, have developed an AI-based method that makes camera calibration faster and more flexible.

Sports teams use vision-based tracking systems to analyze player performance, and broadcasters deploy AR to enhance the viewing experience (e.g. the Virtual 3 in basketball, the First Down Line in the NFL). These solutions depend on high-quality camera calibration, so they are usually limited to pre-calibrated fixed cameras or to pan-tilt-zoom cameras that report their parameters in real time. More flexible solutions exist, but they don’t work well in busy, non-uniform environments like a basketball game, where the field is crowded and changes appearance from arena to arena.

To overcome these limitations, the researchers devised a novel neural network that combines semantic segmentation, camera pose initialization, and homography refinement into a single network architecture.

Trained on an NVIDIA TITAN RTX GPU with the cuDNN-accelerated TensorFlow deep learning framework, their method determines the homography of a single moving camera using only the camera frame and the sport as inputs.

Their technique also reduced the inference time to 4 milliseconds (a reduction of two orders of magnitude compared to the previous state-of-the-art), making it suitable for live broadcast.

“The evaluation results show that our method outperforms the previous state-of-the-art in challenging scenarios like basketball and achieves competitive performance in relatively static environments like soccer,” the researchers stated in their paper, End-to-End Camera Calibration for Broadcast Videos. The paper will be presented at the Computer Vision and Pattern Recognition (CVPR) conference this week. 

Top row: examples of images in a basketball game with the field projection (blue lines) generated by the predicted homography. Bottom row: The output of the semantic segmentation step of the network. After the segmentation, the network performs camera pose initialization and homography refinement.
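
To make the field projection in the figure concrete, here is a minimal Python sketch of how a 3x3 homography maps points on the field plane to image pixels. It is an illustration only, not the researchers' code: the function name `project_points`, the matrix `H`, and the court coordinates are assumptions chosen for the example, and in the method described here the homography would come from the network's prediction rather than being written by hand.

```python
import numpy as np

def project_points(H, pts):
    """Map N x 2 field-plane points to image pixels using a 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    # Append a 1 to each point to get homogeneous coordinates (x, y, 1).
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    # Perspective divide: scale each row so the third coordinate becomes 1.
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical homography for illustration (not a value from the paper).
H = np.array([[20.0, 0.0, 100.0],
              [0.0, 20.0, 50.0],
              [0.0, 0.0, 1.0]])

# Corners of an assumed 28 m x 15 m court, in metres on the field plane.
court_corners = [(0, 0), (28, 0), (28, 15), (0, 15)]
pixels = project_points(H, court_corners)
```

Drawing line segments between the projected points of a field template would produce an overlay like the blue lines in the top row of the figure.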

According to the researchers, a typical sporting event is covered by multiple separate cameras, but today only a handful of them provide the calibration capabilities needed for these applications. Extending these capabilities to more cameras and more sports will improve insights and help sports leagues build a tighter relationship with their fans.
