Advancing AI Sports Analytics Through the Data-Driven SKY ENGINE AI Platform and NVIDIA RTX

Building training and testing playgrounds that help move sports analytics AI solutions out of the lab and into the real world is exceedingly challenging. In team-based sports, building the right playing strategy before the championship season is key to success for any professional coach and club owner.

Coaches strive to provide the best tips and point out mistakes during the game, but even when rewatching matches they cannot notice every detail and behavioral pattern of both teams. Sophisticated AI algorithms can collect such data, analyze it, and draw inferences about team behavior.

In particular, the tasks we want to solve to support the analysis of a rugby team are locating each player during the match and estimating each player’s 3D pose on the field. Having this information in real time provides the evidence needed to build a better playing strategy.

In many sports analytics cases, an efficient solution for the class of problem has already been discovered but cannot be applied effectively: the main bottleneck is missing data.

Gathering and labeling data can be expensive and time-consuming. Humans must analyze the images manually, and such repetitive labor is not only slow and expensive but also less accurate than computer-generated labels.

In addition, some cases require modern equipment to produce labeled data and highly qualified specialists to maintain the production process. This significantly increases the project cost or, in many cases, puts a sports analytics project out of reach for stakeholders.

Team-based sports: An attractive opportunity for machine learning and computer vision

What if you could automatically generate the imagery and video data suited perfectly for the task at hand with the complete and always correct ground truth built-in?

In this post, we show our attempt to achieve exactly this, using 3D pose recognition of football or rugby players as the example. The goal is to train an AI model to accurately recognize players in real match footage and estimate their poses as human key points in 3D space.

The AI models were trained exclusively on synthetic data generated using the SKY ENGINE AI platform and NVIDIA RTX machines. The resulting images are simulated scenes fully controlled by SKY ENGINE’s renderer, and any kind of ground truth can be provided, depending on the model’s requirements.

The SKY ENGINE AI rendering engine with NVIDIA RTX cores provides physically based rendering for deep learning. The heterogeneous system consists of NVIDIA Titan RTX and NVIDIA V100 GPUs. It’s a productive and powerful configuration that can simultaneously generate the labeled, multispectral (if needed) datasets and train the neural networks.

The main advantages of this approach include the following:

Efficient handling of unbalanced data
Accurate detection of logotypes on uniforms and in the stadium (a common source of false positives)
Noisy, low-quality data streams with compression artifacts do not degrade inference accuracy
Unknown parameters of broadcast cameras can be effectively derived
High-quality 3D mapping
Accurate pose estimation for small objects
Accurate recognition of complex movement structures and formations
Efficient data processing and computation optimization with the NVIDIA RTX architecture

Here’s a look at the complete solution to the 3D pose estimation problem, built in the SKY ENGINE AI platform.

Sport analytics case with SKY ENGINE AI platform

First, you must configure the rendering engine, define a render data source, and train AI models for human detection and 3D pose estimation.

Asset loading and rendering engine configuration

Start by loading the stadium geometry assets. The assets are prepared in standard 3D modeling software and loaded into SKY ENGINE in Alembic format.

renderer_ctx.load_abc_scene('stadium')
renderer_ctx.setup()

Next, display the loaded geometry of the stadium:

with example_assistant.get_visualizer() as visualizer:
    visualizer(renderer_ctx.render_to_numpy())

Next, load textures for the geometries using the Python API:

stadium_base_textures = SubstanceTextureProvider(renderer_ctx, 'concrete')
stadium_base_params = PBRShader.create_parameter_provider(renderer_ctx, tex_scale=50)
renderer_ctx.set_material_definition('stadion_base_GEO',
  MaterialDefinition(stadium_base_textures, parameter_set=stadium_base_params))

As the preceding code shows, SKY ENGINE fully supports procedural textures, which enables rapid generation of varied data, as well as physically based rendering (PBR) shaders.

Define the environment map as follows:

renderer_ctx.define_env(Background(renderer_ctx,
                                   EnvMapMiss(renderer_ctx),
                                   HdrTextureProvider(renderer_ctx, 'light_sky')))

At this point, the stadium is rendered in the scene. The next step is to configure the rest of the scene and populate it with players, which you can do with SKY ENGINE’s instancing mechanism.

The SKY ENGINE renderer provides virtually endless ways to shuffle, multiply, randomize, and organize assets. From a single Alembic animation of one player, you create two teams of 20 players each.

renderer_ctx.layout().duplicate_subtree(renderer_ctx, 'player_GEO_NUL', suffix='team2')
renderer_ctx.layout().get_node('player_GEO_NUL').n_instances = 20
renderer_ctx.layout().get_node('player_GEO_NUL_team2').n_instances = 20

By default, all materials are drawn randomly. To create two proper teams, ensure that every player on a given team has the same shirt color while all other inputs, such as hair, skin color, sock color, and shirt number, stay random.

To achieve this, you must put the players into separate randomization groups and define their drawing strategy. The Substance archive input that controls the shirt’s color is Colors_select. It needs to be the same (synchronized) inside the randomization group and different between the groups. All the other inputs are kept randomized by default.

# Shirt color: equal within a randomization group, distinct between groups
shirt_sync = SynchronizedInput(SynchronizationDescription(
    in_strategy=Synchronization.DISTINCT_EQUAL_GROUPS))
player_material_strategy = DrawingStrategy(renderer_ctx, inputs_strategies={'Colors_select': shirt_sync})
# Team 1 uses the default randomization group; team 2 gets its own group
renderer_ctx.instancers['player_GEO'].modify_material_definition(strategy=player_material_strategy)
renderer_ctx.instancers['player_GEO_team2'].modify_material_definition(randomization_group='team2',
                                                                       strategy=player_material_strategy)
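The synchronization logic can be illustrated outside the renderer. The following plain-Python sketch is not SKY ENGINE code: it is a hypothetical reimplementation of the behavior described above, assuming a made-up color palette and made-up input names. It draws one shirt color per group (shared within a group, distinct between groups) and draws every other input independently for each player.

```python
import random

# Hypothetical palettes and input names, for illustration only.
SHIRT_COLORS = ["red", "blue", "green", "white", "black", "yellow"]
HAIR_COLORS = ["black", "brown", "blond", "red"]


def draw_team_materials(team_sizes, rng=None):
    """Draw per-player material inputs for several teams.

    The shirt color is synchronized: one value per group, shared by all
    players in that group and distinct between groups. Every other input
    is drawn independently for each player.
    """
    if rng is None:
        rng = random.Random()
    # One distinct shirt color per team (sampling without replacement).
    team_colors = rng.sample(SHIRT_COLORS, k=len(team_sizes))
    teams = []
    for color, size in zip(team_colors, team_sizes):
        players = [
            {
                "shirt_color": color,                   # synchronized input
                "hair_color": rng.choice(HAIR_COLORS),  # free input
                "shirt_number": rng.randint(1, 99),     # free input
            }
            for _ in range(size)
        ]
        teams.append(players)
    return teams
```

Calling draw_team_materials([20, 20]) yields two 20-player teams whose shirts match within each team but differ between teams, while hair color and shirt number vary from player to player.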

Figure 5 shows that every player is in the same pose. By default, SKY ENGINE plays animations from Alembic files frame by frame, so all instances show the same frame; to vary the poses, randomize the frame number drawn for each player.

player_geometry_strategy = DrawingStrategy(renderer_ctx, frame_numbers_strategy=UniformRandomInput())
renderer_ctx.instancers['player_GEO'].modify_geometry_definition(strategy=player_geometry_strategy)
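The effect of frame randomization can be sketched in a few lines. This is a hypothetical helper, not part of the SKY ENGINE API: it assumes an animation clip with a known number of frames and picks an independent, uniformly random frame for each player instance, which is what makes the rendered poses differ.

```python
import random


def draw_frame_numbers(n_instances, n_frames, rng=None):
    """Pick an independent, uniformly random animation frame per instance.

    Without this, every instance would show the same frame of the clip,
    so all players would share a single pose.
    """
    if rng is None:
        rng = random.Random()
    return [rng.randrange(n_frames) for _ in range(n_instances)]
```

For example, draw_frame_numbers(40, 120) assigns each of 40 players a frame index in the range 0 to 119 of a hypothetical 120-frame clip.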

During a rugby game, players are not distributed uniformly; they tend to cluster together. To make the scene look more natural, you can change the way player positions are drawn: instead of a uniform distribution, use a Gaussian distribution with random parameters. This is doubly random: first \(\mu\) and \(\sigma\) are drawn, and then the players’ positions are drawn randomly with these parameters.

gauss_strategy = DrawingStrategy(renderer_ctx,
                                 default_input_strategy=RandomGaussianRandomInput(sigma_relative_limits=(0.1, 0.2)))
renderer_ctx.layout().get_node('player_GEO_NUL').modify_locus_definition(strategy=gauss_strategy)
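The doubly random scheme can be written out with NumPy. This sketch is an assumption about how such a strategy could work, not SKY ENGINE’s implementation: it treats the field as a rectangle, draws the center \(\mu\) uniformly over it, interprets sigma_relative_limits as bounds on \(\sigma\) expressed as a fraction of the field size, and clips positions that fall outside the field.

```python
import numpy as np


def draw_player_positions(n_players, field_size, sigma_relative_limits=(0.1, 0.2), rng=None):
    """Doubly random Gaussian placement of players on a rectangular field.

    Step 1: draw the cluster center mu uniformly over the field and the
    spread sigma as a random fraction of the field size.
    Step 2: draw every player position from N(mu, sigma^2) per axis,
    then clip the positions to the field boundaries.
    """
    rng = np.random.default_rng() if rng is None else rng
    field = np.asarray(field_size, dtype=float)  # (length, width)
    mu = rng.uniform(0.0, field)                 # one center for the whole group
    rel = rng.uniform(*sigma_relative_limits)    # relative spread, e.g. 0.1 to 0.2
    sigma = rel * field                          # per-axis standard deviation
    positions = rng.normal(mu, sigma, size=(n_players, 2))
    return np.clip(positions, 0.0, field), mu, sigma
```

For a field of roughly 100 m by 70 m, draw_player_positions(20, (100.0, 70.0)) places 20 players around a random center with a spread of 10% to 20% of the field size along each axis, so every rendered frame shows a cluster with a different location and size.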

This post skips the additional configuration of cameras, lights, and postprocessing; for those details, see the GitHub repo. Next, configure the scene semantics and ground truth.

The key points are already present in the player animation. By default, SKY ENGINE computes all key point information whenever the input assets contain key points. You only need to visualize them to make sure that everything is configured correctly. Green key points are visible, and red ones are hidden.

example_assistant.visualized_outputs = {SceneOutput.BEAUTY, SceneOutput.SEMANTIC, SceneOutput.KEYPOINTS}
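SKY ENGINE computes key point visibility itself. For intuition, one common way to decide whether a key point is visible is to project it into the image and compare its depth with the rendered depth buffer. The sketch below assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), key points already expressed in camera coordinates (z pointing forward), and a hypothetical depth tolerance eps; it is not the platform’s actual algorithm.

```python
import numpy as np


def project_keypoints(points_cam, fx, fy, cx, cy, depth_map, eps=0.05):
    """Project 3D key points with a pinhole model and flag visibility.

    A key point is visible if it is in front of the camera, lands inside
    the image, and no surface in the depth buffer is meaningfully closer.
    Returns (uv, visible): pixel coordinates (NaN when behind the camera)
    and a boolean visibility flag per key point.
    """
    pts = np.asarray(points_cam, dtype=float)
    h, w = depth_map.shape
    uv = np.full((len(pts), 2), np.nan)
    visible = np.zeros(len(pts), dtype=bool)
    for i, (x, y, z) in enumerate(pts):
        if z <= 0:
            continue  # behind the camera: no valid projection
        u, v = fx * x / z + cx, fy * y / z + cy
        uv[i] = (u, v)
        col, row = int(round(u)), int(round(v))
        if not (0 <= col < w and 0 <= row < h):
            continue  # projects outside the frame
        # Occluded if the depth buffer stores a closer surface at this pixel.
        visible[i] = z <= depth_map[row, col] + eps
    return uv, visible
```

In the visualization convention described above, key points with visible set to True would be drawn green and the rest red.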

The scene looks correct, so you can create a renderer data source for AI training.

datasource = MultiPurposeRendererDataSource(renderer_context=renderer_ctx, images_number=20, cache_folder_name='rugby_presentation_new')

AI model training process

For the training phase, you use models and trainers implemented in the DeepSky library, which is part of the SKY ENGINE AI platform.

from torch.utils.data import DataLoader

# transform, collate_fn, valid_data_loader, optimizer, evaluator, scheduler,
# serializer, and Constants are defined elsewhere in the full example.
main_datasource = SEWrapperForDistancePose3D(datasource, imgs_transform=transform)
train_data_loader = DataLoader(main_datasource,
                               batch_size=Constants.TRAIN_BATCH_SIZE,
                               num_workers=Constants.NUM_WORKERS,
                               drop_last=Constants.DROP_LAST,
                               shuffle=Constants.VALID_SHUFFLE,
                               collate_fn=collate_fn)
model = get_pose_3d_model(main_datasource.joint_num, backbone_pretrained=True)
trainer = DefaultTrainer(
    data_loader=train_data_loader, model=model, epochs=Constants.EPOCHS, save_freq=1,
    valid_data_loader=valid_data_loader, optimizer=optimizer, evaluator=evaluator,
    scheduler=scheduler, serializer=serializer)
trainer.train()
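The snippet above passes a collate_fn without defining it. One common choice for detection-style data, used in torchvision’s detection reference scripts, keeps images and per-image targets as tuples instead of stacking them, because each image contains a different number of players. Whether DeepSky’s wrapper yields (image, target) pairs that fit this function is an assumption.

```python
def collate_fn(batch):
    """Turn a list of (image, target) samples into (images, targets) tuples.

    Each image contains a different number of players, so per-image targets
    cannot be stacked into a single tensor; keeping them as tuples lets the
    model consume a variable number of boxes and poses per image.
    """
    return tuple(zip(*batch))
```

This layout matches the way the pose model is called later in this post, with a tuple of images and a tuple of target dictionaries.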

Now check the results of training an AI model on synthetic data to validate that everything was configured correctly. After each epoch, save a checkpoint and produce some inference examples to monitor training progress.

show_jupyter_picture('gtc03_assets/trained/img2.png')

Results of the AI model on the real images

Next, validate the results on real video. First, use a pretrained player detection model to find bounding boxes. For more information, see the player detection tutorial presented at GTC 2019, available on the SKY ENGINE AI GitHub repo.

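import numpy as np
import torch

# Load the detector weights and prefix every key with '_model.' so they
# match the state-dict layout of detection_model (defined elsewhere).
checkpoint = torch.load('gtc03_assets/trained/rugby_detection.pth.tar')
checkpoint = {'_model.' + k: v for k, v in checkpoint.items()}
detection_model.load_state_dict(checkpoint)
detection_model = detection_model.to(device)
real_dataset = ImageInferenceDatasource(dir='gtc03_assets/real_data', extension='png')
# ... run detection_model on an image (orig_img) from real_dataset to obtain `outputs` ...
out = outputs.pop()
labels = out['labels'].cpu().detach().numpy()
bboxes = out['boxes'].cpu().detach().numpy()
bboxes = bboxes[np.where(labels == 1)[0]]  # keep detections with label 1 (players)
bbox_image = bboxes_viz(orig_img, bboxes)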
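The label filter above can be extended with a confidence threshold. The following sketch assumes torchvision-style detection outputs (a dict with 'boxes', 'labels', and 'scores'), already converted to NumPy; the 'scores' key, the helper name select_player_boxes, and the min_score default of 0.5 are assumptions rather than part of the original example.

```python
import numpy as np


def select_player_boxes(out, player_label=1, min_score=0.5):
    """Keep only boxes predicted as players with confidence >= min_score.

    `out` is assumed to hold 'boxes' (N x 4, x1 y1 x2 y2), 'labels' (N,),
    and 'scores' (N,) as NumPy arrays or array-likes.
    """
    boxes = np.asarray(out["boxes"], dtype=float).reshape(-1, 4)
    labels = np.asarray(out["labels"])
    scores = np.asarray(out["scores"], dtype=float)
    keep = (labels == player_label) & (scores >= min_score)
    return boxes[keep]
```

Discarding low-confidence boxes before pose estimation keeps spurious detections, such as logos on uniforms or stadium boards, from being passed to the 3D pose model.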

model.eval()
with torch.no_grad():
    results = model((img,), ({'boxes': torch.from_numpy(bboxes).int()},))
results = results.pop()
output_coords, output_bboxes = (results['pred_poses_coords'].cpu(),
                                results['boxes'].cpu())

Here are a few examples. Figure 10 shows that SKY ENGINE AI trained one of its key point AI models to detect players and correctly estimate the 3D coordinates of their skeleton joints. The available data was of extremely low quality because it was captured from a live TV broadcast with low resolution and strong compression. Without synthetic data and its perfect ground truth, such an estimation task would be almost impossible with the conventional approach of training AI models on real footage.

Conclusion

3D pose estimation is one of the most complicated computer vision tasks and usually requires high-quality images, calibrated cameras, and perfect lighting conditions. Moreover, training pose estimation algorithms for sports analytics requires costly motion capture sessions with sophisticated equipment mounted on the pitch.

We have shown how this problem can be solved with simple 3D assets and the SKY ENGINE AI platform running on NVIDIA hardware.

SKY ENGINE AI tools can be used to build applications for team-based sports that will likely revolutionize these games. Players, coaches, clubs, decision-makers, fans, and broadcasters can all potentially benefit from further democratization of these sports. For example, you could use SKY ENGINE AI to rapidly assess the skills of players from under-represented regions or lower leagues, without relying on the arbitrary judgments of individual scouts.

This approach can easily be replicated to train models that detect humans, estimate their positions, and analyze their movements in any environment, whether a factory, a workshop, or a space station.

About SKY ENGINE AI

SKY ENGINE AI is a simulation and deep learning platform that generates fully annotated, synthetic data and trains AI computer vision algorithms at scale. The platform generates highly balanced imagery data of photorealistic environments and objects and provides advanced domain adaptation algorithms. SKY ENGINE AI platform is a tool for developers, data scientists, and ML/software engineers creating computer vision projects in any industry.

The SKY ENGINE AI platform enables building optimal, customized AI models from scratch and training them in VR. SKY ENGINE AI software enables you to create a digital twin of any sensor, drone, or robot and put it through testing and training in a virtual environment before real-world deployment.

SKY ENGINE AI Data Generation makes life easier for data scientists by providing perfectly balanced synthetic datasets for any computer vision application, such as object detection and recognition, 3D positioning, and pose estimation. More sophisticated cases include the analysis of multisensor data from radar, lidar, satellite, X-ray, and more.

For more information, see the SKY ENGINE AI platform or the SKY ENGINE AI GitHub repo.