NVIDIA Research: Generating Motion Capture Animation Without Hardware or Motion Data

Researchers from NVIDIA, the University of Toronto, and the Vector Institute have proposed a new motion capture method that forgoes expensive motion-capture hardware. It relies only on video input and improves on previous motion-capture animation models.

YouTuber and graphics researcher Dr. Károly Zsolnai-Fehér breaks down the research in his YouTube series, Two Minute Papers. The video highlights how the researchers use AI to capture a person's motion from video input alone and translate it into a digital avatar. They then apply a physics simulation to the avatar to counter the conventional problems of foot sliding and temporal inconsistency, or flickering. Check out the video below:

Figure 1: A video presenting the paper “Physics-based Human Motion Estimation and Synthesis from Videos in 2 minutes.”

“In this paper, we introduced a new framework for training motion synthesis models from raw video pose estimations without making use of motion capture data,” Kevin Xie explains in the paper.

“Our framework refines noisy pose estimates by enforcing physics constraints through contact invariant optimization, including computation of contact forces. We then train a time-series generative model on the refined poses, synthesizing both future motion and contact forces. Our results demonstrated significant performance boosts in both, pose estimation via our physics-based refinement, and motion synthesis results from video. We hope that our work will lead to more scalable human motion synthesis by leveraging large online video resources.”
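To make the idea of physics-based refinement concrete, here is a minimal toy sketch in Python. It is not the paper's contact invariant optimization: it assumes the contact labels are already known, tracks a single one-dimensional foot coordinate, and computes no contact forces or dynamics. The function names and the weights `w_contact` and `w_smooth` are hypothetical, chosen only for illustration. The sketch shows how penalizing foot motion during contact frames can reduce foot sliding in a noisy pose track.

```python
import numpy as np


def refine_foot_track(x_obs, contact, w_contact=50.0, w_smooth=1.0):
    """Toy least-squares refinement of a noisy 1-D foot trajectory.

    Minimizes: ||x - x_obs||^2
             + w_contact * sum_t c_t * (x[t+1] - x[t])^2        (no sliding in contact)
             + w_smooth  * sum_t (x[t+1] - 2 x[t] + x[t-1])^2   (temporal smoothness)
    where c_t = 1 only if frames t and t+1 are both labeled as in contact.
    Hypothetical simplification: contact labels are given, not optimized.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    contact = np.asarray(contact, dtype=float)
    n = len(x_obs)
    identity = np.eye(n)
    d1 = np.diff(identity, axis=0)        # first-difference operator, shape (n-1, n)
    d2 = np.diff(identity, n=2, axis=0)   # second-difference operator, shape (n-2, n)
    c = contact[:-1] * contact[1:]        # per-step contact indicator
    # Stack all quadratic terms into one linear least-squares problem.
    a = np.vstack([
        identity,
        np.sqrt(w_contact) * c[:, None] * d1,
        np.sqrt(w_smooth) * d2,
    ])
    b = np.concatenate([x_obs, np.zeros(n - 1), np.zeros(n - 2)])
    x_ref, *_ = np.linalg.lstsq(a, b, rcond=None)
    return x_ref


def foot_slide(x, contact):
    """Total distance the foot moves while labeled as in contact."""
    contact = np.asarray(contact, dtype=float)
    c = contact[:-1] * contact[1:]
    return float(np.sum(c * np.abs(np.diff(x))))


# Synthetic example: foot planted, one step forward, planted again.
rng = np.random.default_rng(0)
x_true = np.concatenate([np.zeros(20), np.linspace(0.0, 1.0, 10), np.ones(20)])
contact = np.concatenate([np.ones(20), np.zeros(10), np.ones(20)])
x_obs = x_true + 0.05 * rng.normal(size=x_true.size)  # simulated noisy pose estimate
x_ref = refine_foot_track(x_obs, contact)

print("foot slide before refinement:", foot_slide(x_obs, contact))
print("foot slide after refinement: ", foot_slide(x_ref, contact))
```

In this toy setup, the contact penalty pulls consecutive planted frames toward the same position, so the measured sliding drops after refinement. The framework described in the quote goes considerably further by reasoning about full-body dynamics and contact forces.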

This framework brings people one step closer to working and playing inside virtual worlds. It could help developers animate human motion far more affordably and with a much greater diversity of motions. From video games to virtual worlds, it has the potential to change how human motion is synthesized and animated.

Check out the framework or read the paper, “Physics-based Human Motion Estimation and Synthesis from Videos,” by Kevin Xie, Tingwu Wang, Umar Iqbal, Yunrong Guo, Sanja Fidler, and Florian Shkurti.

Learn more about the NVIDIA Toronto AI lab.