"Imagine watching a 3D hologram of a live soccer game on your living room table; you can walk around with an Augmented Reality device, watch the players from different viewpoints, and lean in to see the action up close1." Well, now this might be a way to "get rid" of television as we know it.
We present a system that transforms a monocular video of a soccer game into a moving 3D reconstruction, in which the players and field can be rendered interactively with a 3D viewer or through an Augmented Reality device. At the heart of our paper is an approach for estimating the depth map of each player, using a CNN trained on 3D player data extracted from soccer video games. We compare with state-of-the-art body pose and depth estimation techniques, and show results on both synthetic ground-truth benchmarks and real YouTube soccer footage.
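Once a per-player depth map is estimated, rendering the player in 3D amounts to unprojecting each pixel through a pinhole camera model. The paper does not spell out this step's implementation; the sketch below is a minimal, hypothetical illustration using NumPy, with made-up intrinsics (`fx`, `fy`, `cx`, `cy`) and a toy depth map.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Unproject a dense depth map (H x W, in meters) into a set of
    3D points using a pinhole camera model. Pixels with zero depth
    (background / no player) are discarded."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx  # back-project along the camera x-axis
    y = (v - cy) * z / fy  # back-project along the camera y-axis
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # keep only pixels with valid depth

# Toy example: a 2x2 "player crop" with a single valid pixel at 10 m.
depth = np.zeros((2, 2))
depth[0, 0] = 10.0
pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=1.0, cy=1.0)
# pts is a (1, 3) array: one 3D point for the one valid depth pixel
```

The resulting point cloud (or a mesh built on top of it) is what an AR viewer would place on the virtual field; the actual pipeline involves additional steps such as camera calibration against the field lines.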
This is a paper from Konstantinos Rematas, Ira Kemelmacher-Shlizerman, Brian Curless, and Steve Seitz.