Optical Motion Capture
Yiannis Aloimonos and Gutemberg Guerra-Filho
Computer Vision Laboratory
Center for Automation Research
Institute for Advanced Computer Studies
University of Maryland
College Park, Maryland 20742-3275, USA
Motion capture is the process of recording the real-life movement of a subject as
sequences of Cartesian coordinates in 3D space. Optical motion capture (OMC)
uses cameras to reconstruct the body posture of the performer. One approach
employs a set of multiple synchronized cameras to capture markers placed in
strategic locations on the body. A motion capture system has applications in
computer graphics for character animation, in virtual reality for human
control interfaces, and in video games for realistic simulation of human motion.
In this tutorial, we discuss the theoretical and empirical aspects of an
optical motion capture system. Implementing such a system requires a number of
synchronized cameras, an image acquisition system, a capturing area, and a
special suit with markers. The locations of the markers on the suit are chosen
so that the required body parts (e.g., joints) are covered. We present our motion capture system
using a framework that identifies different sub-problems to be solved in a
modular way. The sub-problems involved in OMC are initialization, marker
detection, spatial correspondence, temporal correspondence, and
post-processing. In this tutorial, we discuss the theory involved in each
sub-problem and the corresponding novel techniques used in the current
implementation. Initialization includes setting up a human model and
computing the intrinsic and extrinsic camera calibration. Marker detection
involves finding the 2D pixel coordinates of markers in the images. The spatial
correspondence problem consists of finding pairs of detected markers in
different images, captured at the same time from different viewpoints, such that
each pair corresponds to the projections of the same scene point. Given camera
calibration and the spatial matching, the 3D reconstruction of markers
(translational data) is achieved by triangulating the various camera views. The
temporal correspondence problem (tracking) involves matching two clouds of 3D
points representing the markers detected in two consecutive frames.
The temporal correspondence module builds a track for each marker, in which the
marker’s 3D coordinates are ordered by time. Post-processing
consists of labeling each track with a marker code, recovering markers
lost to occlusion, correcting possible gross errors, and filtering noise. Once
the translational data is processed, a hierarchical human model may be used to
compute rotational data (joint angles). We consider standard file formats
for motion capture data (e.g., BVH, Acclaim). Other important
techniques used to improve consistency in the motion data are volumetric
reconstruction, inverse kinematics, and inverse dynamics. We also cover topics
related to editing and manipulation of motion data.
The Language of Human Movement
· Introduction
o Realistic Movement: Synthesis and Analysis
o Motion Capture Technologies
o Applications
· Required Resources
o Capture Room
o Body Suit
o Camera Equipment
o Acquisition System
· Initialization
o Markers’ Configuration
o Camera Calibration
o World Coordinate System Alignment
o Background Subtraction
o Kinematic Human Body Model
· Marker/Feature Detection
o Edges
o Corners
o SIFT Features
· Spatial Correspondence
o Stereo Matching
o Wide Baseline
o Dense Correspondence
o Triangulation
· Temporal Correspondence
o Tracking with Appearance
o 2D and 3D Tracking
· Post-Processing
o Labeling
o Missing Markers
o Rigidity Test
o Motion Data Filtering
o Translational and Rotational Data
o Data File Formats
· Advanced Topics
o Visual Hull Reconstruction
o Monocular Markerless MoCap
One approach employs a set of multiple synchronized cameras to capture markers placed in strategic locations on the body. The original videos of two human actions, jump and tiptoe, are shown in videos 1a and 1b, respectively.
Video 1a: Original jump action.
Video 1b: Original tiptoe action.
Marker detection involves finding the 2D pixel coordinates
of markers in the images. In our system, the subject wears a black suit with
square white markers. In videos 2a and 2b, red circles indicate the markers
detected by our system.
Video 2a: Markers detected in jump action.
Video 2b: Markers detected in tiptoe action.
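As a rough illustration of marker detection for bright markers on a dark suit, the sketch below thresholds a grayscale image and returns the centroid of each connected bright blob. It is a minimal example under stated assumptions, not the detector used in our system: the function name `detect_markers`, the `threshold` and `min_area` parameters, and the image representation (a list of rows of 0-255 intensities) are all hypothetical choices made for illustration.

```python
from collections import deque


def detect_markers(image, threshold=200, min_area=4):
    """Return (row, col) centroids of bright 4-connected blobs.

    image     -- list of rows, each a list of intensities in 0..255
    threshold -- assumed cutoff separating white markers from the dark suit
    min_area  -- blobs with fewer pixels are discarded as noise
    """
    height, width = len(image), len(image[0])
    visited = [[False] * width for _ in range(height)]
    centroids = []
    for r in range(height):
        for c in range(width):
            if visited[r][c] or image[r][c] < threshold:
                continue
            # Flood-fill one connected bright region starting at (r, c).
            queue = deque([(r, c)])
            visited[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not visited[ny][nx]
                            and image[ny][nx] >= threshold):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                centroids.append((cy, cx))
    return centroids
```

The centroid gives sub-pixel marker coordinates; a practical detector would likely also check blob shape, since the markers are square.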
The spatial correspondence problem consists of finding pairs of detected markers in different images, captured at the same time from different viewpoints, such that each pair corresponds to the projections of the same scene point. The pairs of markers computed by our system are displayed in videos 3a and 3b. The matches are represented by disparity vectors between markers in consecutive cameras.
Video 3a: Disparity vectors in jump action.
Video 3b: Disparity vectors in tiptoe action.
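One common way to approach spatial correspondence between two calibrated views is the epipolar constraint: a marker in one image must lie near the epipolar line induced by its match in the other image. The sketch below is an illustration of that idea, not necessarily the matching technique used in our system. It assumes a known 3x3 fundamental matrix `F` with the convention x2^T F x1 = 0, marker coordinates given as (x, y) pixels, and a hypothetical tolerance `max_dist`.

```python
import math


def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 (image 2) to the epipolar line of p1 (image 1)."""
    x, y = p1
    # Epipolar line in image 2: (a, b, c) = F * (x, y, 1)^T
    a, b, c = (F[i][0] * x + F[i][1] * y + F[i][2] for i in range(3))
    norm = math.hypot(a, b)
    if norm == 0:
        return math.inf  # degenerate line, cannot be matched
    return abs(a * p2[0] + b * p2[1] + c) / norm


def match_markers(F, markers1, markers2, max_dist=2.0):
    """Greedy one-to-one matching by smallest epipolar distance.

    Returns a sorted list of (index_in_markers1, index_in_markers2).
    """
    candidates = sorted(
        (epipolar_distance(F, p1, p2), i, j)
        for i, p1 in enumerate(markers1)
        for j, p2 in enumerate(markers2)
    )
    used1, used2, matches = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # remaining candidates are even farther from their lines
        if i in used1 or j in used2:
            continue
        used1.add(i)
        used2.add(j)
        matches.append((i, j))
    return sorted(matches)
```

The epipolar constraint alone is ambiguous when several markers lie near the same epipolar line, so a real system would typically add further cues, such as a third camera or the expected depth range.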
Given camera calibration and the spatial matching, the 3D reconstruction of markers (translational data) is achieved by triangulating the various camera views. The reconstructed points are shown in videos 4a and 4b, where the points are overlaid on the original background. In videos 5a and 5b, the reconstructed points are rendered from different viewpoints.
Video 4a: 3D points in the original background (jump action).
Video 4b: 3D points in the original background (tiptoe action).
Video 5a: 3D points from different viewpoints (jump action).
Video 5b: 3D points from different viewpoints (tiptoe action).
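To make the triangulation step concrete, the sketch below uses the midpoint method for two views: each matched marker defines a viewing ray from its camera center, and the 3D point is taken as the midpoint of the shortest segment joining the two rays. This is one simple technique chosen for illustration and is not claimed to be the method used in our system. The rays (origins `o1`, `o2` and directions `d1`, `d2` in world coordinates) are assumed to have been derived from the camera calibration, and the function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the closest points between rays o1 + s*d1 and o2 + t*d2."""
    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undetermined")
    # Parameters minimizing |(o1 + s*d1) - (o2 + t*d2)|^2
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return tuple((u + v) / 2 for u, v in zip(p1, p2))
```

With noisy detections the two rays rarely intersect exactly, which is why the midpoint of the closest approach is used. With more than two cameras, a least-squares solution over all rays is a natural extension.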
The temporal correspondence problem (tracking) involves matching two clouds of 3D points representing the markers detected in two consecutive frames. Given the correspondence between consecutive frames, a time series of 3D coordinates is built for each marker. Videos 6a and 6b show the trajectories of some markers.
Video 6a: Trajectories of markers in jump action.
Video 6b: Trajectories of markers in tiptoe action.
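A simple baseline for temporal correspondence is greedy nearest-neighbor matching: each 3D point in the current frame is attached to the closest track that was updated in the previous frame, provided the displacement is small enough. The sketch below illustrates this baseline only and is not presented as the tracking technique used in our system. The function name `build_tracks` and the gate `max_jump` (maximum displacement per frame, in the same units as the coordinates) are assumptions.

```python
import math


def build_tracks(frames, max_jump=0.1):
    """Link 3D points across consecutive frames into tracks.

    frames -- list of frames, each a list of (x, y, z) points
    Returns a list of tracks; each track is a list of (frame_index, point).
    """
    tracks = []
    for f, points in enumerate(frames):
        # Only tracks updated in the previous frame can be extended.
        open_ids = [i for i, t in enumerate(tracks) if t[-1][0] == f - 1]
        pairs = sorted(
            (math.dist(tracks[i][-1][1], p), i, j)
            for i in open_ids
            for j, p in enumerate(points)
        )
        used_tracks, used_points = set(), set()
        for dist, i, j in pairs:
            if dist > max_jump:
                break  # all remaining pairs move too far in one frame
            if i in used_tracks or j in used_points:
                continue
            tracks[i].append((f, points[j]))
            used_tracks.add(i)
            used_points.add(j)
        # Unmatched points start new tracks (new or reappearing markers).
        for j, p in enumerate(points):
            if j not in used_points:
                tracks.append([(f, p)])
    return tracks
```

A marker that is occluded for a frame ends its track and starts a new one when it reappears, so the resulting fragments still need to be labeled and merged during post-processing.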
Post-processing consists of labeling each track with a marker code, filling track gaps caused by occlusions, correcting possible gross errors, filtering or smoothing noise, and interpolating data over time. The final result of our Optical Motion Capture System is shown in videos 7a and 7b, where a humanoid model, called "flat head", performs the actions.
Video 7a: Flat head performs a jump action.
Video 7b: Flat head performs a tiptoe action.
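Two of the post-processing steps, gap filling and noise filtering, can be sketched on a single coordinate of one track. The example below fills missing samples (marked `None`) by linear interpolation and then smooths the series with a moving average. Both choices are assumptions made for illustration; the specific interpolation and filtering used in our system are not described here, and the function names and the `radius` parameter are hypothetical.

```python
def fill_gaps(values):
    """Linearly interpolate None entries lying between two known samples."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            filled[i] = (1 - w) * filled[a] + w * filled[b]
    return filled  # gaps at the very start or end are left as None


def moving_average(values, radius=1):
    """Smooth a gap-free series with a centered window of 2*radius + 1 samples.

    Near the ends the window is truncated to the available samples.
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Applied to the x, y, and z series of each track in turn, these two functions yield a continuous, smoothed trajectory. Linear interpolation is only reasonable for short gaps; longer occlusions usually call for model-based recovery.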