Optical Motion Capture

Yiannis Aloimonos and Gutemberg Guerra-Filho

Computer Vision Laboratory

Center for Automation Research

Institute for Advanced Computer Studies

Department of Computer Science

University of Maryland

College Park, Maryland 20742-3275, USA

Motion capture is the process of recording the real-life movement of a subject as sequences of Cartesian coordinates in 3D space. Optical motion capture (OMC) uses cameras to reconstruct the body posture of the performer. One approach employs a set of multiple synchronized cameras to capture markers placed at strategic locations on the body. Motion capture systems have applications in computer graphics for character animation, in virtual reality for human control interfaces, and in video games for realistic simulation of human motion. In this tutorial, we discuss the theoretical and empirical aspects of an optical motion capture system. Implementing a motion capture system requires a number of synchronized cameras, an image acquisition system, a capture area, and a special suit with markers. The marker locations on the suit are designed so that the required body parts (e.g., joints) are covered. We present our motion capture system within a framework that divides the task into sub-problems solved in a modular way. The sub-problems involved in OMC are initialization, marker detection, spatial correspondence, temporal correspondence, and post-processing. For each sub-problem, we discuss the underlying theory and the novel techniques used in the current implementation. Initialization includes setting up a human model and computing the intrinsic and extrinsic camera calibration. Marker detection involves finding the 2D pixel coordinates of markers in the images. The spatial correspondence problem consists of finding pairs of detected markers in images captured at the same time from different viewpoints such that each pair corresponds to the projections of the same scene point. Given the camera calibration and the spatial matching, the 3D positions of the markers (translational data) are reconstructed by triangulating the various camera views.
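
Calibration ties each camera to the scene: the extrinsic parameters place the camera in the world, and the intrinsic parameters map camera coordinates to pixels. The minimal Python sketch below shows how a calibrated camera projects a 3D point, assuming an ideal pinhole model with no lens distortion. The function `project` and all calibration values are hypothetical illustrations.

```python
def project(K, R, t, X):
    """Map world point X to pixel coordinates (u, v) with an ideal pinhole camera.

    K: 3x3 intrinsic matrix; R (3x3 rotation) and t (3-vector) are the
    extrinsics, so camera coordinates are Xc = R X + t.
    """
    # World -> camera coordinates.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    # Camera -> homogeneous pixel coordinates via the intrinsics.
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Perspective division.
    return (x[0] / x[2], x[1] / x[2])


# Hypothetical calibration: focal length 800 px, principal point (320, 240),
# world origin 5 units along the camera's optical axis.
K = [[800, 0, 320], [0, 800, 240], [0, 0, 1]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0, 0, 5]
print(project(K, R, t, [0.5, -0.25, 0.0]))  # (400.0, 200.0)
```

Triangulation inverts this mapping: each detected marker defines a viewing ray, and matched rays from several cameras are intersected.
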
The temporal correspondence problem (tracking) involves matching two clouds of 3D points that represent the markers detected at two consecutive frames. The temporal correspondence module builds a track for each marker by concatenating the marker's 3D coordinates over time. Post-processing consists of labeling each track with a marker code, recovering markers lost to occlusion, correcting possible gross errors, and filtering noise. Once the translational data is processed, a hierarchical human model may be used to compute rotational data (joint angles). We consider standard data formats available for motion capture data (e.g., BVH, Acclaim). Other important techniques used to improve the consistency of the motion data are volumetric reconstruction, inverse kinematics, and inverse dynamics. We also cover topics related to editing and manipulating motion data.
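
As a simplified illustration of turning translational data into rotational data, the sketch below computes a single joint angle (for example, knee flexion) from three marker positions. This is an assumed stand-in, not our hierarchical model: a full hierarchical model recovers a complete 3D rotation for each joint relative to its parent, whereas the hypothetical `joint_angle` helper yields only the angle between two adjacent segments.

```python
import math


def joint_angle(parent, joint, child):
    """Angle in degrees at `joint` between segments joint->parent and joint->child."""
    u = [p - j for p, j in zip(parent, joint)]
    v = [c - j for c, j in zip(child, joint)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical hip, knee, and ankle marker positions (meters).
hip, knee, ankle = (0.0, 0.9, 0.0), (0.0, 0.5, 0.0), (0.4, 0.5, 0.0)
print(joint_angle(hip, knee, ankle))  # right angle at the knee
```
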

Tutorial Slides

The Language of Human Movement

·       Introduction

o     Realistic Movement: Synthesis and Analysis

o     Motion Capture Technologies

o     Applications

·       Required Resources

o     Capture Room

o     Body Suit

o     Camera Equipment

o     Acquisition System

·       Initialization

o     Markers’ Configuration

o     Camera Calibration

o     World Coordinate System Alignment

o     Background Subtraction

o     Kinematic Human Body Model

·       Marker/Feature Detection

o     Edges

o     Corners

o     SIFT Features

·       Spatial Correspondence

o     Stereo Matching

o     Wide Baseline

o     Dense Correspondence

o     Triangulation

·       Temporal Correspondence

o     Tracking with Appearance

o     2D and 3D Tracking

·       Post-Processing

o     Labeling

o     Missing Markers

o     Rigidity Test

o     Motion Data Filtering

o     Translational and Rotational Data

o     Data File Formats

·       Advanced Topics

o     Visual Hull Reconstruction

o     Monocular Markerless MoCap

Our system employs a set of multiple synchronized cameras to capture markers placed at strategic locations on the body. Videos 1a and 1b present the original footage of the jump and tiptoe actions, respectively.

Video 1a: Original jump action.

Video 1b: Original tiptoe action.

Marker detection involves finding the 2D pixel coordinates of markers in the images. In our system, the subject wears a black suit with square white markers. In videos 2a and 2b, red circles indicate the markers detected by our system.

Video 2a: Markers detected in jump action.

Video 2b: Markers detected in tiptoe action.
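
Because the markers are white squares on a black suit, one simple detector thresholds the image and takes the centroid of each connected bright region. The sketch below assumes that technique, with hypothetical parameters `threshold` and `min_area`; the outline above lists edges, corners, and SIFT features as detection cues, and this code is not necessarily the detector used in our system.

```python
from collections import deque


def detect_markers(image, threshold=200, min_area=4):
    """Return (row, col) centroids of bright 4-connected blobs in a grayscale image.

    image: list of rows of intensities in 0..255. Pixels >= threshold count as
    marker pixels; blobs smaller than min_area pixels are discarded as noise.
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Flood-fill the bright region that contains (r, c).
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                centroids.append((cy, cx))
    return centroids


# Hypothetical 8x8 frame: dark suit (0), two white squares, one noisy pixel.
img = [[0] * 8 for _ in range(8)]
for y in (1, 2):
    for x in (1, 2):
        img[y][x] = 255
for y in (4, 5, 6):
    for x in (4, 5, 6):
        img[y][x] = 255
img[0][7] = 255
print(detect_markers(img))  # [(1.5, 1.5), (5.0, 5.0)]
```

The single bright pixel is rejected by the area test, which is the kind of noise a real threshold-based detector has to suppress.
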

The spatial correspondence problem consists of finding pairs of detected markers in images captured at the same time from different viewpoints such that each pair corresponds to the projections of the same scene point. Videos 3a and 3b display the pairs of markers computed by our system, where each match is represented by a disparity vector between markers in consecutive cameras.

Video 3a: Disparity vectors in jump action.

Video 3b: Disparity vectors in tiptoe action.
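
With calibrated cameras, a common way to constrain spatial matching is the epipolar constraint: the match of a marker from one image must lie near the corresponding epipolar line in the other image. The sketch below assumes a known fundamental matrix `F` between two views and pairs markers greedily by epipolar distance; the helper names and the `max_dist` threshold are hypothetical. Epipolar distance alone is ambiguous when several markers lie near the same line, so a practical system needs additional cues (for example, a third view).

```python
import math


def epipolar_distance(F, x1, x2):
    """Pixel distance from x2 (image 2) to the epipolar line F @ [x1, 1]."""
    p = (x1[0], x1[1], 1.0)
    a, b, c = (sum(F[i][j] * p[j] for j in range(3)) for i in range(3))
    norm = math.hypot(a, b)
    return abs(a * x2[0] + b * x2[1] + c) / norm if norm > 0 else math.inf


def match_markers(F, markers1, markers2, max_dist=2.0):
    """Greedily pair markers across two views, smallest epipolar distance first.

    Returns sorted (i, j) index pairs; markers with no candidate within
    max_dist pixels stay unmatched.
    """
    candidates = sorted(
        (epipolar_distance(F, p, q), i, j)
        for i, p in enumerate(markers1)
        for j, q in enumerate(markers2)
    )
    used1, used2, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break
        if i in used1 or j in used2:
            continue
        used1.add(i)
        used2.add(j)
        pairs.append((i, j))
    return sorted(pairs)


# Hypothetical rectified pair: F for a pure horizontal baseline, so epipolar
# lines are image rows and matching reduces to comparing row coordinates.
F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
left = [(100.0, 50.0), (200.0, 120.0)]
right = [(180.0, 120.5), (80.0, 50.2)]
print(match_markers(F, left, right))  # [(0, 1), (1, 0)]
```
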

Given the camera calibration and the spatial matching, the 3D positions of the markers (translational data) are reconstructed by triangulating the various camera views. Videos 4a and 4b show the reconstructed points virtually inserted into the original background. In videos 5a and 5b, the reconstructed points are projected into different viewpoints.

Video 4a: 3D points in the original background (jump action).

Video 4b: 3D points in the original background (tiptoe action).

Video 5a: 3D points from different viewpoints (jump action).

Video 5b: 3D points from different viewpoints (tiptoe action).
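
For two views, a simple triangulation method takes the viewing ray through each matched marker and returns the midpoint of the shortest segment joining the two rays, since noise means the rays rarely intersect exactly. The sketch below assumes the camera centers and ray directions have already been derived from the calibration; `triangulate_midpoint` is a hypothetical helper, and with more than two views a linear least-squares (DLT) formulation is the usual generalization.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centers; d1, d2: viewing-ray directions through the two
    matched markers (all 3D, in world coordinates).
    """
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undetermined")
    # Ray parameters of the two mutually closest points.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]


# Hypothetical skew rays whose closest points are (2, 0, 0) and (2, 0, 1).
print(triangulate_midpoint((0, 0, 0), (1, 0, 0), (2, -1, 1), (0, 1, 0)))  # [2.0, 0.0, 0.5]
```

The length of the shortest segment (here 1 unit) is also a useful quality check: a large gap suggests a wrong spatial match.
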

The temporal correspondence problem (tracking) involves matching two clouds of 3D points that represent the markers detected at two consecutive frames. Given the correspondences between consecutive frames, a time series of 3D coordinates is built for each marker. Videos 6a and 6b draw the trajectories of some of the markers.

Video 6a: Trajectories of markers in jump action.

Video 6b: Trajectories of markers in tiptoe action.
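
A basic tracker can match the two point clouds by nearest neighbor with a gating threshold, optionally predicting each marker's position from its previous velocity. The sketch below makes those assumptions (greedy matching, constant-velocity prediction, and a hypothetical `max_step` in scene units per frame); it is an illustration, not the tracking method used in our system.

```python
import math


def match_frames(prev_pts, curr_pts, prev_vel=None, max_step=0.05):
    """Greedily match markers between consecutive frames (3D point clouds).

    Each previous marker is first moved by its estimated per-frame velocity
    (constant-velocity prediction; zero if unknown). Returns {prev_index:
    curr_index}; pairs farther apart than max_step stay unmatched.
    """
    if prev_vel is None:
        prev_vel = [(0.0, 0.0, 0.0)] * len(prev_pts)
    predicted = [tuple(p + v for p, v in zip(pt, vel))
                 for pt, vel in zip(prev_pts, prev_vel)]
    candidates = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(predicted)
        for j, q in enumerate(curr_pts)
    )
    matches, used = {}, set()
    for dist, i, j in candidates:
        if dist > max_step:
            break
        if i in matches or j in used:
            continue
        matches[i] = j
        used.add(j)
    return matches


# Hypothetical frames: the point order differs between frames.
prev = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
curr = [(1.01, 0.0, 0.0), (0.02, 0.0, 0.0)]
print(match_frames(prev, curr))  # {1: 0, 0: 1}
```

Unmatched markers (for example, ones that became occluded) are simply absent from the result, leaving gaps for post-processing to fill.
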

Post-processing consists of labeling each track with a marker code, filling track gaps caused by occlusions, correcting possible gross errors, filtering or smoothing noise, and interpolating data over time. The final result of our Optical Motion Capture System is shown in videos 7a and 7b, where a humanoid model called "flat head" performs the actions.

Video 7a: Flat head performs a jump action.

Video 7b: Flat head performs a tiptoe action.
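
Two of the post-processing steps, filling occlusion gaps and smoothing noise, can be sketched for a single labeled track as follows. This assumes linear interpolation for gaps and a centered moving average for smoothing, both chosen here for simplicity; practical pipelines often use splines or low-pass filters instead, and the helper names are hypothetical.

```python
def fill_gaps(track):
    """Linearly interpolate interior gaps (None) in a list of 3D points.

    Gaps at the start or end of the track have no bracketing samples and are
    left as None.
    """
    filled = list(track)
    known = [i for i, p in enumerate(track) if p is not None]
    for a, b in zip(known, known[1:]):
        for k in range(a + 1, b):
            w = (k - a) / (b - a)
            filled[k] = tuple((1 - w) * pa + w * pb
                              for pa, pb in zip(track[a], track[b]))
    return filled


def smooth(track, radius=1):
    """Centered moving average over 2*radius + 1 frames, skipping gaps."""
    out = []
    for i in range(len(track)):
        window = [p for p in track[max(0, i - radius):i + radius + 1]
                  if p is not None]
        out.append(tuple(sum(c) / len(window) for c in zip(*window))
                   if window else None)
    return out


# Hypothetical track with a two-frame occlusion gap.
raw = [(0.0, 0.0, 0.0), None, None, (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(smooth(fill_gaps(raw)))
```

The window shrinks at the track ends, so the first and last samples are pulled toward their neighbors; this end bias is one reason to prefer a proper low-pass filter on real data.
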