Projects

 

Human Activity Language

In the sensory-motor intelligence domain, we propose a linguistic framework for the modeling and learning of human activity representations. This linguistic framework is in essence a novel learning approach with applications to several problems. We empirically discovered that the human action space has the structure of a language, with its own analogs of phonology, morphology, and syntax. Our Human Activity Language (HAL) represents the sequential and parallel aspects of human movement with perceptual and generational properties. Kinetology, the phonology of human movement, involves the segmentation problem, the symbolization problem, and an evaluation system according to new principles. We introduce five principles on which kinetology should be based and evaluated: compactness, view-invariance, reproducibility, selectivity, and reconstructivity. The morphology of a specific human activity consists of the set of parallel actuators involved in the activity, the synchronization rules among these actuators, and the motion associated with each participating actuator. We propose a formal model, Parallel Synchronous Grammar System (PSGS), as an action morpheme, where each component grammar corresponds to an actuator. We present a novel heuristic parallel learning algorithm (PAL) to perform grammatical inference of such grammar systems. Nuclear syntax consists of a noun phrase and a verbal phrase. A noun phrase in a HAL sentence corresponds to the active joints (noun) modified by an initial posture (adjective). A verbal phrase includes the changes each active joint experiences during the activity execution (verb) and a point in a reduced dimensional space (adverb) which serves to modify the activity in order to obtain generalization. Besides the nuclear syntax, sequential and parallel syntax are required to handle the sequencing and concurrency of actions, respectively.

 

 

Optical Motion Capture

Motion capture is the process of recording real life movement of a subject as sequences of Cartesian coordinates in 3D space. Optical motion capture (OMC) uses cameras to reconstruct the body posture of the performer. One approach employs a set of multiple synchronized cameras to capture markers placed in strategic locations on the body. The sub-problems involved in OMC are initialization, marker detection, spatial correspondence, temporal correspondence, and post-processing. The initialization includes setting up a human model and the computation of intrinsic and extrinsic camera calibration. Marker detection involves finding the 2D pixel coordinates of markers in the images. The spatial correspondence problem consists in finding pairs of detected markers in different images captured at the same time with different viewpoints such that each pair corresponds to the projections of the same scene point. Given camera calibration and the spatial matching, the 3D reconstruction of markers (translational data) is achieved by triangulating the various camera views. The temporal correspondence problem (tracking) involves matching two clouds of 3D points representing detected markers at two consecutive frames, respectively. The temporal correspondence module builds a track for each marker where the marker’s 3D coordinates are concatenated according to time. Post-processing consists in labeling each track with a marker code, finding missing markers lost by occlusions, correcting possible gross errors, and filtering noise. Once the translational data is processed, a hierarchical human model may be used to compute rotational data (joint angles).

 

 

Motion Data Compression

An immediate application for a compact representation is compression of motion data. With this goal, we implemented and evaluated our kinetological system. The compression efficiency of our algorithm was tested on several different actions. The file size ranges from 128Kb to 1024Kb with increments of 128Kb. The median compression rate for all motion files is 3.698%. The compression encoding is not useful without a decoding process. This way, we implemented a reconstruction process and computed the average error of all frames and all joint angles for each action in our test set. The median average error is 0.823 degree. We implemented two other online segmentation methods for comparison purposes. The first method is an online version of the piecewise linear curve simplification. In this algorithm, each segment grows incrementally until the average error is higher than some threshold. The second method implemented for comparison is the uniform sampling. In this algorithm, equally spaced frames are selected to represent the motion. The compression size and average error curve is computed by varying the space between representative frames.

 

 

Hierarchical Path Finding

We present a new optimal hierarchical approach for shortest path finding. We propose iterative algorithms to find a shortest path and to expand this path into the lowest level. The theoretical complexity of our algorithms is linear in both time and space. This is the first iterative (non-recursive) approach which achieves the optimal lower bound for this problem. Furthermore, the algorithms in our hierarchical approach are simple and present a good potential for parallelization. We also introduce a new algorithm to perform intra-regional shortest path queries in the lowest level of a network hierarchy. Our strategy uses a pre-computed subsequence of vertices that belong to the shortest path (hybrid path view) while actually computing the whole shortest path. The hybrid algorithm requires O(m) time and space assuming a uniform distribution of vertices, where m is the total number of vertices in the region. This represents an improvement concerning space, since a path view approach requires O(m1.5) space in the lowest level. Our Vereda system is an application that uses the TIGER/Line files to compute shortest paths in road networks. The TIGER/Line files are selected geographic and cartographic information from the Census TIGER (Topologically Integrated Geographic Encoding and Referencing) database for all states and territories in the United States of America. The topological structure of the Census TIGER database defines the location and relationship among geographic features. Four pre-processing modules compose the Vereda system: network, border, view, and hierarchy. They pre-process the TIGER/Line files in order to find the information required by the shortest path algorithms. A road network for each county in a state is obtained in the network module. After that, the border module finds for each county the set of vertices that are shared with another county. Then, the view module computes the path view for each county. Finally, the hierarchy module finds a higher-level network for the state to be used in a hierarchical shortest path algorithm.

 

 

3D Model Reconstruction from Images

Our Maquete system performs tasks related to three-dimensional model reconstruction from images. The system deals with images in several file formats (e.g. tif, jpg, bmp, gif). The stereo correspondence problem is solved according to a feature-based approach using winner-takes-all strategy. Features are detected by either Canny edge detector or Harris corner detector. Some similarity measures are used to match features, including the sum of squared differences (SSD), the normalized cross correlation, and the proximity distance introduced by Scott and Longuet-Higgins. A global technique to disambiguate false matches is also implemented. The fundamental matrix is computed by either a seven-point or a normalized eight-point algorithm. A robust approach uses the Least Median of Squares (LMedS) to find a threshold and then apply RANSAC to obtain the fundamental matrix (identifying inliers). A dense disparity map is computed taking advantage of the one-dimensional search imposed by the epipolar constraint. Given the intrinsic parameters (camera calibration), the essential matrix is trivially computed and the relative orientation problem is solved by either Singular Value Decomposition (SVD) or a method introduced by Horn that uses only elementary matrix operations. This way, a rigid motion (translation/baseline and rotation/orientation) is recovered and four solutions for the structure (3D point coordinates) are obtained by projective triangulation. At this point, a positive depth constraint is applied to get rid of further bad correspondences. An initial three-dimensional mesh is generated using Delaunay triangulation stored in a quad-edge data structure. A reprojection test is used in the search for a photo-realistic triangulated mesh. Bundle Adjustment is also implemented using optimization techniques like Tabu Search.

 

 

Volumetric Reconstruction

Silhouette images from multiple viewpoints obtained from background subtraction are useful in reconstructing 3D geometry from 2D images. Two approaches for visual hull reconstruction from silhouettes are volumetric and polyhedral. Volumetric reconstruction tessellates the scene space volume by subdividing the space into a regular grid of volume elements (voxels). Each voxel is projected into every image plane. Projections can be pre-computed and stored in a look-up table in order to improve computation performance. The voxel whose projection falls outside of any silhouette is carved away. A voxel is classified as occupied space if it projects into all foreground silhouettes. Polyhedral reconstruction extracts contours from silhouette images. Given camera calibration, each silhouette contour is projected into 3D space as a generalized cone (polyhedral representation) containing the actual 3D object. The 3D intersection of these cones from all viewpoints gives a polyhedral visual hull.

 

 

Panoramic Image (Mosaic) Construction

The field-of-view (FOV) of a camera is the region of observable space that is perceived at any moment. A projective camera has a limited field-of-view. In order to overcome this limitation, several images corresponding to intersecting FOVs are merged into a single mosaic image corresponding to a larger FOV containing all the original views. A mosaic maps all the original images onto a single coordinate system and, consequently, is equivalent to a panoramic image. No matter what the relative positions and orientations of the cameras. Two images of a planar scene are related by a general linear transformation: a homography. A linear transformation also maps two views of any scene (possibly not restricted to a plane) when both views are taken from the same position (no translation) but in different directions (only rotation). The mosaic construction problem addressed here consists in given a number of different images, all taken by a camera that remained at the same place but rotated in different directions, to create a single larger image (mosaic) which contains all the original images, warped appropriately. The no translation constraint avoids occluded parts of the scene on different images. The mosaicing process works with two images at a time. Initially, the extraction of a sparse set of feature points is performed independently on both images. The problem of feature correspondence (stereo matching or stereo disparity) can be stated as finding pairs of features in two (or more) perspective images such that each pair corresponds to the projections of the same scene point. Stereo matching algorithms are based on the similarity of features. The matching process uses the attributes associated with the detected features. Once point correspondences are given, the transformation that maps points in the second image to points in the first may be found. At least four point correspondences are required to solve a linear system with the eight unknowns that define the transformation matrix. The two images are merged by creating a new image with the first image, and the results of applying the obtained affine transformation to the points of the second image. The full mosaic is obtained by performing this process for each pair of images.

 

 

Face Detection

Automatic identity authentication may use a biometric feature as input. If the biometric feature is a face, the authentication process involves face detection and recognition. We consider a face detection problem assuming the image includes a single face in a frontal pose from a specific distance. Given such an image, the problem consists in finding the rectangle R that contains the face in the image. A pre-processing step computes a probability distribution function Pl, where Pl(x, y) represents the probability that pixel (x, y) is at a face location. In order to compute Pl, we need a training set of face images with the ground-truth rectangle where each face is located. In general, the region of pixels with the maximum face location probability is in the central part of the image. A rectangle R’ contained in this region is assumed to have skin color. Initially, our algorithm computes a probability distribution function Ps, where Ps(x, y) represents the probability that pixel (x, y) has a skin color. In order to compute Ps, we build a histogram Hhue for hue and a histogram Hsat for saturation of pixels in the rectangle R’. For each pixel (x, y) in the image, Ps(x, y) is proportional to Hhue(h)*Hsat(s), where h and s represent hue and saturation of the pixel (x, y), respectively. The probability Pf(x, y) that a pixel (x, y) belongs to a face is the product of the skin color probability and location probability: Pf(x, y) = Ps(x, y)*Pl(x, y). Since the size of the face rectangle R is estimated from the frontal pose assumption, we detect a face as the rectangle R with the location that maximizes the sum of probability Pf for all pixels contained in R.

 

 

Traffic Surveillance

Given real-time images from traffic cameras located throughout a metropolitan area, our initial interest is the automatic detection of vehicles traveling on the transportation network. This is a first step towards the identification of vehicles, where each “different” detected vehicle is associated with a unique identification number. Vehicle identification may lead to the tracking of a vehicle in a single camera and among different cameras in the network. A basic approach for the detection of moving objects on images is background subtraction. Initially, an image background is constructed for each camera using the corresponding input video. For each pixel location in the background image, the color of the pixel is the median of the colors in all frames of the video for the respective pixel location. Once an image background is built, this image is subtracted from the original video in order to detect moving objects. The assumption is that a moving object has a different visual appearance from the constructed static background. The input videos contain a fair share of pedestrians, bikers, luminous ads, and other independent distractions. In order to eliminate the influence of such visually dynamic aspects, we define a region of interest for each camera. The background subtraction values are kept only in the region of interest. The other values are assigned to a zero difference. With the background subtraction updated according to the region of interest, motion pixels are identified as the locations where absolute difference in appearance is greater than a threshold value. After that, blobs possibly representing cars are found as the connected components of the motion pixels in the image. Components with a very small area are filtered out.

 

 

Proximity Problems on Polyhedral Surfaces

Shortest Path Planning is the field of Computational Geometry that concerns the determination of feasible shortest paths from a point to another in a given environment. We considered a Directed Frictioned Geodesic (DFG) shortest path problem that minimizes the total work spent to move a body on a polyhedral surface with constant friction coefficient and constant slope in each face. In order to characterize geodesic paths and shortest paths according to the constraints of the problem, we identified the corresponding local optimality criterion and we demonstrated the strict convexity of the DFG distance function using convex function theory. We developed an algorithm, based on the continuous Dijkstra methodology, to solve the DFG shortest path problem. We extended the DFG shortest path problem to construct a shortest path Voronoi diagram on a polyhedral surface according to the DFG distance function. We reduced some generalizations of proximity problems to the construction of this diagram and, therefore, this Voronoi diagram solved these proximity problems. We implemented an external module to the Geomview program to visualize a shortest path tree and a Voronoi diagram on polyhedral surfaces. The implementation of shortest path algorithms on polyhedral surfaces used Geomview as a viewer for dynamically changing geometric objects in Silicon Graphics. The external module allowed selection of a vertex of the polyhedral surface as the source point for a path. The control panel allows the user to choose between a snapshot execution and a step-by-step animation execution of the algorithm.

 

 

Raster to Vector Conversion

Implementation of raster to vector map conversion using the Image Processing Toolbox of Matlab. Initially, the system thresholds the map image concerning a particular set of colors in order to compute a binary image representing roads with no other map features (e.g. text). The binary image is smoothed to avoid noise by performing a two-dimensional order statistic filtering. The Canny edge detector is used to find edge elements (edgels) in the smoothed binary image. The system computes the connected components of the edgels and vectorizes them by computing a line according to the Hough transform. The endpoints of a line segment are found using a proximity criterion and the edgels corresponding to this line segment are removed. This process is repeated until a connected component is empty.

 

 

Land Surveying

This project has a strong personal connotation. After my grandparents passed away, my family needed to distribute the assets among the heirs. The biggest challenge was the partition of a peace of land in the northeast of Brazil. I volunteered to measure the land with a GPS, to estimate its value, and to subdivide the property in a “fair” manner. The measuring task took me two days in tracking the property boundaries. I covered a 6Km distance of Caatinga vegetation just in the first morning of surveying. The estimation task had some issues. Market values are not easily found for properties in that region. In general, land trading is not advertised on newspapers. The value depends on the facilities the property includes, on the access to roads, and mainly on the proximity to water resources. The subdivision task had also some complications. The shape of the land is very irregular and consists of disconnected parts. Due to trading among heirs, each person had right to a different share of the land. Furthermore, the value of each land acre is not the same across the property. Once GPS data was gathered, I developed a system to display the property map with its boundaries, its facilities, and landmarks. In order to estimate value, I used two base prices for an acre of land. The highest base price is for an acre right next to a road and water. The lowest base price is for an acre that is farthest from roads and water. The distance to roads and water resources is used to interpolate these two base prices in computing the value of any acre unit. This way, I used a grid to subdivide the property and the value of each cell in the grid is computed. The grid cells are assigned to heirs in an interactive way. Each time a user clicks on a cell with the mouse, this cell is assigned to the current heir in the system and the share value of all heirs is printed. The current heir is changed through the keyboard. Although the system does not solve the partition problem automatically, the computer aids the user into designing a feasible and fair partition for the land.