|


|
Human Activity Language
In the sensory-motor intelligence domain, we propose
a linguistic framework for the modeling and learning of human activity
representations. This linguistic framework is in essence a novel learning
approach with applications to several problems. We empirically discovered
that the human action space has the structure of a language, with its own
analogs of phonology, morphology, and syntax. Our Human Activity Language
(HAL) represents the sequential and parallel aspects of human movement with
perceptual and generational properties. Kinetology, the phonology of human
movement, involves the segmentation problem, the symbolization problem, and
an evaluation system according to new principles. We introduce five
principles on which kinetology should be based and evaluated: compactness,
view-invariance, reproducibility, selectivity, and reconstructivity. The
morphology of a specific human activity consists of the set of parallel
actuators involved in the activity, the synchronization rules among these
actuators, and the motion associated with each participating actuator. We
propose a formal model, Parallel Synchronous Grammar System (PSGS), as an
action morpheme, where each component grammar corresponds to an actuator.
We present a novel heuristic parallel learning algorithm (PAL) to perform
grammatical inference of such grammar systems. Nuclear syntax consists of a
noun phrase and a verbal phrase. A noun phrase in a HAL sentence
corresponds to the active joints (noun) modified by an initial posture
(adjective). A verbal phrase includes the changes each active joint
experiences during the activity execution (verb) and a point in a reduced
dimensional space (adverb) which serves to modify the activity in order to
obtain generalization. Besides the nuclear syntax, sequential and parallel
syntax are required to handle the sequencing and concurrency of actions,
respectively.
|
|


|
Optical Motion Capture
Motion capture is the process of recording real life
movement of a subject as sequences of Cartesian coordinates in 3D space.
Optical motion capture (OMC) uses cameras to reconstruct the body posture
of the performer. One approach employs a set of multiple synchronized
cameras to capture markers placed in strategic locations on the body. The
sub-problems involved in OMC are initialization, marker detection, spatial
correspondence, temporal correspondence, and post-processing. The
initialization includes setting up a human model and the computation of
intrinsic and extrinsic camera calibration. Marker detection involves
finding the 2D pixel coordinates of markers in the images. The spatial correspondence
problem consists in finding pairs of detected markers in different images
captured at the same time with different viewpoints such that each pair
corresponds to the projections of the same scene point. Given camera
calibration and the spatial matching, the 3D reconstruction of markers
(translational data) is achieved by triangulating the various camera views.
The temporal correspondence problem (tracking) involves matching two clouds
of 3D points representing detected markers at two consecutive frames,
respectively. The temporal correspondence module builds a track for each
marker where the marker’s 3D coordinates are concatenated according to
time. Post-processing consists in labeling each track with a marker code,
finding missing markers lost by occlusions, correcting possible gross
errors, and filtering noise. Once the translational data is processed, a
hierarchical human model may be used to compute rotational data (joint
angles).
|
|


|
Motion Data Compression
An immediate application for a compact representation
is compression of motion data. With this goal, we implemented and evaluated
our kinetological system. The compression efficiency of our algorithm was
tested on several different actions. The file size ranges from 128Kb to
1024Kb with increments of 128Kb. The median compression rate for all motion
files is 3.698%. The compression encoding is not useful without a decoding
process. This way, we implemented a reconstruction process and computed the
average error of all frames and all joint angles for each action in our
test set. The median average error is 0.823 degree. We implemented two
other online segmentation methods for comparison purposes. The first method
is an online version of the piecewise linear curve simplification. In this
algorithm, each segment grows incrementally until the average error is
higher than some threshold. The second method implemented for comparison is
the uniform sampling. In this algorithm, equally spaced frames are selected
to represent the motion. The compression size and average error curve is
computed by varying the space between representative frames.
|
|


|
Hierarchical Path Finding
We present a new optimal hierarchical approach for shortest
path finding. We propose iterative algorithms to find a shortest path and
to expand this path into the lowest level. The theoretical complexity of
our algorithms is linear in both time and space. This is the first
iterative (non-recursive) approach which achieves the optimal lower bound
for this problem. Furthermore, the algorithms in our hierarchical approach
are simple and present a good potential for parallelization. We also
introduce a new algorithm to perform intra-regional shortest path queries
in the lowest level of a network hierarchy. Our strategy uses a
pre-computed subsequence of vertices that belong to the shortest path
(hybrid path view) while actually computing the whole shortest path. The
hybrid algorithm requires O(m) time and space assuming a
uniform distribution of vertices, where m is the total number of
vertices in the region. This represents an improvement concerning space,
since a path view approach requires O(m1.5) space
in the lowest level. Our Vereda system is an application that uses the
TIGER/Line files to compute shortest paths in road networks. The TIGER/Line
files are selected geographic and cartographic information from the Census
TIGER (Topologically Integrated Geographic Encoding and Referencing)
database for all states and territories in the United States of America. The
topological structure of the Census TIGER database defines the location and
relationship among geographic features. Four pre-processing modules compose
the Vereda system: network, border, view, and hierarchy. They pre-process
the TIGER/Line files in order to find the information required by the
shortest path algorithms. A road network for each county in a state is
obtained in the network module. After that, the border module finds for
each county the set of vertices that are shared with another county. Then,
the view module computes the path view for each county. Finally, the
hierarchy module finds a higher-level network for the state to be used in a
hierarchical shortest path algorithm.
|
|


|
3D Model Reconstruction from Images
Our Maquete system performs tasks related to
three-dimensional model reconstruction from images. The system deals with
images in several file formats (e.g. tif, jpg, bmp, gif). The stereo
correspondence problem is solved according to a feature-based approach
using winner-takes-all strategy. Features are detected by either Canny edge
detector or Harris corner detector. Some similarity measures are used to
match features, including the sum of squared differences (SSD), the normalized
cross correlation, and the proximity distance introduced by Scott and
Longuet-Higgins. A global technique to disambiguate false matches is also
implemented. The fundamental matrix is computed by either a seven-point or
a normalized eight-point algorithm. A robust approach uses the Least Median
of Squares (LMedS) to find a threshold and then apply RANSAC to obtain the
fundamental matrix (identifying inliers). A dense disparity map is computed
taking advantage of the one-dimensional search imposed by the epipolar
constraint. Given the intrinsic parameters (camera calibration), the
essential matrix is trivially computed and the relative orientation problem
is solved by either Singular Value Decomposition (SVD) or a method
introduced by Horn that uses only elementary matrix operations. This way, a
rigid motion (translation/baseline and rotation/orientation) is recovered
and four solutions for the structure (3D point coordinates) are obtained by
projective triangulation. At this point, a positive depth constraint is
applied to get rid of further bad correspondences. An initial
three-dimensional mesh is generated using Delaunay triangulation stored in
a quad-edge data structure. A reprojection test is used in the search for a
photo-realistic triangulated mesh. Bundle Adjustment is also implemented
using optimization techniques like Tabu Search.
|
|


|
Volumetric Reconstruction
Silhouette images from multiple viewpoints obtained from
background subtraction are useful in reconstructing 3D geometry from 2D
images. Two approaches for visual hull reconstruction from silhouettes are
volumetric and polyhedral. Volumetric reconstruction tessellates the scene
space volume by subdividing the space into a regular grid of volume
elements (voxels). Each voxel is projected into every image plane.
Projections can be pre-computed and stored in a look-up table in order to
improve computation performance. The voxel whose projection falls outside of
any silhouette is carved away. A voxel is classified as occupied space if
it projects into all foreground silhouettes. Polyhedral reconstruction
extracts contours from silhouette images. Given camera calibration, each
silhouette contour is projected into 3D space as a generalized cone
(polyhedral representation) containing the actual 3D object. The 3D
intersection of these cones from all viewpoints gives a polyhedral visual
hull.
|
|



|
Panoramic Image (Mosaic) Construction
The field-of-view (FOV) of a camera is the region of
observable space that is perceived at any moment. A projective camera has a
limited field-of-view. In order to overcome this limitation, several images
corresponding to intersecting FOVs are merged into a single mosaic image
corresponding to a larger FOV containing all the original views. A mosaic
maps all the original images onto a single coordinate system and,
consequently, is equivalent to a panoramic image. No matter what the
relative positions and orientations of the cameras. Two images of a planar
scene are related by a general linear transformation: a homography. A
linear transformation also maps two views of any scene (possibly not
restricted to a plane) when both views are taken from the same position (no
translation) but in different directions (only rotation). The mosaic
construction problem addressed here consists in given a number of different
images, all taken by a camera that remained at the same place but rotated
in different directions, to create a single larger image (mosaic) which
contains all the original images, warped appropriately. The no translation
constraint avoids occluded parts of the scene on different images. The
mosaicing process works with two images at a time. Initially, the extraction
of a sparse set of feature points is performed independently on both
images. The problem of feature correspondence (stereo matching or stereo
disparity) can be stated as finding pairs of features in two (or more)
perspective images such that each pair corresponds to the projections of
the same scene point. Stereo matching algorithms are based on the
similarity of features. The matching process uses the attributes associated
with the detected features. Once point correspondences are given, the transformation
that maps points in the second image to points in the first may be found.
At least four point correspondences are required to solve a linear system
with the eight unknowns that define the transformation matrix. The two
images are merged by creating a new image with the first image, and the
results of applying the obtained affine transformation to the points of the
second image. The full mosaic is obtained by performing this process for
each pair of images.
|
|


|
Face Detection
Automatic identity authentication may use a biometric
feature as input. If the biometric feature is a face, the authentication
process involves face detection and recognition. We consider a face
detection problem assuming the image includes a single face in a frontal
pose from a specific distance. Given such an image, the problem consists in
finding the rectangle R that
contains the face in the image. A pre-processing step computes a
probability distribution function Pl,
where Pl(x, y) represents the probability that pixel (x, y) is at a face
location. In order to compute Pl,
we need a training set of face images with the ground-truth rectangle where
each face is located. In general, the region of pixels with the maximum
face location probability is in the central part of the image. A rectangle R’ contained in this region is
assumed to have skin color. Initially, our algorithm computes a probability
distribution function Ps,
where Ps(x, y) represents the probability that pixel (x, y) has a skin
color. In order to compute Ps,
we build a histogram Hhue
for hue and a histogram Hsat
for saturation of pixels in the rectangle R’. For each pixel (x,
y) in the image, Ps(x, y) is proportional
to Hhue(h)*Hsat(s),
where h and s represent hue and saturation of the pixel (x, y), respectively. The probability Pf(x, y) that a pixel (x, y) belongs to a face is the product of the skin color
probability and location probability: Pf(x, y) = Ps(x, y)*Pl(x, y). Since the size of the face rectangle R is estimated from the frontal pose assumption, we detect a
face as the rectangle R with the
location that maximizes the sum of probability Pf for all pixels contained in R.
|
|


|
Traffic Surveillance
Given real-time images from
traffic cameras located throughout a metropolitan area, our initial
interest is the automatic detection of vehicles traveling on the
transportation network. This is a first step towards the identification of
vehicles, where each “different” detected vehicle is associated with a
unique identification number. Vehicle identification may lead to the
tracking of a vehicle in a single camera and among different cameras in the
network. A basic approach for the detection of moving objects on images is
background subtraction. Initially, an image background is constructed for
each camera using the corresponding input video. For each pixel location in
the background image, the color of the pixel is the median of the colors in
all frames of the video for the respective pixel location. Once an image
background is built, this image is subtracted from the original video in
order to detect moving objects. The assumption is that a moving object has
a different visual appearance from the constructed static background. The
input videos contain a fair share of pedestrians, bikers, luminous ads, and
other independent distractions. In order to eliminate the influence of such
visually dynamic aspects, we define a region of interest for each camera.
The background subtraction values are kept only in the region of interest.
The other values are assigned to a zero difference. With the background
subtraction updated according to the region of interest, motion pixels are
identified as the locations where absolute difference in appearance is
greater than a threshold value. After that, blobs possibly representing
cars are found as the connected components of the motion pixels in the
image. Components with a very small area are filtered out.
|
|


|
Proximity Problems on Polyhedral Surfaces
Shortest Path Planning is the field of Computational Geometry
that concerns the determination of feasible shortest paths from a point to
another in a given environment. We considered a Directed Frictioned
Geodesic (DFG) shortest path problem that minimizes the total work spent to
move a body on a polyhedral surface with constant friction coefficient and
constant slope in each face. In order to characterize geodesic paths and
shortest paths according to the constraints of the problem, we identified
the corresponding local optimality criterion and we demonstrated the strict
convexity of the DFG distance function using convex function theory. We
developed an algorithm, based on the continuous Dijkstra methodology, to
solve the DFG shortest path problem. We extended the DFG shortest path
problem to construct a shortest path Voronoi diagram on a polyhedral
surface according to the DFG distance function. We reduced some
generalizations of proximity problems to the construction of this diagram
and, therefore, this Voronoi diagram solved these proximity problems. We implemented
an external module to the Geomview program to visualize a shortest path
tree and a Voronoi diagram on polyhedral surfaces. The implementation of
shortest path algorithms on polyhedral surfaces used Geomview as a viewer
for dynamically changing geometric objects in Silicon Graphics. The
external module allowed selection of a vertex of the polyhedral surface as
the source point for a path. The control panel allows the user to choose
between a snapshot execution and a step-by-step animation execution of the
algorithm.
|
|


|
Raster to Vector Conversion
Implementation of raster to vector map conversion
using the Image Processing Toolbox of Matlab. Initially, the system
thresholds the map image concerning a particular set of colors in order to compute
a binary image representing roads with no other map features (e.g. text).
The binary image is smoothed to avoid noise by performing a two-dimensional
order statistic filtering. The Canny edge detector is used to find edge
elements (edgels) in the smoothed binary image. The system computes the
connected components of the edgels and vectorizes them by computing a line
according to the Hough transform. The endpoints of a line segment are found
using a proximity criterion and the edgels corresponding to this line
segment are removed. This process is repeated until a connected component
is empty.
|
|


|
Land Surveying
This project has a strong personal connotation. After
my grandparents passed away, my family needed to distribute the assets among
the heirs. The biggest challenge was the partition of a peace of land in
the northeast of Brazil.
I volunteered to measure the land with a GPS, to estimate its value, and to
subdivide the property in a “fair” manner. The measuring task took me two
days in tracking the property boundaries. I covered a 6Km distance of
Caatinga vegetation just in the first morning of surveying. The estimation
task had some issues. Market values are not easily found for properties in
that region. In general, land trading is not advertised on newspapers. The
value depends on the facilities the property includes, on the access to
roads, and mainly on the proximity to water resources. The subdivision task
had also some complications. The shape of the land is very irregular and
consists of disconnected parts. Due to trading among heirs, each person had
right to a different share of the land. Furthermore, the value of each land
acre is not the same across the property. Once GPS data was gathered, I
developed a system to display the property map with its boundaries, its
facilities, and landmarks. In order to estimate value, I used two base
prices for an acre of land. The highest base price is for an acre right
next to a road and water. The lowest base price is for an acre that is farthest
from roads and water. The distance to roads and water resources is used to
interpolate these two base prices in computing the value of any acre unit.
This way, I used a grid to subdivide the property and the value of each
cell in the grid is computed. The grid cells are assigned to heirs in an
interactive way. Each time a user clicks on a cell with the mouse, this
cell is assigned to the current heir in the system and the share value of
all heirs is printed. The current heir is changed through the keyboard.
Although the system does not solve the partition problem automatically, the
computer aids the user into designing a feasible and fair partition for the
land.
|
|