The SWAN Project: Video-Based Localization for the Visually Impaired
Visual perception of landmarks in the world around us is crucial in many aspects of daily life, from simple
scene-understanding tasks up to planning commands for navigation. While these capabilities come easily to the
sighted, they can become a great challenge for the visually impaired. There are also many situations in which
even people who are not permanently visually impaired cannot use vision for navigation: a Navy SEAL navigating
underwater, for instance, or firefighters in a smoke-filled building, where environmental constraints (darkness,
smoke, etc.) make vision useless. Individuals in such situations can face serious, possibly dangerous, consequences.
It is therefore highly important to develop a wearable automatic system that communicates a range of
information about the surrounding environment in a non-visual manner, allowing a person greater knowledge of,
connection to, and more effective navigation through space. While there has been a great deal of research
on electronic travel aids for obstacle avoidance, there has not been comparable research on the
development of orientation devices that keep the user apprised of full 6-d.o.f. information on both location and heading.
Description and Goals
The SWAN project is an NSF-funded project officially started in 2004
at the Georgia Institute of Technology by Prof. Bruce Walker (Dept. of Psychology) in collaboration with
Prof. Frank Dellaert (College of Computing). In its last year (from December 2007 until October 2008)
I joined Prof. Dellaert's group as a postdoctoral fellow to work on localization with vision sensors
(e.g., a wearable multi-camera rig).
Figure 1: (Left) The original hardware setup for vision-based localization; (Right)
Typical output from the FPGA camera during the calibration phase with a calibration checkerboard.
The previous software implementation required a time-consuming correspondence-matching phase between the SIFT descriptors of the 3-D map and the current image (to be localized). This was impossible to implement in a practical scenario, since the FPGA camera developed at the Borg Lab for the SWAN project (Fig. 1) could only detect (and transmit to the PC) corner features.
Moreover, other hardware issues with the FPGA camera prompted us to change the hardware: low speed (5 fps), serial transmission to the PC of only a limited number of detected point features, and no access to the raw image, which made correct focusing impossible and calibration difficult.
We used a functional programming language for the vision-based localization in the SWAN project: OCaml, together with a proprietary ML library developed at the Borg Lab by Frank Dellaert and his collaborators. This library implements linear algebra, data structures, and multi-view geometry.
Figure 2: (Left) The result of 3-D reconstruction from sparse images; (Right)
The descriptorless localization algorithm (1 image), after a few iterations starting from an initial camera-pose guess, in the ideal case of exact one-to-one map-image matches.
As a first step, I worked with graduate student Kai Ni on the development and testing of the 3-D reconstruction engine. Fig. 2 shows some results obtained from a sparse set of pictures taken with a high-quality Canon camera.
Given these constraints, we decided to implement a maximum-likelihood (ML) pose-estimation algorithm that makes SIFT descriptors unnecessary: for this reason, we call it descriptorless ML vision-based localization. Additional descriptions of the algorithm are available upon request.
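The core idea of a descriptorless approach can be sketched as alternating between nearest-neighbour data association (each detected corner is matched to the closest projected map point, with no descriptors involved) and refinement of the 6-d.o.f. pose by minimizing the reprojection error. The actual implementation was in OCaml on top of the Borg Lab library; the following NumPy sketch is only my own illustration of that idea, with hypothetical function names and parameter values, assuming a simple pinhole camera with known focal length:

```python
import numpy as np

def rodrigues(w):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def project(pose, X, f=500.0, c=(320.0, 240.0)):
    """Pinhole projection of Nx3 map points X under pose = (w, t)."""
    R, t = rodrigues(pose[:3]), pose[3:]
    Xc = X @ R.T + t
    return f * Xc[:, :2] / Xc[:, 2:3] + np.array(c)

def descriptorless_localize(corners, X, pose0, iters=20):
    """Alternate nearest-neighbour data association (no descriptors)
    and a damped Gauss-Newton pose update on the reprojection error."""
    pose = pose0.astype(float)
    for _ in range(iters):
        proj = project(pose, X)
        # associate each image corner with the closest projected map point
        d2 = ((corners[:, None, :] - proj[None, :, :]) ** 2).sum(-1)
        idx = d2.argmin(axis=1)
        # residual and numeric Jacobian of the reprojection error
        def resid(p):
            return (project(p, X[idx]) - corners).ravel()
        r = resid(pose)
        J = np.empty((r.size, 6))
        eps = 1e-6
        for j in range(6):
            dp = np.zeros(6); dp[j] = eps
            J[:, j] = (resid(pose + dp) - r) / eps
        # damped Gauss-Newton step
        step = np.linalg.solve(J.T @ J + 1e-6 * np.eye(6), -J.T @ r)
        pose = pose + step
        if np.linalg.norm(step) < 1e-10:
            break
    return pose
```

On clean synthetic data with a reasonable initial guess this converges in a handful of Gauss-Newton steps; as the FPGA experiments below show, the real difficulty in practice is the quality of the detected corners.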
We also wanted the algorithm to run in real time, using a real reconstructed 3-D map with thousands of points, against a real scenario (only hundreds of detected image features). RANSAC-based localization was implemented as the core (robust-to-outliers) algorithm. A result of this algorithm for the FPGA camera version can be seen in Fig. 3.
Figure 3: Localization with the FPGA camera in a very dense 3-D map (black path). The results are quite unstable, and the weak Harris corner detections make the estimated pose drift unexpectedly.
When dealing with camera motion in a (possibly wide) environment, there may be the need to reacquire
temporarily lost features that reappear in the field of view. For example, this may occur when the camera translates
far from the scene, so that the corner detector can no longer detect the same corners in the current image
(red crosses) even though they are present in the map (yellow crosses). This results in a constantly decreasing number
of correspondences, leading to poor localization and, eventually, to the algorithm stopping when a minimum number of
features is no longer detected.
I implemented a descriptorless resurrection algorithm that uses only the most significant image corners for localization and, as a consequence, keeps an almost constant number of map-image corner matches during the camera motion.
As a result, localization performance improved greatly.
See some results in Fig. 4.
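The resurrection step itself is simple to describe: at every frame, all map points are reprojected with the current pose estimate and re-associated with the strongest detected corners, so a point that was temporarily lost rejoins the match set as soon as its projection again lands near a corner. The following NumPy sketch is only my illustration of that bookkeeping (the names, the strength ranking, and the gating radius are my own assumptions, not the project's OCaml code):

```python
import numpy as np

def resurrect_matches(projected, corners, strengths, k=100, radius=5.0):
    """Descriptorless match resurrection.

    projected : (M, 2) image projections of *all* map points under the
                current pose estimate.
    corners   : (N, 2) corners detected in the current image.
    strengths : (N,)   detector responses, used to rank the corners.
    Returns (map_index, corner_index) pairs for every map point whose
    projection falls within `radius` pixels of one of the k strongest
    corners -- including points that had no match in previous frames.
    """
    top = np.argsort(strengths)[::-1][:k]      # k most significant corners
    strong = corners[top]
    matches = []
    for j, p in enumerate(projected):
        d = np.linalg.norm(strong - p, axis=1)
        i = int(d.argmin())
        if d[i] < radius:                      # gate on reprojection distance
            matches.append((j, int(top[i])))
    return matches
```

Because the association is purely geometric, a map point needs no stored descriptor to be resurrected, which is exactly what keeps the number of map-image matches roughly constant as the camera moves.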
A high-resolution version of the video can be found here.
A detailed description of the localization results can be found in these slides.
Figure 4: (Left) The video of the camera localization algorithm experiment. (Right)
Top-view of the reconstructed camera path (black) together with the 3D map (green).
On this webpage I have described my contribution to the SWAN project during my one-year appointment (actually 10 months!) at the Georgia Institute of Technology. Again, this page merely complements the official SWAN webpage with an account of my own contribution.
When I left Georgia Tech to join the University of Minnesota, Frank Dellaert posted his warm greetings on his webpage... Grazie Frank!
I also want to acknowledge the crucial contribution of the whole SWAN project team, in particular my lab-mates Kai Ni and Sang Min Oh.