University of Oulu

Modeling of structured 3-D environments from monocular image sequences

Author: Repo, Tapio [1, 2]
Organizations: [1] University of Oulu, Faculty of Technology, Department of Electrical and Information Engineering
[2] University of Oulu, Infotech Oulu
Format: ebook
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 2.7 MB)
Language: English
Published: 2002
Publish Date: 2002-11-08
Thesis type: Doctoral Dissertation
Defence Note: Academic Dissertation to be presented with the assent of the Faculty of Technology, University of Oulu, for public discussion in Kuusamonsali (Auditorium YB 210), Linnanmaa, on November 8th, 2002, at 12 noon.
Reviewer: Doctor Atte Kortekangas
Doctor Antti Ylä-Jääski


The purpose of this research has been to demonstrate with applications that polyhedral scenes can be modeled in real time with a single video camera, in some cases very efficiently and without any special image-processing hardware. The developed vision sensor simultaneously estimates its three-dimensional position with respect to the environment and builds a model of that environment. The estimates are refined recursively as objects are approached and observed from different viewpoints.

The modeling process starts by extracting interesting tokens, such as lines and corners, from the first image. These features are then tracked in subsequent image frames; previously taught patterns can also be used in tracking. Only a few features are extracted from each image, which allows processing at video frame rate. New features that appear can also be added to the environment structure.
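Tracking only a few features per frame keeps the per-frame cost low. The abstract does not say which matching criterion the thesis uses, so the following is only a minimal sketch of one common option: a small image patch around a feature is matched by the sum of squared differences (SSD) inside a limited search window of the next frame. The function name `track_patch` and the parameters `half` (patch half-width) and `search` (search radius) are hypothetical.

```python
import numpy as np

def track_patch(prev, curr, center, half=3, search=5):
    """Find where the patch around `center` in `prev` moved to in `curr`.

    Assumed criterion: sum of squared differences (SSD) between the
    (2*half+1) x (2*half+1) template and each candidate window, searched
    within +/- `search` pixels of the previous position.
    Returns the (row, col) of the best-matching window center.
    """
    r, c = center
    template = prev[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    best_ssd, best_pos = np.inf, center
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            # Skip candidate windows that would extend outside the image.
            if rr - half < 0 or cc - half < 0:
                continue
            if rr + half + 1 > curr.shape[0] or cc + half + 1 > curr.shape[1]:
                continue
            window = curr[rr - half:rr + half + 1, cc - half:cc + half + 1].astype(float)
            ssd = np.sum((window - template) ** 2)
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (rr, cc)
    return best_pos
```

For example, if the image content shifts by two rows down and three columns left between frames, `track_patch(prev, curr, (20, 20))` should return `(22, 17)`. Bounding the search window is the design choice that keeps the cost independent of image size.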

Kalman filtering is used for estimation. The parameters in motion estimation are location and orientation and their first derivatives. The environment is considered a rigid object with respect to the camera, and its structure consists of the 3-D coordinates of the tracked features. The initial model lacks depth information. Relative depth is obtained from facts such as that, during translational motion, closer points move faster on the image plane than more distant ones. Additional information is needed to obtain absolute coordinates.
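The parallax argument can be made concrete with a one-shot calculation. This sketch assumes an ideal pinhole camera with focal length `f` and a pure camera translation parallel to the image plane; it is not the recursive Kalman estimate used in the thesis. Under these assumptions a point at depth Z moves by f·t/Z on the image plane, so displacement is inversely proportional to depth and the unknown translation t cancels in the ratio. That is why only relative depth is recovered. The helper names `project` and `relative_depths` are hypothetical.

```python
import numpy as np

def project(points, f=1.0):
    """Pinhole projection of Nx3 camera-frame points to Nx2 image coordinates."""
    points = np.asarray(points, dtype=float)
    return f * points[:, :2] / points[:, 2:3]

def relative_depths(x_before, x_after, ref=0):
    """Depth of each point relative to point `ref`.

    Assumes pure camera translation parallel to the image plane, so each
    image displacement has magnitude f*|t|/Z. Hence Z_i / Z_ref = d_ref / d_i,
    and the common scale (f*|t|) cancels out.
    """
    d = np.linalg.norm(np.asarray(x_after) - np.asarray(x_before), axis=1)
    return d[ref] / d
```

Usage: with points at depths 1, 2 and 4 and a small sideways camera motion, `relative_depths(project(pts), project(pts - t))` gives approximately `[1, 2, 4]`. Absolute depths would additionally require knowing |t| (or some other metric reference).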

Special attention has been paid to modeling uncertainties. Measurements with high uncertainty are given less weight when the motion and environment models are updated. The rigidity assumption is exploited by giving the initial structure uncertainties the shape of a thin pencil. By continuously observing the motion uncertainties, the performance of the modeler can be monitored.
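The following is a minimal sketch of both ideas, under stated assumptions. First, a "thin pencil" is modeled here as a 3x3 covariance that is small across the viewing ray (the image position is measured well) and large along it (depth is initially unknown). The orientation along the viewing ray is our reading of the abstract, not a detail it states. Second, a standard linear Kalman update shows how a larger measurement covariance `R` produces a smaller gain and therefore less influence on the estimate. The names `pencil_covariance` and `kalman_update` are hypothetical. The measurement matrix `H` is left generic; a camera-based system would use a linearized projection, which is not shown.

```python
import numpy as np

def pencil_covariance(ray, sigma_lateral, sigma_depth):
    """3x3 covariance elongated along the unit viewing ray (a 'thin pencil').

    Variance is sigma_depth**2 along the ray and sigma_lateral**2 in the
    plane orthogonal to it.
    """
    u = np.asarray(ray, dtype=float)
    u = u / np.linalg.norm(u)
    along = np.outer(u, u)          # projector onto the ray direction
    across = np.eye(3) - along      # projector onto the orthogonal plane
    return sigma_depth**2 * along + sigma_lateral**2 * across

def kalman_update(x, P, z, H, R):
    """Linear Kalman measurement update.

    A larger measurement covariance R makes S larger and the gain K smaller,
    so uncertain measurements move the estimate less.
    """
    S = H @ P @ H.T + R
    # K = P H^T S^-1, computed with a solve; valid because P and S are symmetric.
    K = np.linalg.solve(S, H @ P).T
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new, K
```

Usage: for a point seen straight ahead, `pencil_covariance([0, 0, 1], 0.01, 1.0)` is `diag(1e-4, 1e-4, 1)`. Updating with the same measurement but `R = 0.01*I` instead of `R = I` moves the depth estimate much further, which illustrates the down-weighting of uncertain measurements.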

In contrast to the usual solution, motion and 3-D structure are estimated in separate state vectors, which allows them to be estimated asynchronously. In addition to yielding a more distributed solution, this technique provides an efficient failure-detection mechanism: several trackers can estimate motion simultaneously, and only those with the most confident estimates are allowed to update the common environment model.
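The selection step can be sketched as follows. The abstract does not define "most confident", so this sketch assumes that confidence is measured by the trace of each tracker's motion covariance (smaller means more confident). A tracker whose trace exceeds a threshold is treated as failed. `Tracker`, `select_confident`, `max_trace` and `k` are all hypothetical names.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Tracker:
    """Hypothetical tracker with its own motion state, separate from the structure."""
    name: str
    motion: np.ndarray  # e.g. pose parameters and their first derivatives
    cov: np.ndarray     # covariance of the motion estimate

def select_confident(trackers, max_trace, k=1):
    """Return up to k trackers allowed to update the shared environment model.

    Assumed confidence measure: trace of the motion covariance.
    Trackers with trace above `max_trace` are treated as failed and excluded.
    """
    healthy = [t for t in trackers if np.trace(t.cov) <= max_trace]
    healthy.sort(key=lambda t: np.trace(t.cov))
    return healthy[:k]
```

Because each tracker keeps its own motion state, a diverging tracker can be dropped by this test without corrupting the shared structure estimate. This is one way to read the failure-detection benefit described above.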

Tests showed that motion with six degrees of freedom can be estimated in an unknown environment while the 3-D structure of the environment is estimated simultaneously. The achieved accuracy was on the order of millimeters at distances of 1–2 meters, using both simple toy scenes and more demanding industrial pallet scenes. This is sufficient for manipulating objects when the modeler is used to provide visual feedback.


Series: Acta Universitatis Ouluensis. C, Technica
ISSN-E: 1796-2226
ISBN: 951-42-6857-1
ISBN Print: 951-42-6856-3
Issue: 174
Copyright information: © University of Oulu, 2002. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.