University of Oulu

Structure-from-motion using convolutional neural networks

Author: Huynh, Lam¹
Organizations: ¹University of Oulu, Faculty of Information Technology and Electrical Engineering, Department of Computer Science and Engineering, Computer Science and Engineering
Format: ebook
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 11.8 MB)
Pages: 63
Language: English
Published: Oulu : L. Huynh, 2018
Publish Date: 2018-09-06
Thesis type: Master's thesis (tech)
Tutor: Heikkilä, Janne; Ylimäki, Markus
Reviewer: Heikkilä, Janne; Ylimäki, Markus


There is increasing interest in the research community in 3D scene reconstruction from monocular RGB cameras. Conventionally, structure from motion or special hardware such as depth sensors or LIDAR systems has been used to reconstruct point clouds of complex scenes. However, the structure-from-motion technique usually fails to produce a dense point cloud, while dedicated sensors are less convenient and more expensive than RGB cameras. Recent advances in deep learning research have produced remarkable results in many computer vision tasks. Nevertheless, a complete solution for large-scale dense 3D point cloud reconstruction is still missing.

This thesis introduces a deep-learning-based structure-from-motion pipeline for the dense 3D scene reconstruction problem. Several deep neural network models were trained to predict single-view depth maps and relative camera poses from RGB video frames. First, the predicted depth values were sequentially scaled to match the first depth map. Next, the iterative closest point algorithm was used to further align the estimated camera poses. From these two processed cues, the point cloud of the scene was reconstructed by simple concatenation of 3D points.
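The final concatenation step can be illustrated with a minimal sketch. It is not the thesis code: it assumes the depth maps are already scaled, the poses are already refined, the intrinsic matrix `K` is a known pinhole model, and each relative pose is a 4x4 matrix mapping points from frame i+1 into frame i. The function names `backproject`, `compose`, and `reconstruct` are hypothetical.

```python
import numpy as np

def backproject(depth, K):
    """Lift an (H, W) depth map to an (H*W, 3) array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates (u, v, 1), flattened in row-major order.
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pixels @ np.linalg.inv(K).T
    return rays * depth.reshape(-1, 1)

def compose(relative_poses):
    """Chain relative poses into camera-to-world poses; frame 0 is the world frame."""
    poses = [np.eye(4)]
    for T in relative_poses:  # assumption: T maps frame i+1 into frame i
        poses.append(poses[-1] @ T)
    return poses

def reconstruct(depths, relative_poses, K):
    """Back-project every frame, move it into the world frame, and concatenate."""
    clouds = []
    for depth, pose in zip(depths, compose(relative_poses)):
        pts = backproject(depth, K)
        clouds.append(pts @ pose[:3, :3].T + pose[:3, 3])
    return np.concatenate(clouds, axis=0)
```

Because the clouds are simply stacked, any remaining error in depth scale or camera orientation shows up directly as misaligned surfaces in the merged result.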

Although the final point cloud results are encouraging and in certain aspects preferable to those of the conventional structure-from-motion method, the system addresses the 3D reconstruction problem only to a limited extent. The predictions still contain errors, especially in the camera orientation estimates. This system can be seen as an initial study that opens up many research questions and directions for future improvement. The study also gives a positive indication for using an unsupervised deep learning scheme to address the 3D scene reconstruction task.


Copyright information: © Lam Huynh, 2018. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.