University of Oulu

Scene understanding through semantic image segmentation in augmented reality

Author: Türkmen, Sercan
Organizations: University of Oulu, Faculty of Information Technology and Electrical Engineering, Department of Computer Science and Engineering, Computer Science and Engineering
Format: ebook
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 7.1 MB)
Pages: 64
Persistent link:
Language: English
Published: Oulu : S. Türkmen, 2019
Publish Date: 2019-06-26
Thesis type: Master's thesis (tech)
Tutor: Heikkilä, Janne
Reviewer: Heikkilä, Janne; Pedone, Matteo


Semantic image segmentation, the task of assigning a label to each pixel in an image, is a major challenge in computer vision. Fully convolutional neural networks (FCNNs) offer an online solution to scene understanding with a simple training procedure and, if designed efficiently, fast inference. The semantic information produced by segmentation gives a detailed understanding of the current context, and this scene understanding is vital for scene modification in augmented reality (AR), especially for destructive scene augmentation. AR systems, by nature, aim to modify the context in real time through head-mounted see-through or video see-through displays, and thus require efficiency in every step. Although the literature offers many solutions to semantic image segmentation, such as DeepLabV3+ and DeepLab DPC, their complex architectures, designed for the best possible accuracy, prevent low-latency inference. As part of this thesis work, we provide an efficient FCNN architecture for semantic image segmentation that achieves real-time performance on smartphones: 19.65 frames per second (fps) with a mean intersection over union (mIOU) of 67.7% on the Cityscapes validation set using our "Basic" variant, and 15.41 fps with 70.3% mIOU on the Cityscapes test set using our "DPC" variant. The implementation is open source and compatible with TensorFlow Lite, and can therefore run on embedded and mobile devices.
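
For readers unfamiliar with the mIOU metric quoted above, the following is a minimal sketch of how it can be computed from flattened label maps. It is an illustration, not the thesis's evaluation code: the function name `mean_iou`, the `ignore_label` value of 255, and the choice to skip classes absent from both prediction and ground truth are assumptions, and benchmark scores are normally accumulated over a whole dataset rather than a single image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_label: int = 255) -> float:
    """Mean intersection over union across classes that appear at least once.

    pred and target are flattened per-pixel class indices of equal length.
    Pixels whose ground-truth label equals ignore_label (an assumed
    convention) are excluded from the score.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 255]
    pred = [0, 1, 1, 1, 2, 0]
    print(f"mIOU = {mean_iou(pred, target, num_classes=3):.4f}")
```

In the small example, the per-class scores are 1/2, 2/3 and 1, and the last pixel is ignored, so the mean is 13/18.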

Furthermore, the thesis work demonstrates an augmented reality implementation in which semantic segmentation masks are tracked online in a 3D environment using Google ARCore. We show that frequent recalculation of the semantic information is unnecessary: by tracking the calculated semantic information in 3D space using the visual-inertial odometry provided by the ARCore framework, we save battery and CPU usage while maintaining a high mIOU. We further demonstrate a possible use case of the system by inpainting, in 3D space, the objects found by the semantic image segmentation network. The implemented Android application performs real-time augmented reality at 30 fps while running, in parallel, the computationally efficient network proposed in this thesis.
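
The idea of reusing semantic information across frames can be illustrated with a generic pinhole reprojection. This record does not describe the thesis's actual tracking method, so the sketch below is only one plausible reading: labels are attached to 3D points once, and for each later frame the points are projected with the current camera pose instead of running the network again. The function names, the world-to-camera pose convention, and the camera looking along +z are all assumptions made for illustration; they are not the ARCore API.

```python
from typing import Dict, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def project_point(point_world: Point3, rotation: Matrix3, translation: Point3,
                  fx: float, fy: float, cx: float, cy: float
                  ) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates with a pinhole model.

    rotation and translation map world to camera coordinates (assumed
    convention); the camera is assumed to look along +z.
    """
    x, y, z = (
        sum(rotation[r][i] * point_world[i] for i in range(3)) + translation[r]
        for r in range(3)
    )
    if z <= 0.0:  # behind the camera, not visible
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def reproject_labels(labeled_points: Sequence[Tuple[Point3, int]],
                     rotation: Matrix3, translation: Point3,
                     fx: float, fy: float, cx: float, cy: float,
                     width: int, height: int) -> Dict[Tuple[int, int], int]:
    """Build a sparse (column, row) -> label map for the current pose."""
    mask: Dict[Tuple[int, int], int] = {}
    for point, label in labeled_points:
        uv = project_point(point, rotation, translation, fx, fy, cx, cy)
        if uv is None:
            continue
        col, row = int(round(uv[0])), int(round(uv[1]))
        if 0 <= col < width and 0 <= row < height:
            mask[(col, row)] = label
    return mask


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    points = [((1.0, 0.0, 2.0), 11), ((0.0, 0.0, -1.0), 5)]
    # The camera has moved one unit along +x, so world-to-camera t = (-1, 0, 0).
    print(reproject_labels(points, identity, (-1.0, 0.0, 0.0),
                           500.0, 500.0, 320.0, 240.0, 640, 480))
```

Under these assumptions, each reprojection costs a few multiplications per point, which is far cheaper than a network forward pass; the segmentation network would only need to run occasionally to refresh the labels.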


Copyright information: © Sercan Türkmen, 2019. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.