University of Oulu

Document image retrieval with improvements in database quality

Author: Kauniskangas, Hannu¹
Organizations: ¹University of Oulu, Faculty of Technology, Department of Electrical Engineering
Format: ebook
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 1.4 MB)
Persistent link:
Language: English
Published: 1999
Publish Date: 1999-06-23
Thesis type: Doctoral Dissertation
Defence Note: Academic Dissertation to be presented with the assent of the Faculty of Technology, University of Oulu, for public discussion in Raahensali (Auditorium L 10), Linnanmaa, on August 17th, 1999, at 12 noon.
Reviewers: Doctor Omid E. Kia; Professor Pasi Koikkalainen


Modern technology has made it possible to produce, process, transmit and store digital images efficiently. Consequently, the amount of visual information is increasing at an accelerating rate in many diverse application areas. To fully exploit this, new content-based image retrieval techniques are required. Document image retrieval systems can be utilized in the many organizations that use document image databases extensively.

This thesis presents document image retrieval techniques and new approaches to improve database content. The goal of the thesis is to develop a functional retrieval system and to demonstrate that better retrieval results can be achieved with the proposed database generation methods.

A retrieval system architecture, a document data model, and tools for querying document image databases are introduced. The retrieval framework presented allows users to interactively define, construct and combine queries using document or image properties: physical (structural), semantic, textual and visual image content. A technique for combining primitive features such as color, shape and texture into composite features is presented. A novel search base reduction technique that uses structural and content properties of documents is proposed for speeding up the query process.
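As an illustration of the two ideas in the preceding paragraph, the following minimal Python sketch ranks documents by a composite feature distance after first shrinking the search base with a structural filter. It is not the thesis's implementation: the weighted average of per-feature Euclidean distances, the exact-match structural filter, and every name in it (`composite_distance`, `reduce_search_base`, `retrieve`, `DATABASE`, the `has_photo` and `columns` attributes) are assumptions made for this example only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def composite_distance(query, candidate, weights):
    """Weighted average of per-feature distances (an assumed combination rule)."""
    total_weight = sum(weights.values())
    weighted = sum(w * euclidean(query[name], candidate[name])
                   for name, w in weights.items())
    return weighted / total_weight

def reduce_search_base(database, required):
    """Keep only documents whose structural attributes match the query.

    Hypothetical filter: cheap attribute comparisons run before any
    feature distance is computed, so fewer candidates need ranking.
    """
    return [doc for doc in database
            if all(doc["structure"].get(key) == value
                   for key, value in required.items())]

def retrieve(database, query_features, weights, required, top_k=3):
    """Filter by structure, then rank the survivors by composite distance."""
    candidates = reduce_search_base(database, required)
    ranked = sorted(
        candidates,
        key=lambda doc: composite_distance(query_features, doc["features"], weights),
    )
    return [doc["id"] for doc in ranked[:top_k]]

# Toy database with made-up attribute names and feature values.
DATABASE = [
    {"id": "doc1", "structure": {"has_photo": True, "columns": 2},
     "features": {"color": [0.2, 0.5], "texture": [0.1]}},
    {"id": "doc2", "structure": {"has_photo": False, "columns": 1},
     "features": {"color": [0.2, 0.5], "texture": [0.1]}},
    {"id": "doc3", "structure": {"has_photo": True, "columns": 2},
     "features": {"color": [0.9, 0.1], "texture": [0.8]}},
]

if __name__ == "__main__":
    query = {"color": [0.2, 0.5], "texture": [0.1]}
    print(retrieve(DATABASE, query, {"color": 1.0, "texture": 1.0},
                   {"has_photo": True}))
```

In this sketch the structural filter discards `doc2` before any distance is computed, which is the sense in which a reduced search base can shorten query time.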

A new model for database generation within the image retrieval system is presented. An approach to automated document image defect detection and management is introduced to build high-quality, retrievable database objects. During image database population, image feature profiles and their attributes are manipulated automatically to better match the query requirements determined by the available query methods, the application environment and the user.

Experiments were performed with multiple image databases containing over one thousand images. They comprised a range of document and scene images of different categories, properties and conditions. The results show that better recall and accuracy in retrieval are achieved with the proposed optimization techniques. The search base reduction technique results in a considerable speed-up in overall query processing. The constructed document image retrieval system performs well in different retrieval scenarios and provides a consistent basis for algorithm development. The proposed modular system structure and interfaces facilitate its usage in a wide variety of document image retrieval applications.
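For readers unfamiliar with the evaluation terms used above, the short Python sketch below computes recall for a single query, together with precision, which is used here as a common stand-in for retrieval accuracy. The abstract does not state how accuracy was measured in the thesis, so the choice of precision, the function name `recall_and_precision`, and the sample identifiers are assumptions.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    recall    = relevant items retrieved / all relevant items
    precision = relevant items retrieved / all retrieved items
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    recall = hits / len(relevant_set) if relevant_set else 0.0
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    return recall, precision

if __name__ == "__main__":
    # Hypothetical query result: four images returned, three actually relevant.
    print(recall_and_precision(["a", "b", "c", "d"], ["a", "b", "e"]))
```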


Series: Acta Universitatis Ouluensis. C, Technica
ISSN-E: 1796-2226
ISBN: 951-42-5313-2
Issue: 140
Copyright information: © University of Oulu, 1999. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.