Document image retrieval with improvements in database quality
University of Oulu, Faculty of Technology, Department of Electrical Engineering
Online Access: PDF Full Text (PDF, 1.4 MB)
Persistent link: http://urn.fi/urn:isbn:9514253132
Publish Date: 1999-06-23
Thesis type: Doctoral Dissertation
Defence Note: Academic Dissertation to be presented with the assent of the Faculty of Technology, University of Oulu, for public discussion in Raahensali (Auditorium L 10), Linnanmaa, on August 17th, 1999, at 12 noon.
Doctor Omid E. Kia
Professor Pasi Koikkalainen
Modern technology has made it possible to produce, process, transmit and store digital images efficiently. Consequently, the amount of visual information is increasing at an accelerating rate in many diverse application areas. To fully exploit this growing body of visual information, new content-based image retrieval techniques are required. Document image retrieval systems can be utilized in the many organizations that make extensive use of document image databases.
This thesis presents document image retrieval techniques and new approaches to improve database content. The goal of the thesis is to develop a functional retrieval system and to demonstrate that better retrieval results can be achieved with the proposed database generation methods.
A retrieval system architecture, a document data model, and tools for querying document image databases are introduced. The retrieval framework allows users to interactively define, construct and combine queries using document or image properties: physical (structural), semantic, textual and visual image content. A technique for combining primitive features such as color, shape and texture into composite features is presented. A novel search base reduction technique that uses structural and content properties of documents is proposed for speeding up the query process.
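As an illustration only, the idea of composite features and search base reduction can be sketched as follows. This is a minimal sketch under stated assumptions, not the method described in the thesis: the `DocImage` record, the `doc_type` field used for pruning, the weighted-sum combination, the Euclidean distance and all example values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DocImage:
    """Hypothetical database object with one structural property
    and three primitive feature vectors."""
    name: str
    doc_type: str          # assumed structural/content property used for pruning
    color: List[float]
    shape: List[float]
    texture: List[float]


def l2(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def composite_distance(q: DocImage, d: DocImage, weights: Dict[str, float]) -> float:
    """Combine primitive feature distances into one composite score.
    A weighted sum is an assumption made for this sketch."""
    return (weights["color"] * l2(q.color, d.color)
            + weights["shape"] * l2(q.shape, d.shape)
            + weights["texture"] * l2(q.texture, d.texture))


def reduce_search_base(db: List[DocImage], doc_type: str) -> List[DocImage]:
    """Keep only candidates whose structural property matches the query,
    so the costlier feature comparison runs on fewer objects."""
    return [d for d in db if d.doc_type == doc_type]


def query(db: List[DocImage], q: DocImage, weights: Dict[str, float], k: int = 2) -> List[DocImage]:
    """Prune the search base first, then rank the survivors by composite distance."""
    candidates = reduce_search_base(db, q.doc_type)
    return sorted(candidates, key=lambda d: composite_distance(q, d, weights))[:k]


# Toy data, invented for the example.
db = [
    DocImage("letter_a", "letter", [0.1, 0.2], [0.5], [0.3]),
    DocImage("letter_b", "letter", [0.8, 0.9], [0.1], [0.9]),
    DocImage("map_a", "map", [0.1, 0.2], [0.5], [0.3]),
]
q = DocImage("query", "letter", [0.1, 0.25], [0.5], [0.3])
weights = {"color": 0.5, "shape": 0.3, "texture": 0.2}
results = query(db, q, weights, k=2)
print([d.name for d in results])
```

In this toy setup the `map_a` object is never compared feature by feature, because the structural filter removes it before ranking. That is the intuition behind reducing the search base ahead of the similarity computation.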
A new model for database generation within the image retrieval system is presented. An approach for automated detection and management of document image defects is introduced to build high-quality, retrievable database objects. During image database population, image feature profiles and their attributes are manipulated automatically to better match the query requirements determined by the available query methods, the application environment and the user.
Experiments were performed with multiple image databases containing over one thousand images. They comprised a range of document and scene images of different categories, properties and conditions. The results show that better retrieval recall and accuracy are achieved with the proposed optimization techniques. The search base reduction technique results in a considerable speed-up in overall query processing. The constructed document image retrieval system performs well in different retrieval scenarios and provides a consistent basis for algorithm development. The proposed modular system structure and interfaces facilitate its use in a wide variety of document image retrieval applications.
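For readers unfamiliar with retrieval evaluation, the following minimal sketch shows how recall and precision are commonly computed for a single query. It assumes that "accuracy" above can be read as precision, which the abstract does not state. The function name and the image identifiers are hypothetical, and the numbers are not results from the thesis.

```python
from typing import Iterable, Tuple


def recall_precision(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (recall, precision) for one query.

    recall    = relevant items retrieved / all relevant items
    precision = relevant items retrieved / all retrieved items
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    recall = hits / len(relevant_set) if relevant_set else 0.0
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    return recall, precision


# Invented example: 4 images returned, 3 images actually relevant, 2 in common.
recall, precision = recall_precision(
    ["img1", "img2", "img3", "img4"],
    ["img1", "img3", "img7"],
)
print(f"recall={recall:.2f}, precision={precision:.2f}")
```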
Acta Universitatis Ouluensis. C, Technica
© University of Oulu, 1999. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.