Thesis, Universität Karlsruhe, 2007, 296 p.
Ontologies and semantic metadata can theoretically solve all problems of traditional full-text search engines. In practice, however, ontologies and semantic metadata are always imperfect. They may miss facts, contain erroneous information, and sometimes even our knowledge of the world that should be represented in the ontology is imperfect. Moreover, the high complexity of ontology reasoning makes this technology hard to use in large-scale information retrieval (IR) systems, where performance is of paramount importance.
This work pursued two goals. First, it provided techniques to decrease the negative effects of imperfection and to make ontology reasoning applicable to large-scale IR. The provided solutions include, among others, a novel fuzzy temporal model, an IR system that combines full-text search with semantic search, and methods for fully automatic extraction of semantic metadata from unstructured textual documents.
Second, this work analyzed whether the negative effect of the inherent ontology imperfection has a higher impact than the positive effect of exploiting the ontology features for IR. To answer this question, a complete ontology-based information retrieval system based on the previously devised solutions was implemented and thoroughly evaluated. During this evaluation, the retrieval performance of the new system was compared with a baseline full-text search engine.
The evaluation results show that even imperfect ontologies can dramatically increase the quality of results if all ontology features are exploited, including ad-hoc, non-taxonomical relations, and temporal information.
Fundamentals
Problem analysis
State of the Art
Overview of my approach
Representing temporal imperfection
Ontology formalism
Metadata generation
Indexing and querying
Ontology development
User interface and implementation
Evaluation
Conclusion and Outlook
A Relational schema for vector space model performance testing
B Evaluation ontology
C Ontology-based heuristic rules