Springer, 2007, 319 p.
Recent years have seen enormous advances in computerization and digitization as well as a corresponding growth in the use of information technology, allowing users to access and experience multimedia content on an unprecedented scale. In this context, great efforts have been directed toward the development of techniques for searching and extracting useful information from huge amounts of stored data. For textual information in particular, powerful search engines have been implemented that provide efficient browsing and retrieval within billions of textual documents. For other types of multimedia data such as music, image, video, 3D shape, or 3D motion data, traditional retrieval strategies rely on textual annotations or metadata attached to the documents. Since the manual generation of descriptive labels is infeasible for large datasets, one needs fully automated procedures for data annotation as well as efficient content-based retrieval methods that access only the raw data itself without relying on the availability of annotations. A general retrieval scenario, which has attracted a large amount of attention in the field of multimedia information retrieval, is based on the query-by-example paradigm: given a query in the form of a data fragment, the task is to automatically retrieve all documents from the database containing parts or aspects similar to the query.
Here, the notion of similarity, which strongly depends on the respective application or on a person’s perception, is of crucial importance in comparing the data. Frequently, multimedia objects, even though similar from a structural or semantic point of view, may reveal significant spatial or temporal differences. This makes content-based multimedia retrieval a challenging research field with many as yet unsolved problems.
The present monograph introduces concepts and algorithms for robust and efficient information retrieval for two different types of multimedia data: waveform-based music data and human motion data. For both domains, music and motion, semantically related objects typically exhibit a large range of variations concerning temporal, spatial, spectral, or dynamic properties. In this book, we will study fundamental strategies for handling object deformations and variability in the given data with a view to real-world retrieval and browsing applications. Here, one important principle, which is applicable to general multimedia data, is to absorb, already at the feature level, those variations that should be disregarded in the search process. This strategy makes it possible to use relatively strict and efficient matching techniques.
According to the two types of multimedia data to be considered, this monograph is organized in two parts. In Part I, we will discuss in depth several current problems in music information retrieval. In particular, we describe general strategies as well as efficient algorithms for music synchronization, audio matching, and audio structure analysis. We also show how the analysis results can be used in an advanced audio player to facilitate additional retrieval and browsing functionality. Then, in Part II, we will systematically introduce a general and unified framework for motion analysis, retrieval, and classification. Here, important aspects concern the design of suitable features, the notion of similarity used to compare data streams, as well as data organization. Even though conceptually interrelated, the two parts of this monograph are kept independent, each giving a self-contained account of recent advances in information retrieval for the respective multimedia domain. Both parts have been organized in didactically prepared units: they start with introductory chapters covering the fundamentals required for the subsequent chapters and then present scientific contributions of the author. The detailed chapters at the beginning of each part give consideration to the interdisciplinary character of this work. Here, we also fix the notation, introduce a precise terminology, and supply rigorous mathematical foundations. In this monograph, we will encounter aspects from a multitude of research fields including information science, digital signal processing, audio engineering, musicology, and computer graphics.
This monograph is accessible to a wide audience, from students at the graduate level and lecturers to practitioners and scientists working in the above-mentioned research fields. Each part is suitable for use as stand-alone lecture notes for a graduate course in Computer Science. Here, the focus is on the study of fundamental algorithms and concepts for the analysis, classification, indexing, and retrieval of time-dependent data in the context of a specific multimedia domain. Important aspects concern the design of suitable features, the development of local and global similarity measures, as well as data organization. The general goal of the monograph is to highlight the interaction between modeling, experimentation, and mathematical theory while introducing students to current research fields. Dividing the results into essentially independent chapters and including suitable recapitulations should allow a researcher to read chapters or individual sections of this monograph as self-contained units. Further notes including references to the literature are provided at the end of each chapter. Motivating and domain-specific introductions to this monograph can be found in Chap. 1.
Part I Analysis and Retrieval Techniques for Music Data
Fundamentals on Music and Audio Data
Pitch- and Chroma-Based Audio Features
Dynamic Time Warping
Music Synchronization
Audio Matching
Audio Structure Analysis
SyncPlayer: An Advanced Audio Player
Part II Analysis and Retrieval Techniques for Motion Data
Fundamentals on Motion Capture Data
DTW-Based Motion Comparison and Retrieval
Relational Features and Adaptive Segmentation
Index-Based Motion Retrieval
Motion Templates
MT-Based Motion Annotation and Retrieval