Springer, 2002, 118 p.
This volume is based on a workshop held on September 13, 2001 in New Orleans, LA, USA as part of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. The title of the workshop was: Information Retrieval Techniques for Speech Applications.
Interest in speech applications dates back a number of decades. However, it is only in the last few years that automatic speech recognition has left the confines of the basic research lab and become a viable commercial application. Speech recognition technology has now matured to the point where speech can be used to interact with automated phone systems, control computer programs, and even create memos and documents. Moving beyond computer control and dictation, speech recognition has the potential to dramatically change the way we create, capture, and store knowledge. Advances in speech recognition technology combined with ever-decreasing storage costs and processors that double in power every eighteen months have set the stage for a whole new era of applications that treat speech in the same way that we currently treat text.
The goal of this workshop was to explore the technical issues involved in applying information retrieval and text analysis technologies in the new application domains enabled by automatic speech recognition. These possibilities bring with them a number of issues, questions, and problems. Speech-based user interfaces create different expectations for the end user, which in turn places different demands on the back-end systems that must interact with the user and interpret the user’s commands. Speech recognition will never be perfect, so analyses applied to the resulting transcripts must be robust in the face of recognition errors.
The ability to capture speech and apply speech recognition on smaller, more powerful, pervasive devices suggests that text analysis and mining technologies can be applied in new domains never before considered.
This workshop explored techniques in information retrieval and text analysis that meet the challenges in the new application domains enabled by automatic speech recognition. We posed seven questions to the contributors to focus on:
What new IR related applications, problems, or opportunities are created by effective, real-time speech recognition?
To what extent are information retrieval methods that work on perfect text applicable to imperfect speech transcripts?
What additional data representations from a speech engine may be exploited by applications?
Does domain knowledge (context/voice-id) help and can it be automatically deduced?
Can some of the techniques explored be beneficial in a standard IR application?
What constraints are imposed by real-time speech applications?
Case studies of specific speech applications – either successful or not.
Perspectives on Information Retrieval and Speech
Capitalization Recovery for Text
Clustering of Imperfect Transcripts Using a Novel Similarity Measure
Extracting Keyphrases from Spoken Audio Documents
Segmenting Conversations by Topic, Initiative, and Style
Extracting Caller Information from Voicemail
Speech and Hand Transcribed Retrieval
The Use of Speech Retrieval Systems: A Study Design
Speech-Driven Text Retrieval: Using Target IR Collections for Statistical Language Model Adaptation in Speech Recognition
WASABI: Framework for Real-Time Speech Analysis Applications (Demo)