Springer, 2006. 482 p.
Third International Workshop, MLMI 2006, Bethesda, MD, USA, May 1-4, 2006. Revised Selected Papers.
The workshop was organized and sponsored jointly by the US National Institute of Standards and Technology (NIST); three projects supported by the European Commission (Information Society Technologies priority of the Sixth Framework Programme), namely the AMI and CHIL Integrated Projects and the PASCAL Network of Excellence; and the Swiss National Science Foundation national research collaboration, IM2.
MLMI 2006 was co-located with the 4th NIST Meeting Recognition Workshop, which centered on the Rich Transcription 2006 Spring Meeting Recognition (RT-06) evaluation of speech technologies within the meeting domain. Building on the success of previous evaluations in this domain, the RT-06 evaluation continued the evaluation tasks in speech-to-text, who-spoke-when, and speech activity detection.
The conference program featured invited talks, full papers (subject to careful peer review, by at least three reviewers), and posters (accepted on the basis of abstracts) covering a wide range of areas related to machine learning applied to multimodal interaction — and more specifically to multimodal meeting processing, as addressed by the various sponsoring projects. These areas included human–human communication modeling, speech and visual processing, multimodal processing, fusion and fission, human–computer interaction, and the modeling of discourse and dialog, with an emphasis on the application of machine learning.
I Invited Paper
Model-Based, Multimodal Interaction in Document Browsing
II Multimodal Processing
The NIST Meeting Room Corpus 2 Phase 1
Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers
A Multimodal Analysis of Floor Control in Meetings
Combining User Modeling and Machine Learning to Predict Users’ Multimodal Integration Patterns
Using Audio, Visual, and Lexical Features in a Multi-modal Virtual Meeting Director
III Image and Video Processing
A Study on Visual Focus of Attention Recognition from Head Pose in a Meeting Room
Multi-person Tracking in Meetings: A Comparative Study
Gaussian Mixture Models for CHASM Signature Verification
Kalman Tracking with Target Feedback on Adaptive Background Learning
Da Vinci’s Mona Lisa: A Modern Look at a Timeless Classic
IV HCI and Applications
The Connector Service - Predicting Availability in Mobile Contexts
Multimodal Input for Meeting Browsing and Retrieval Interfaces: Preliminary Findings
V Discourse and Dialogue
Gesture Features for Coreference Resolution
Syntactic Chunking Across Different Corpora
Multistream Recognition of Dialogue Acts in Meetings
Text Based Dialog Act Classification for Multiparty Meetings
Detecting Action Items in Multi-party Meetings: Annotation and Initial Experiments
Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site
VI Speech and Audio Processing
A Speaker Localization System for Lecture Room Environment
Robust Speech Activity Detection in Interactive Smart-Room Environments
Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization
Speaker Diarization for Multi-microphone Meetings Using Only Between-Channel Differences
Warped and Warped-Twice MVDR Spectral Estimation With and Without Filterbanks
Robust Heteroscedastic Linear Discriminant Analysis and LCRC Posterior Features in Meeting Data Recognition
Juicer: A Weighted Finite-State Transducer Speech Decoder
Speech-to-Speech Translation Services for the Olympic Games 2008
VII NIST Meeting Recognition Evaluation
The Rich Transcription 2006 Spring Meeting Recognition Evaluation
The IBM RT06s Evaluation System for Speech Activity Detection in CHIL Seminars
A Lightweight Speech Detection System for Perceptive Environments
Robust Speaker Diarization for Meetings: ICSI RT06S Meetings Evaluation System
Technical Improvements of the E-HMM Based Speaker Diarization System for Meeting Records
The AMI Speaker Diarization System for NIST RT06s Meeting Data
The 2006 Athens Information Technology Speech Activity Detection and Speaker Diarization Systems
Speaker Diarization: From Broadcast News to Lectures
The ISL RT-06S Speech-to-Text System
The AMI Meeting Transcription System: Progress and Performance
The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings
The ICSI-SRI Spring 2006 Meeting Recognition System
The LIMSI RT06s Lecture Transcription System