Springer, 2005, 377 p.
First International Workshop, MLMI 2004, Martigny, Switzerland, June 21-23, 2004. Revised Selected Papers.
The conference program featured invited talks, full papers (each peer-reviewed by at least three reviewers), and posters (accepted on the basis of abstracts). Together they covered a wide range of areas related to machine learning applied to multimodal interaction, and more specifically to multimodal meeting processing as addressed by the M4, AMI and IM2 projects. These areas included human-human communication modeling, speech and visual processing, multimodal processing, fusion and fission, multimodal dialog modeling, human-human interaction modeling, multimodal data structuring and presentation, multimedia indexing and retrieval, meeting structure analysis, meeting summarization, multimodal meeting annotation, and machine learning applied to all of the above.
I HCI and Applications
Accessing Multimodal Meeting Data: Systems, Problems and Possibilities
Browsing Recorded Meetings with Ferret
Meeting Modeling in the Context of Multimodal Research
Artificial Companions
Zakim – A Multimodal Software System for Large-Scale Teleconferencing
II Structuring and Interaction
Towards Computer Understanding of Human Interactions
Multistream Dynamic Bayesian Network for Meeting Segmentation
Using Static Documents as Structured and Thematic Interfaces to Multimedia Meeting Archives
An Integrated Framework for the Management of Video Collection
The NITE XML Toolkit Meets the ICSI Meeting Corpus: Import, Annotation, and Browsing
III Multimodal Processing
S-SEER: Selective Perception in a Multimodal Office Activity Recognition System
Mapping from Speech to Images Using Continuous State Space Models
An Online Algorithm for Hierarchical Phoneme Classification
Towards Predicting Optimal Fusion Candidates: A Case Study on Biometric Authentication Tasks
Mixture of SVMs for Face Class Modeling
AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking
IV Speech Processing
The 2004 ICSI-SRI-UW Meeting Recognition System
On the Adequacy of Baseform Pronunciations and Pronunciation Variants
Tandem Connectionist Feature Extraction for Conversational Speech Recognition
Long-Term Temporal Features for Conversational Speech Recognition
Speaker Indexing in Audio Archives Using Gaussian Mixture Scoring Simulation
Speech Transcription and Spoken Document Retrieval in Finnish
A Mixed-Lingual Phonological Component Which Drives the Statistical Prosody Control of a Polyglot TTS Synthesis System
V Dialogue Management
Shallow Dialogue Processing Using Machine Learning Algorithms (or Not)
ARCHIVUS: A System for Accessing the Content of Recorded Multimodal Meetings
VI Vision and Emotion
Piecing Together the Emotion Jigsaw
Emotion Analysis in Man-Machine Interaction Systems
A Hierarchical System for Recognition, Tracking and Pose Estimation
Automatic Pedestrian Tracking Using Discrete Choice Models and Image Correlation Techniques
A Shape Based, Viewpoint Invariant Local Descriptor