Series: Synthesis Lectures on Artificial Intelligence and Machine Learning.
— Morgan & Claypool Publishers, 2009. — 116 p. ISBN: 978-1598295474, e-ISBN: 978-1598295481.
Semi-supervised learning is a learning paradigm concerned with the study of how computers and natural systems such as humans learn in the presence of both labeled and unlabeled data. Traditionally, learning has been studied either in the unsupervised paradigm (e.g., clustering, outlier detection) where all the data is unlabeled, or in the supervised paradigm (e.g., classification, regression) where all the data is labeled. The goal of semi-supervised learning is to understand how combining labeled and unlabeled data may change the learning behavior, and design algorithms that take advantage of such a combination. Semi-supervised learning is of great interest in machine learning and data mining because it can use readily available unlabeled data to improve supervised learning tasks when the labeled data is scarce or expensive. Semi-supervised learning also shows potential as a quantitative tool to understand human category learning, where most of the input is self-evidently unlabeled. In this introductory book, we present some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi-supervised support vector machines. For each model, we discuss its basic mathematical formulation. The success of semi-supervised learning depends critically on some underlying assumptions. We emphasize the assumptions made by each model and give counterexamples when appropriate to demonstrate the limitations of the different models. In addition, we discuss semi-supervised learning for cognitive psychology. Finally, we give a computational learning theoretic perspective on semi-supervised learning, and we conclude the book with a brief discussion of open questions in the field.
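As a rough illustration of the self-training idea listed among the models above, the following sketch (the function and parameter names are ours, not the book's, and a simple nearest-centroid classifier stands in for any base learner) repeatedly pseudo-labels the most confidently classified unlabeled point and adds it to the labeled set:

```python
import numpy as np

def self_train(X_l, y_l, X_u, n_iter=10):
    """Illustrative self-training loop: fit a nearest-centroid classifier
    on the labeled data, pseudo-label the single most confident unlabeled
    point each round, and move it into the labeled set."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(n_iter):
        if len(X_u) == 0:
            break
        # class centroids from the current labeled set
        classes = np.unique(y_l)
        cents = np.array([X_l[y_l == c].mean(axis=0) for c in classes])
        # distance of each unlabeled point to each centroid
        d = np.linalg.norm(X_u[:, None, :] - cents[None, :, :], axis=2)
        conf = d.min(axis=1)       # smaller distance = higher confidence
        i = conf.argmin()          # most confident unlabeled point
        pseudo = classes[d[i].argmin()]
        X_l = np.vstack([X_l, X_u[i]])
        y_l = np.append(y_l, pseudo)
        X_u = np.delete(X_u, i, axis=0)
    return X_l, y_l
```

On two well-separated clusters, e.g. labeled points [0, 0] (class 0) and [10, 10] (class 1) with unlabeled points [1, 1] and [9, 9], the loop assigns each unlabeled point to its nearby cluster. The same loop also shows the model's assumption, emphasized in the book: if an early pseudo-label is wrong, the error reinforces itself in later rounds.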
Introduction to Statistical Machine Learning
The Data
Unsupervised Learning
Supervised Learning
Overview of Semi-Supervised Learning
Learning from Both Labeled and Unlabeled Data
How is Semi-Supervised Learning Possible?
Inductive vs. Transductive Semi-Supervised Learning
Caveats
Self-Training Models
Mixture Models and EM
Mixture Models for Supervised Classification
Mixture Models for Semi-Supervised Classification
Optimization with the EM Algorithm
The Assumptions of Mixture Models
Other Issues in Generative Models
Cluster-then-Label Methods
Co-Training
Two Views of an Instance
Co-Training
The Assumptions of Co-Training
Multiview Learning
Graph-Based Semi-Supervised Learning
Unlabeled Data as Stepping Stones
The Graph
Mincut
Harmonic Function
Manifold Regularization
The Assumption of Graph-Based Methods
Semi-Supervised Support Vector Machines
Support Vector Machines
Semi-Supervised Support Vector Machines
Entropy Regularization
The Assumption of S3VMs and Entropy Regularization
Human Semi-Supervised Learning
From Machine Learning to Cognitive Science
Study One: Humans Learn from Unlabeled Test Data
Study Two: Presence of Human Semi-Supervised Learning in a Simple Task
Study Three: Absence of Human Semi-Supervised Learning in a Complex Task
Discussions
Theory and Outlook
A Simple PAC Bound for Supervised Learning
A Simple PAC Bound for Semi-Supervised Learning
Future Directions of Semi-Supervised Learning
A: Basic Mathematical Reference
B: Semi-Supervised Learning Software
C: Symbols