Springer Science+Business Media, 2009. — 786 p. — ISBN: 978-0387981345, e-ISBN: 978-0387981352.
This book is a thorough introduction to the most important topics in data mining and machine learning. It begins with a detailed review of classical function estimation and proceeds with chapters on nonlinear regression, classification, and ensemble methods. The final chapters focus on clustering, dimension reduction, variable selection, and multiple comparisons. All of these topics have undergone extraordinarily rapid development in recent years, and this treatment offers a modern perspective emphasizing the most recent contributions. The presentation of foundational results is detailed and includes many accessible proofs not readily available outside the original sources. While the orientation is conceptual and theoretical, the main points are regularly reinforced by computational comparisons.
Intended primarily as a graduate-level textbook for statistics, computer science, and electrical engineering students, this book assumes only a strong foundation in undergraduate statistics and mathematics, and facility with R packages. The text offers a wide variety of problems, many of an exploratory nature. There are numerous computed examples, complete with code, so that further computations can be carried out readily. The book also serves as a handbook for researchers who want a conceptual overview of the central topics in data mining and machine learning.
Variability, Information, and Prediction
The Curse of Dimensionality, The Two Extremes
Perspectives on the Curse: Sparsity, Exploding Numbers of Models, Multicollinearity and Concurvity, The Effect of Noise
Coping with the Curse: Selecting Design Points, Local Dimension, Parsimony
Two Techniques: The Bootstrap, Cross-Validation
Optimization and Search: Univariate Search, Multivariate Search, General Searches, Constraint Satisfaction and Combinatorial Search
Notes: Hammersley Points, Edgeworth Expansions for the Mean, Bootstrap Asymptotics for the Studentized Mean
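To make Chapter 1's bootstrap concrete, here is a minimal sketch in Python/numpy (the book's own examples use R; the data and settings below are illustrative, not from the text). It estimates the standard error of a sample mean by resampling:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=100)   # hypothetical sample

    def bootstrap_se(x, B=2000):
        """Bootstrap estimate of the standard error of the sample mean."""
        means = np.empty(B)
        for b in range(B):
            resample = rng.choice(x, size=len(x), replace=True)
            means[b] = resample.mean()
        return means.std(ddof=1)

    print(bootstrap_se(x))   # close to x.std(ddof=1) / sqrt(100)

The same resampling loop, applied to the studentized mean rather than the plain mean, underlies the bootstrap asymptotics listed above.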
Local Smoothers
Early Smoothers
Transition to Classical Smoothers: Global Versus Local Approximations, LOESS
Kernel Smoothers, Nearest Neighbors, Applications of Kernel Regression
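As a rough illustration of Chapter 2's kernel regression, here is a minimal Nadaraya-Watson smoother (Python/numpy rather than the book's R; the Gaussian kernel and bandwidth are illustrative choices):

    import numpy as np

    def nadaraya_watson(x_train, y_train, x_eval, h):
        """Kernel regression: locally weighted average with a Gaussian kernel."""
        w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
        return (w * y_train).sum(axis=1) / w.sum(axis=1)

    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0, 2 * np.pi, 200))
    y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
    fit = nadaraya_watson(x, y, x, h=0.3)   # smooth estimate of sin(x)

The bandwidth h governs the bias-variance trade-off: small h tracks the data closely (low bias, high variance), while large h oversmooths.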
Spline Smoothing
Interpolating Splines, Natural Cubic Splines, Smoothing Splines for Regression
Asymptotic Bias, Variance, and MISE for Spline Smoothers
Splines Redux: Hilbert Space Formulation
Simulated Comparisons: What Happens with Dependent Noise Models?, Higher Dimensions and the Curse of Dimensionality
Sobolev Spaces: Definition
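The smoothing spline of Chapter 3 is the minimizer of the standard penalized least-squares criterion; in the usual notation, with smoothing parameter \lambda > 0,

    \hat{f}_\lambda = \arg\min_{f} \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \int \bigl(f''(t)\bigr)^2 \, dt,

whose solution is a natural cubic spline with knots at the observed x_i. As \lambda \to 0 the fit interpolates the data; as \lambda \to \infty it tends to the least-squares line.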
New Wave Nonparametrics
Additive Models: The Backfitting Algorithm, Concurvity and Inference, Nonparametric Optimality
Generalized Additive Models
Projection Pursuit Regression
Neural Networks: Backpropagation and Inference, Barron’s Result and the Curse, Approximation Properties, Barron’s Theorem
Recursive Partitioning Regression: Growing Trees, Pruning and Selection, Regression, Bayesian Additive Regression Trees: BART
MARS, Sliced Inverse Regression, ACE and AVAS, Proof of Barron's Theorem
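The backfitting algorithm for the additive models above cycles through predictors, smoothing partial residuals until the component functions stabilize. A minimal Python/numpy sketch (the inner kernel smoother and its bandwidth are stand-ins, not the book's choices):

    import numpy as np

    def smooth(x, r, h=0.5):
        """Gaussian-kernel smoother of residuals r against predictor x."""
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        return (w * r).sum(axis=1) / w.sum(axis=1)

    def backfit(X, y, n_iter=20):
        """Fit y ~ alpha + sum_j f_j(X[:, j]) by backfitting."""
        n, p = X.shape
        alpha = y.mean()
        f = np.zeros((n, p))
        for _ in range(n_iter):
            for j in range(p):
                partial = y - alpha - f.sum(axis=1) + f[:, j]  # partial residual
                f[:, j] = smooth(X[:, j], partial)
                f[:, j] -= f[:, j].mean()   # center each f_j for identifiability
        return alpha, f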
Supervised Learning: Partition Methods
Multiclass Learning
Discriminant Analysis: Distance-Based Discriminant Analysis, Bayes Rules, Probability-Based Discriminant Analysis
Tree-Based Classifiers: Splitting Rules, Logic Trees, Random Forests, Support Vector Machines, Neural Networks
Notes: Hoeffding’s Inequality, VC Dimension
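Hoeffding's inequality, cited in the notes above, is the basic concentration bound behind many classification risk guarantees: for independent X_1, ..., X_n with X_i \in [a_i, b_i] and any t > 0,

    P\bigl( |\bar{X} - E\bar{X}| \ge t \bigr) \le 2 \exp\!\left( - \frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right),

so for bounded loss the empirical error of a fixed classifier concentrates around its true error at rate roughly 1/\sqrt{n}.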
Alternative Nonparametrics
Ensemble Methods: Bayes Model Averaging, Bagging, Stacking, Boosting, Other Averaging Methods, Oracle Inequalities
Bayes Nonparametrics: Dirichlet Process Priors, Polya Tree Priors, Gaussian Process Priors
The Relevance Vector Machine, Hidden Markov Models – Sequential Classification
Notes: Proof of Yang’s Oracle Inequality, Proof of Lecue’s Oracle Inequality
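Bagging, listed among the ensemble methods above, is generic: fit a base learner on bootstrap resamples and average the predictions. A minimal Python/numpy sketch around an arbitrary fit/predict pair (the one-split stump is only a toy base learner and assumes both sides of its split are nonempty):

    import numpy as np

    def bagged_predict(fit, predict, X, y, X_new, B=50, seed=0):
        """Average a base learner's predictions over B bootstrap fits."""
        rng = np.random.default_rng(seed)
        n = len(y)
        preds = np.zeros((B, len(X_new)))
        for b in range(B):
            idx = rng.integers(0, n, size=n)   # bootstrap resample of rows
            preds[b] = predict(fit(X[idx], y[idx]), X_new)
        return preds.mean(axis=0)

    def fit_stump(X, y):
        """Toy base learner: one split at the median of the first column."""
        s = np.median(X[:, 0])
        return s, y[X[:, 0] <= s].mean(), y[X[:, 0] > s].mean()

    def predict_stump(model, X_new):
        s, left, right = model
        return np.where(X_new[:, 0] <= s, left, right)

Averaging over resamples reduces the variance of unstable base learners such as trees, which is the heuristic behind bagging's gains.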
Computational Comparisons
Computational Results: Classification
Computational Results: Regression: Vapnik’s sinc Function, Friedman’s Function
Systematic Simulation Study, No Free Lunch
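Friedman's benchmark function, used in the regression comparisons above, is commonly stated as y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + noise, padded with pure-noise inputs. A data generator under those common conventions (the book's exact simulation settings may differ):

    import numpy as np

    def friedman1(n, p=10, sigma=1.0, seed=0):
        """Friedman's #1 benchmark: 5 active inputs out of p (p >= 5)."""
        rng = np.random.default_rng(seed)
        X = rng.uniform(size=(n, p))
        y = (10 * np.sin(np.pi * X[:, 0] * X[:, 1])
             + 20 * (X[:, 2] - 0.5) ** 2
             + 10 * X[:, 3] + 5 * X[:, 4]
             + rng.normal(scale=sigma, size=n))
        return X, y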
Unsupervised Learning: Clustering
Centroid-Based Clustering: K-Means Clustering, Variants
Hierarchical Clustering: Agglomerative Hierarchical Clustering, Divisive Hierarchical Clustering, Theory for Hierarchical Clustering
Partitional Clustering: Model-Based Clustering, Graph-Theoretic Clustering, Spectral Clustering
Bayesian Clustering: Probabilistic Clustering, Hypothesis Testing
Computed Examples: Ripley’s Data, Iris Data
Cluster Validation
Notes: Derivatives of Functions of a Matrix, Kruskal’s Algorithm: Proof, Prim’s Algorithm: Proof
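K-means, the first clustering method above, alternates assignment and centroid updates (Lloyd's algorithm). A minimal Python/numpy sketch (random initialization; assumes no cluster empties out during the iterations):

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)                       # assignment step
            new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new                                   # update step
        return labels, centers

Because the objective is nonconvex, the result depends on initialization; in practice one restarts from several seeds and keeps the fit with the smallest within-cluster sum of squares.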
Learning in High Dimensions
Principal Components, Factor Analysis, Projection Pursuit, Independent Components Analysis, Nonlinear PCs and ICA
Geometric Summarization: Measuring Distances to an Algebraic Shape, Principal Curves and Surfaces
Supervised Dimension Reduction: Partial Least Squares
Supervised Dimension Reduction: Sufficient Dimensions in Regression
Visualization I: Basic Plots, Elementary Visualization, Projections, Time Dependence
Visualization II: Transformations, Chernoff Faces, Multidimensional Scaling, Self-Organizing Maps
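Principal components, the first topic of the chapter above, reduce to an SVD of the centered data matrix. A minimal Python/numpy sketch:

    import numpy as np

    def pca(X, d):
        """First d principal components via SVD of the centered data."""
        Xc = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        scores = Xc @ Vt[:d].T                        # projections onto the PCs
        var_explained = s[:d] ** 2 / (s ** 2).sum()
        return scores, Vt[:d], var_explained

The rows of Vt are the loading vectors, and var_explained is the usual proportion-of-variance diagnostic for choosing d.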
Variable Selection
Concepts from Linear Regression: Subset Selection, Variable Ranking, Overview
Traditional Criteria: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Choices of Information Criteria, Cross Validation
Shrinkage Methods: Shrinkage Methods for Linear Models, Grouping in Variable Selection,
Least Angle Regression, Shrinkage Methods for Model Classes, Cautionary Notes
Bayes Variable Selection
Computational Comparisons: The "n greater than p" Case; When p is greater than n
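Among the shrinkage methods above, the lasso is the canonical example; its coordinate descent update is a soft-thresholded least-squares coefficient. A minimal Python/numpy sketch, assuming the objective (1/2n)||y - Xb||^2 + lambda*||b||_1 with roughly standardized columns (scaling conventions vary across treatments):

    import numpy as np

    def soft_threshold(z, g):
        return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

    def lasso_cd(X, y, lam, n_iter=200):
        """Coordinate descent for the lasso."""
        n, p = X.shape
        beta = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0)
        for _ in range(n_iter):
            for j in range(p):
                r = y - X @ beta + X[:, j] * beta[j]   # partial residual for coord j
                beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_sq[j]
        return beta

The soft threshold is what sets coefficients exactly to zero, which is why the lasso performs selection as well as shrinkage.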
Multiple Testing
Analyzing the Hypothesis Testing Problem, Controlling the Familywise Error Rate, PCER and PFER,
Controlling the False Discovery Rate, Controlling the Positive False Discovery Rate, Bayesian Multiple Testing
Notes: Proof of the Benjamini-Hochberg Theorem, Proof of the Benjamini-Yekutieli Theorem
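The Benjamini-Hochberg step-up procedure proved in the notes above rejects the hypotheses with the k smallest p-values, where k is the largest i with p_(i) <= i q / m. A minimal Python/numpy sketch:

    import numpy as np

    def benjamini_hochberg(pvals, q=0.05):
        """BH step-up procedure: control the FDR at level q for m p-values."""
        p = np.asarray(pvals, dtype=float)
        m = len(p)
        order = np.argsort(p)
        below = p[order] <= q * np.arange(1, m + 1) / m
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()       # largest i with p_(i) <= iq/m
            reject[order[: k + 1]] = True
        return reject

Under independence (and certain positive dependence) the expected proportion of false discoveries among the rejections is at most q; the Benjamini-Yekutieli variant handles arbitrary dependence at the cost of an extra logarithmic factor.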