Oxford: Oxford University Press, 2012. — 278 p. — ISBN: 0199767106.
An introduction to statistical data mining, Data Analysis and Data Mining is both a textbook and a professional resource. Assuming only a basic knowledge of statistical reasoning, it presents core concepts in data mining and exploratory statistical models to students and professional statisticians (both those working in communications and those working in a technological or scientific capacity) who have a limited knowledge of data mining.
This book presents key statistical concepts by way of case studies, giving readers the benefit of learning from real problems and real data. Aided by a diverse range of statistical methods and techniques, readers will move from simple problems to complex problems. Through these case studies, authors Adelchi Azzalini and Bruno Scarpa explain exactly how statistical methods work; rather than relying on the "push the button" philosophy, they demonstrate how to use statistical tools to find the best solution to any given problem.
Case studies feature current topics highly relevant to data mining, such as web page traffic; the segmentation of customers; selection of customers for direct mail commercial campaigns; fraud detection; and measurements of customer satisfaction. Appropriate for both advanced undergraduate and graduate students, this much-needed book will fill a gap between higher level books, which emphasize technical explanations, and lower level books, which assume no prior knowledge and do not explain the methodology behind the statistical operations.
New problems and new opportunities
All models are wrong
A matter of style
A–B–C
Old friends: Linear models
Computational aspects
Likelihood
Logistic regression and GLM
Exercises
Optimism, Conflicts, and Trade-offs
Matching the conceptual frame and real life
A simple prototype problem
If we knew f (x)...
But as we do not know f (x)...
Methods for model selection
Reduction of dimensions and selection of most appropriate model
Exercises
Prediction of Quantitative Variables
Nonparametric estimation: Why?
Local regression
The curse of dimensionality
Splines
Additive models and GAM
Projection pursuit
Inferential aspects
Regression trees
Neural networks
Case studies
Exercises
Methods of Classification
Prediction of categorical variables
An introduction based on a marketing problem
Extension to several categories
Classification via linear regression
Discriminant analysis
Some nonparametric methods
Classification trees
Some other topics
Combination of classifiers
Case studies
Exercises
Methods of Internal Analysis
Cluster analysis
Associations among variables
Case study: Web usage mining
Appendix A Complements of Mathematics and Statistics
Concepts on linear algebra
Concepts of probability theory
Concepts of linear models
Appendix B Data Sets
Simulated data
Car data
Brazilian bank data
Data for telephone company customers
Insurance data
Choice of fruit juice data
Customer satisfaction
Web usage data
Appendix C Symbols and Acronyms