Morgan Kaufmann, 2011. - 630 p. - ISBN: 0123748569 (Third Edition)
Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
* Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
* Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
* Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization.
Part I. Introduction to Data Mining.
What’s It All About?
Data Mining and Machine Learning,
Simple Examples: The Weather Problem and Others, Fielded Applications,
Machine Learning and Statistics, Generalization as Search, Data Mining and Ethics.
Input: Concepts, Instances, and Attributes.
What’s a Concept?, What’s in an Example?, What’s in an Attribute?, Preparing the Input.
Output: Knowledge Representation.
Tables, Linear Models, Trees, Rules, Instance-Based Representation, Clusters.
Algorithms: The Basic Methods.
Inferring Rudimentary Rules, Statistical Modeling, Divide-and-Conquer: Constructing Decision Trees,
Covering Algorithms: Constructing Rules, Mining Association Rules, Linear Models,
Instance-Based Learning, Clustering, Multi-Instance Learning.
Credibility: Evaluating What’s Been Learned.
Training and Testing, Predicting Performance, Cross-Validation, Other Estimates,
Leave-One-Out Cross-Validation, The Bootstrap, Comparing Data Mining Schemes, Predicting Probabilities,
Counting the Cost, Evaluating Numeric Prediction, Minimum Description Length Principle, Applying the MDL Principle to Clustering.
Part II. Advanced Data Mining.
Implementations: Real Machine Learning Schemes.
Decision Trees, Classification Rules, Association Rules, Extending Linear Models, Instance-Based Learning,
Numeric Prediction with Local Linear Models, Bayesian Networks, Clustering, Semisupervised Learning, Multi-Instance Learning.
Data Transformations.
Attribute Selection, Discretizing Numeric Attributes, Projections, Sampling, Cleansing,
Transforming Multiple Classes to Binary Ones, Calibrating Class Probabilities.
Ensemble Learning.
Combining Multiple Models, Bagging, Randomization, Boosting, Additive Regression, Interpretable Ensembles, Stacking.
Moving on: Applications and Beyond.
Applying Data Mining, Learning from Massive Datasets, Data Stream Learning, Incorporating Domain Knowledge,
Text Mining, Web Mining, Adversarial Situations, Ubiquitous Data Mining.
Part III. The Weka Data Mining Workbench.
Introduction to Weka.
What’s in Weka?, How Do You Use It?, What Else Can You Do?, How Do You Get It?
The Explorer.
Getting Started, Exploring the Explorer, Filtering Algorithms, Learning Algorithms, Metalearning Algorithms,
Clustering Algorithms, Association-Rule Learners, Attribute Selection.
The Knowledge Flow Interface.
Getting Started, Components, Configuring and Connecting the Components, Incremental Learning.
The Experimenter.
Getting Started, Simple Setup, Advanced Setup, The Analyze Panel, Distributing Processing over Several Machines.
The Command-Line Interface.
Getting Started, The Structure of Weka, Command-Line Options.
Embedded Machine Learning.
A Simple Data Mining Application.
Writing New Learning Schemes.
An Example Classifier, Conventions for Implementing Classifiers.
Tutorial Exercises for the Weka Explorer.
Introduction to the Explorer Interface, Nearest-Neighbor Learning and Decision Trees, Classification Boundaries,
Preprocessing and Parameter Tuning, Document Classification, Mining Association Rules.