Academic Press, 2013. — 232 p. — ISBN: 0123969638, 9780123969637
R and Data Mining introduces researchers, post-graduate students, and analysts to data mining using R, a free software environment for statistical computing and graphics. The book provides practical methods for using R in applications from academia to industry to extract knowledge from vast amounts of data. Readers will find this book a valuable guide to the use of R in tasks such as classification and prediction, clustering, outlier detection, association rules, sequence analysis, text mining, social network analysis, sentiment analysis, and more.
Data mining techniques are growing in popularity in a broad range of areas, from banking to insurance, retail, telecom, medicine, research, and government. This book focuses on the modeling phase of the data mining process, also addressing data exploration and model evaluation.
With three in-depth case studies, a quick reference guide, bibliography, and links to a wealth of online resources, R and Data Mining is a valuable, practical guide to a powerful method of analysis.
Presents an introduction to using R for data mining applications, covering the most popular data mining techniques
Provides code examples and data so that readers can easily learn the techniques
Features case studies in real-world applications to help readers apply the techniques in their work
Dr. Yanchang Zhao is a Senior Data Mining Specialist in the Australian public sector. Before joining the public sector, he was an Australian Postdoctoral Fellow (Industry) at the University of Technology, Sydney, from 2007 to 2009. He is the founder of the RDataMining.com website and the RDataMining Group on LinkedIn. He has extensive experience with R and data mining: he began researching data mining in 2001 and has been applying it in real-world business applications since 2006. He has over 50 publications on data mining research and applications, including three books. He is a senior member of IEEE, has been a Program Chair of the Australasian Data Mining Conference (AusDM 2012 and 2013), and has served on the program committees of more than 50 academic conferences.
Data Mining
R
Datasets
Data Import and Export
Save and Load R Data
Import from and Export to .CSV Files
Import Data from SAS
Import/Export via ODBC
Data Exploration
Have a Look at Data
Explore Individual Variables
Explore Multiple Variables
More Explorations
Save Charts into Files
Decision Trees and Random Forest
Decision Trees with Package party
Decision Trees with Package rpart
Random Forest
Regression
Linear Regression
Logistic Regression
Generalized Linear Regression
Non-linear Regression
Clustering
The k-Means Clustering
The k-Medoids Clustering
Hierarchical Clustering
Density-based Clustering
Outlier Detection
Univariate Outlier Detection
Outlier Detection with LOF
Outlier Detection by Clustering
Outlier Detection from Time Series
Discussions
Time Series Analysis and Mining
Time Series Data in R
Time Series Decomposition
Time Series Forecasting
Time Series Clustering
Time Series Classification
Discussions
Further Readings
Association Rules
Basics of Association Rules
The Titanic Dataset
Association Rule Mining
Removing Redundancy
Interpreting Rules
Visualizing Association Rules
Discussions and Further Readings
Text Mining
Retrieving Text from Twitter
Transforming Text
Stemming Words
Building a Term-Document Matrix
Frequent Terms and Associations
Word Cloud
Clustering Words
Clustering Tweets
Packages, Further Readings and Discussions
Social Network Analysis
Network of Terms
Network of Tweets
Two-Mode Network
Discussions and Further Readings
Case Study I: Analysis and Forecasting of House Price Indices
Importing HPI Data
Exploration of HPI Data
Trend and Seasonal Components of HPI
HPI Forecasting
The Estimated Price of a Property
Discussion
Case Study II: Customer Response Prediction and Profit Optimization
The Data of KDD Cup 1998
Data Exploration
Training Decision Trees
Model Evaluation
Selecting the Best Tree
Scoring
Discussions and Conclusions
Case Study III: Predictive Modeling of Big Data with Limited Memory
Methodology
Data and Variables
Random Forest
Memory Issue
Train Models on Sample Data
Build Models with Selected Variables
Scoring
Print Rules
Conclusions and Discussion
Online Resources
R Reference Cards
R
Data Mining
Data Mining with R
Classification/Prediction with R
Time Series Analysis with R
Association Rule Mining with R
Spatial Data Analysis with R
Text Mining with R
Social Network Analysis with R
Data Cleansing and Transformation with R
Big Data and Parallel Computing with R