
Suthaharan S. Machine Learning Models and Algorithms for Big Data Classification. Thinking with Examples for Effective Learning

  • pdf file
  • size 5,02 MB
Springer, 2016. — 364 p.
Data science is one of the emerging fields of the twenty-first century. It was created to address the big data problems encountered in the day-to-day operations of many industries, including financial sectors, academic institutions, information technology divisions, health care companies, and government organizations. One important big data problem that needs immediate attention is big data classification. Network intrusion detection, public space intruder detection, fraud detection, spam filtering, and forensic linguistics are some practical examples of big data classification problems.
We need significant collaboration among experts in many disciplines, including mathematics, statistics, computer science, engineering, biology, and chemistry, to find solutions to this challenging problem. Educational resources, like books and software, are also needed to train students to be the next generation of research leaders in this emerging research field. One field that brings interdisciplinary experts, educational resources, and modern technologies under one roof is machine learning, a subfield of artificial intelligence.
Many models and algorithms for standard classification problems are available in the machine learning literature. However, only a few of them are suitable for big data classification. Big data classification depends not only on mathematical and software techniques but also on the computer technologies that help store, retrieve, and process the data with efficient scalability, accessibility, and computability. One such recent technology is the distributed file system. A particular system that has become popular and provides these features is the Hadoop distributed file system, used together with the MapReduce programming model (or framework), whose Mapper and Reducer functions operate on (key, value) pairs. Machine learning techniques such as the decision tree (a hierarchical approach), random forest (an ensemble hierarchical approach), and deep learning (a layered approach) are highly suitable for a system that addresses big data classification problems. Therefore, the goal of this book is to present some of these machine learning models and algorithms, and to discuss them with examples.
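As a rough illustration of the Mapper, Reducer, and (key, value) idea mentioned above, the following is a minimal single-machine sketch in Python. It is not taken from the book (whose programs are in MATLAB and R) and does not use Hadoop; the function names mapper, shuffle, reducer, and run_job and the toy records are assumptions made for illustration only. It counts how many records fall in each class label.

```python
from collections import defaultdict


def mapper(record):
    """Emit one (key, value) pair per record: (class label, 1)."""
    _features, label = record
    yield (label, 1)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Combine all values for one key into a single (key, total) pair."""
    return (key, sum(values))


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in mapper(record)]
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical toy data: (feature vector, class label)
records = [
    ([0.2, 1.5], "normal"),
    ([3.1, 0.4], "intrusion"),
    ([0.3, 1.7], "normal"),
]
class_counts = run_job(records)
print(class_counts)
```

In a real distributed setting the map and reduce calls would run on different nodes over blocks of the file system; here everything runs in one process so that the data flow is easy to follow.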
The general objective of this book is to help readers, especially students and newcomers to the field of big data and machine learning, to gain a quick understanding of the techniques and technologies; therefore, the theory, examples, and programs (MATLAB and R) presented in this book have been simplified, hardcoded, repeated, or spaced for improvements. They provide vehicles to test and understand the complicated concepts of various topics in the field. It is expected that readers adopt these programs to experiment with the examples, and then modify them or write their own programs to advance their knowledge for solving more complex and challenging problems.
Science of Information.
Part I Understanding Big Data.
Big Data Essentials.
Big Data Analytics.
Part II Understanding Big Data Systems.
Distributed File System.
MapReduce Programming Platform.
Part III Understanding Machine Learning.
Modeling and Algorithms.
Supervised Learning Models.
Supervised Learning Algorithms.
Support Vector Machine.
Decision Tree Learning.
Part IV Understanding Scaling-Up Machine Learning.
Random Forest Learning.
Deep Learning Models.
Chandelier Decision Tree.
Dimensionality Reduction.