CRC Press, 2012. — 215 p.
This book is for people interested in feature selection research. Feature selection is an essential technique for dimensionality reduction and relevance detection. In advanced data mining software packages, such as SAS Enterprise Miner, SPSS Modeler, Weka, Spider, Orange, and scikits.learn, feature selection procedures are indispensable components for successful data mining applications. The rapid advance of computer-based high-throughput techniques provides unparalleled opportunities for humans to expand capabilities in production, services, communications, and research. Meanwhile, immense quantities of high-dimensional data keep accumulating, challenging and stimulating the development of feature selection research in two major directions. One is to improve and expand existing techniques to meet new challenges; the other is to develop brand-new techniques that directly target the emerging challenges.
In this book, we introduce a novel feature selection technique, spectral feature selection, which forms a general platform for studying existing feature selection algorithms as well as developing novel algorithms for new problems arising from real-world applications. Spectral feature selection is a unified framework for supervised, unsupervised and semi-supervised feature selection. With its great generalizability, it includes many existing successful feature selection algorithms as its special cases, allowing the joint study of these algorithms to achieve better understanding and gain interesting insights. Based on spectral feature selection, families of novel feature selection algorithms can also be designed to address new challenges, such as handling feature redundancy, processing very large-scale data sets, and utilizing various types of knowledge to achieve multi-source feature selection.
With the steady and speedy development of feature selection research, we sincerely hope that this book makes a distinctive contribution to the field and inspires new developments in feature selection. We have no doubt that feature selection will have an impact on the processing of massive, high-dimensional data with complex structure in the near future. We are truly optimistic that in another 10 years, when we look back, we will be humbled by the accumulated power of feature selection and by its indelible contributions to machine learning, data mining, and many real-world applications.
This book is written for students, researchers, instructors, scientists, and engineers who use or want to apply feature selection techniques in their research or real-world applications. It can be used by practitioners in data mining, exploratory data analysis, bioinformatics, statistics, and computer science, and by researchers, software engineers, and product managers in the information and analytics industries.
The only background required of the reader is some basic knowledge of linear algebra, probability theory, and convex optimization. A reader can acquire the essential ideas and important concepts with limited knowledge of probability and convex optimization. Prior experience with feature selection techniques is not required as a reader can find all needed material in the text. Any exposure to data mining challenges can help the reader appreciate the power and impact of feature selection in real-world applications.
Data of High Dimensionality and Challenges.
Univariate Formulations for Spectral Feature Selection.
Multivariate Formulations.
Connections to Existing Algorithms.
Large-Scale Spectral Feature Selection.
Multi-Source Spectral Feature Selection.